Tomas Zulberti created FLUME-3369:
-------------------------------------

             Summary: Corrupt S3 File
                 Key: FLUME-3369
                 URL: https://issues.apache.org/jira/browse/FLUME-3369
             Project: Flume
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Tomas Zulberti


We are using Flume to read from Kinesis and upload the files to S3. The problem 
is that the generated Gzip file is corrupt in one of two ways:

- it is an empty file, or
- it is not a valid Gzip file.
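
To tell the two cases apart, a check along the following lines can be run against an object after downloading it from S3. This is only a minimal sketch, not part of Flume: the function name `classify_gzip` and its return labels are made up for illustration, and it assumes the file is already on local disk.

```python
import gzip
import zlib
from pathlib import Path


def classify_gzip(path):
    """Return 'empty', 'invalid', or 'ok' for the local file at `path`.

    Hypothetical helper for illustration; the labels are not Flume terms.
    """
    p = Path(path)
    # Case 1: a zero-byte object.
    if p.stat().st_size == 0:
        return "empty"
    try:
        with gzip.open(p, "rb") as fh:
            # Read to the end in chunks so that a truncated stream
            # (missing trailer) is detected, not only a bad header.
            while fh.read(1024 * 1024):
                pass
    except (OSError, EOFError, zlib.error):
        # Case 2: bad magic bytes, corrupt deflate data, or truncation.
        return "invalid"
    return "ok"
```

Reading the whole stream matters here: a file whose close failed part-way may have a valid Gzip header and still be missing its end-of-stream marker.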

I checked FLUME-2967, and we are already using native libraries. The stack 
trace I have is as follows:

{code}
21 May 2020 01:09:27,342 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246)  - Creating s3a://mycompany/foobar/year=2020/month=05/day=21/hour=01/172_17_5_220_bids4.1590023192733.gz

21 May 2020 01:09:27,393 INFO  [hdfs-bids4-call-runner-19] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas:190)  - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.s3a.S3ABlockOutputStream; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3ABlockOutputStream.getNumCurrentReplicas()

21 May 2020 01:09:27,396 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.getRefIsClosed:197)  - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt

21 May 2020 01:09:27,614 WARN  [hdfs-bids4-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$CloseHandler.close:348)  - Closing file: s3a://mycompany/foobar/year=2020/month=05/day=21/hour=01/172_17_5_220_foobar.1590022801143.gz failed. Will retry again in 180 seconds.

java.io.IOException: Filesystem {bucket=dw.jampp.com, key='foobar/year=2020/month=05/day=21/hour=01/172_17_5_220_foobar.1590022801143.gz'} closed
        at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.checkOpen(S3ABlockOutputStream.java:224)
        at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.write(S3ABlockOutputStream.java:270)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:83)
        at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
        at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.close(HDFSCompressedDataStream.java:149)
        at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:319)
        at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:316)
        at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:727)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:724)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

21 May 2020 01:09:27,656 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.append:613)  - Caught IOException writing to HDFSWriter (write beyond end of stream). Closing file (s3a://mycompany/foobar/year=2020/month=05/day=21/hour=01/172_17_5_220_foobar.1590023192733.gz) and rethrowing exception.

21 May 2020 01:09:27,658 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink$1.run:393)  - Writer callback called.

21 May 2020 01:09:27,658 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing s3a://mycompany/foobar/year=2020/month=05/day=21/hour=01/172_17_5_220_foobar.1590023192733.gz
{code}




