[ 
https://issues.apache.org/jira/browse/FLUME-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610761#comment-16610761
 ] 

Da Kuang commented on FLUME-2445:
---------------------------------

What is the status of this ticket? Any plan to get it released into next 
release?

We had a similar issue on version 1.8.0 with kafka source, file channel and 
hdfs sink (S3). Rather than having both .tmp and the actual log file, we had an 
empty log file (.tmp was likely being deleted). The error happened like once or 
twice everyday. So we have lost data when it happened.  
{code:java}
11 Sep 2018 06:53:45,874 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.BucketWriter.open:251) - Creating 
s3n://my-s3-bucket/flume/20180911/06/host1.n9.jungle.tech.1536645600639.gz.tmp
11 Sep 2018 06:53:46,056 INFO [hdfs-s3-call-runner-3] 
(org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>:246)
 - OutputStream for key 
'flume/20180911/06/host1.n9.jungle.tech.1536645600639.gz.tmp' writing to 
tempfile '/tmp/hadoop-root/s3/output-3567672823738089795.tmp'
11 Sep 2018 06:53:46,057 INFO [hdfs-s3-call-runner-3] 
(org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas:189)
 - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 
not available; 
fsOut=org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream; 
err=java.lang.NoSuchMethodException: 
org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.getNumCurrentReplicas()
11 Sep 2018 06:53:46,057 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.BucketWriter.getRefIsClosed:202) - isFileClosed() 
is not available in the version of the distributed filesystem being used. Flume 
will not attempt to re-close files if the close fails on the first attempt
11 Sep 2018 06:53:46,058 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.BucketWriter.append:582) - Caught IOException 
writing to HDFSWriter (write beyond end of stream). Closing file 
(s3n://my-s3-bucket/flume/20180911/06/host1.n9.jungle.tech.1536645600639.gz.tmp)
 and rethrowing exception.
11 Sep 2018 06:53:46,058 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.BucketWriter.close:393) - Closing 
s3n://my-s3-bucket/flume/20180911/06/host1.n9.jungle.tech.1536645600639.gz.tmp
11 Sep 2018 06:53:46,058 INFO [hdfs-s3-call-runner-0] 
(org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:280)
 - OutputStream for key 
'flume/20180911/06/host1.n9.jungle.tech.1536645600639.gz.tmp' closed. Now 
beginning upload
11 Sep 2018 06:53:46,191 INFO [hdfs-s3-call-runner-0] 
(org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:292)
 - OutputStream for key 
'flume/20180911/06/host1.n9.jungle.tech.1536645600639.gz.tmp' upload complete
11 Sep 2018 06:53:46,261 INFO [hdfs-s3-call-runner-2] 
(org.apache.flume.sink.hdfs.BucketWriter$8.call:655) - Renaming 
s3n://my-s3-bucket/flume/20180911/06/host1.n9.jungle.tech.1536645600639.gz.tmp 
to s3n://my-s3-bucket/flume/20180911/06/host1.n9.jungle.tech.1536645600639.gz
11 Sep 2018 06:53:46,856 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.HDFSEventSink$1.run:382) - Writer callback called.
11 Sep 2018 06:53:46,864 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.HDFSEventSink.process:443) - HDFS IO error
java.io.IOException: write beyond end of stream
at 
org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:66)
at java.io.OutputStream.write(OutputStream.java:75)
at 
org.apache.flume.serialization.BodyTextEventSerializer.write(BodyTextEventSerializer.java:70)
at 
org.apache.flume.sink.hdfs.HDFSCompressedDataStream.append(HDFSCompressedDataStream.java:125)
at org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:577)
at org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:574)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
at 
org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

> Duplicate files in s3 (both temp and final file)
> ------------------------------------------------
>
>                 Key: FLUME-2445
>                 URL: https://issues.apache.org/jira/browse/FLUME-2445
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.5.0
>            Reporter: Bijith Kumar
>            Priority: Major
>
> Noticed that both temp and final file are created in S3 bucket by HDFS sink 
> as shown below
> -rw-rw-rw-   1    9558423 2014-08-18 18:01 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz
> -rw-rw-rw-   1    9558423 2014-08-18 18:01 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> I could not find any errors in agent log. However, the agent tried to close 
> and rename the temp file again when I tried to restart the agent next day. 
> Even after second try, both file exists. 
> Please find the logs below. File uploaded on Aug 18 and agent restarted on 
> 19th
> $ grep actions-i-e9b26de6.1408381201580 logs/flume.log 
> 18 Aug 2014 17:00:01,591 INFO  
> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
> (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 18 Aug 2014 17:00:02,150 INFO  [hdfs-s3sink-actions-call-runner-1] 
> (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>:182)
>   - OutputStream for key 
> 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp'
>  writing to tempfile 
> '/var/lib/hadoop-hdfs/cache/ec2-user/s3/output-1521416101446161225.tmp'
> 18 Aug 2014 18:01:02,535 INFO  [hdfs-s3sink-actions-roll-timer-0] 
> (org.apache.flume.sink.hdfs.BucketWriter$5.call:469)  - Closing idle 
> bucketWriter 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
>  at 1408384862535
> 18 Aug 2014 18:01:02,535 INFO  [hdfs-s3sink-actions-roll-timer-0] 
> (org.apache.flume.sink.hdfs.BucketWriter.close:409)  - Closing 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 18 Aug 2014 18:01:02,535 INFO  [hdfs-s3sink-actions-call-runner-7] 
> (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:217)
>   - OutputStream for key 
> 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp'
>  closed. Now beginning upload
> 18 Aug 2014 18:01:08,043 INFO  [hdfs-s3sink-actions-call-runner-7] 
> (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:229)
>   - OutputStream for key 
> 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp'
>  upload complete
> 18 Aug 2014 18:01:08,165 INFO  [hdfs-s3sink-actions-call-runner-8] 
> (org.apache.flume.sink.hdfs.BucketWriter$8.call:669)  - Renaming 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
>  to 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz
> 19 Aug 2014 19:55:37,635 INFO  [conf-file-poller-0] 
> (org.apache.flume.sink.hdfs.BucketWriter.close:409)  - Closing 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 19 Aug 2014 19:55:37,635 INFO  [conf-file-poller-0] 
> (org.apache.flume.sink.hdfs.BucketWriter.close:428)  - HDFSWriter is already 
> closed: 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 19 Aug 2014 19:55:38,064 INFO  [hdfs-s3sink-actions-call-runner-1] 
> (org.apache.flume.sink.hdfs.BucketWriter$8.call:669)  - Renaming 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
>  to 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to