[
https://issues.apache.org/jira/browse/FLUME-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106394#comment-14106394
]
Ashish Paliwal commented on FLUME-2445:
---------------------------------------
Please ask questions on the user mailing list: http://flume.apache.org/mailinglists.html
> Duplicate files in s3 (both temp and final file)
> ------------------------------------------------
>
> Key: FLUME-2445
> URL: https://issues.apache.org/jira/browse/FLUME-2445
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.5.0
> Reporter: Bijith Kumar
>
> Both the temp file and the final file are created in the S3 bucket by the
> HDFS sink, as shown below:
> -rw-rw-rw- 1 9558423 2014-08-18 18:01
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz
> -rw-rw-rw- 1 9558423 2014-08-18 18:01
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> I could not find any errors in the agent log. However, the agent tried to close
> and rename the temp file again when I tried to restart the agent the next day.
> Even after the second attempt, both files exist.
> Please find the logs below. The file was uploaded on Aug 18 and the agent was
> restarted on Aug 19.
> $ grep actions-i-e9b26de6.1408381201580 logs/flume.log
> 18 Aug 2014 17:00:01,591 INFO
> [SinkRunner-PollingRunner-DefaultSinkProcessor]
> (org.apache.flume.sink.hdfs.BucketWriter.open:261) - Creating
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 18 Aug 2014 17:00:02,150 INFO [hdfs-s3sink-actions-call-runner-1]
> (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>:182)
> - OutputStream for key
> 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp'
> writing to tempfile
> '/var/lib/hadoop-hdfs/cache/ec2-user/s3/output-1521416101446161225.tmp'
> 18 Aug 2014 18:01:02,535 INFO [hdfs-s3sink-actions-roll-timer-0]
> (org.apache.flume.sink.hdfs.BucketWriter$5.call:469) - Closing idle
> bucketWriter
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> at 1408384862535
> 18 Aug 2014 18:01:02,535 INFO [hdfs-s3sink-actions-roll-timer-0]
> (org.apache.flume.sink.hdfs.BucketWriter.close:409) - Closing
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 18 Aug 2014 18:01:02,535 INFO [hdfs-s3sink-actions-call-runner-7]
> (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:217)
> - OutputStream for key
> 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp'
> closed. Now beginning upload
> 18 Aug 2014 18:01:08,043 INFO [hdfs-s3sink-actions-call-runner-7]
> (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:229)
> - OutputStream for key
> 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp'
> upload complete
> 18 Aug 2014 18:01:08,165 INFO [hdfs-s3sink-actions-call-runner-8]
> (org.apache.flume.sink.hdfs.BucketWriter$8.call:669) - Renaming
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> to
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz
> 19 Aug 2014 19:55:37,635 INFO [conf-file-poller-0]
> (org.apache.flume.sink.hdfs.BucketWriter.close:409) - Closing
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 19 Aug 2014 19:55:37,635 INFO [conf-file-poller-0]
> (org.apache.flume.sink.hdfs.BucketWriter.close:428) - HDFSWriter is already
> closed:
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 19 Aug 2014 19:55:38,064 INFO [hdfs-s3sink-actions-call-runner-1]
> (org.apache.flume.sink.hdfs.BucketWriter$8.call:669) - Renaming
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> to
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz