[ 
https://issues.apache.org/jira/browse/FLUME-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107128#comment-14107128
 ] 

Hari Shreedharan commented on FLUME-2445:
-----------------------------------------

Although the HDFS sink can write to S3 (since it uses the HDFS client API), 
that combination is not really tested or verified. The S3 connectors for HDFS 
may or may not work correctly with Flume. Here it looks like the rename of the 
temp file does not actually work as expected with S3, for some reason.
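
To see how many files are affected, one option is to scan a listing of the 
bucket for temp keys whose final counterpart also exists. The following is only 
a rough sketch and not part of Flume: find_leftover_temp_files is a 
hypothetical helper, it assumes you already have the key names as plain 
strings (however you obtain the listing), and it assumes the in-use suffix is 
".tmp" as in the report.

```python
def find_leftover_temp_files(keys, temp_suffix=".tmp"):
    """Return the temp keys whose final (suffix-stripped) key is also present.

    Hypothetical helper for illustration; it only compares key names and
    never talks to S3 or HDFS.
    """
    if not temp_suffix:
        raise ValueError("temp_suffix must be a non-empty string")
    present = set(keys)
    leftovers = []
    for key in sorted(present):
        if key.endswith(temp_suffix):
            final_key = key[: -len(temp_suffix)]
            # A temp key is a leftover only if the renamed file exists too.
            if final_key in present:
                leftovers.append(key)
    return leftovers


if __name__ == "__main__":
    # Key names taken from the listing in the report.
    listing = [
        "flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz",
        "flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp",
    ]
    for key in find_leftover_temp_files(listing):
        print(key)
```

A temp key without a matching final key is deliberately not reported, since 
that file may still be open or may never have been renamed at all.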

> Duplicate files in s3 (both temp and final file)
> ------------------------------------------------
>
>                 Key: FLUME-2445
>                 URL: https://issues.apache.org/jira/browse/FLUME-2445
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.5.0
>            Reporter: Bijith Kumar
>
> I noticed that the HDFS sink creates both the temp file and the final file 
> in the S3 bucket, as shown below:
> -rw-rw-rw-   1    9558423 2014-08-18 18:01 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz
> -rw-rw-rw-   1    9558423 2014-08-18 18:01 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> I could not find any errors in the agent log. However, when I restarted the 
> agent the next day, it tried to close and rename the temp file again. 
> Even after the second attempt, both files exist. 
> Please find the logs below. The file was uploaded on Aug 18 and the agent 
> was restarted on Aug 19.
> $ grep actions-i-e9b26de6.1408381201580 logs/flume.log 
> 18 Aug 2014 17:00:01,591 INFO  
> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
> (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 18 Aug 2014 17:00:02,150 INFO  [hdfs-s3sink-actions-call-runner-1] 
> (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>:182)
>   - OutputStream for key 
> 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp'
>  writing to tempfile 
> '/var/lib/hadoop-hdfs/cache/ec2-user/s3/output-1521416101446161225.tmp'
> 18 Aug 2014 18:01:02,535 INFO  [hdfs-s3sink-actions-roll-timer-0] 
> (org.apache.flume.sink.hdfs.BucketWriter$5.call:469)  - Closing idle 
> bucketWriter 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
>  at 1408384862535
> 18 Aug 2014 18:01:02,535 INFO  [hdfs-s3sink-actions-roll-timer-0] 
> (org.apache.flume.sink.hdfs.BucketWriter.close:409)  - Closing 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 18 Aug 2014 18:01:02,535 INFO  [hdfs-s3sink-actions-call-runner-7] 
> (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:217)
>   - OutputStream for key 
> 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp'
>  closed. Now beginning upload
> 18 Aug 2014 18:01:08,043 INFO  [hdfs-s3sink-actions-call-runner-7] 
> (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:229)
>   - OutputStream for key 
> 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp'
>  upload complete
> 18 Aug 2014 18:01:08,165 INFO  [hdfs-s3sink-actions-call-runner-8] 
> (org.apache.flume.sink.hdfs.BucketWriter$8.call:669)  - Renaming 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
>  to 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz
> 19 Aug 2014 19:55:37,635 INFO  [conf-file-poller-0] 
> (org.apache.flume.sink.hdfs.BucketWriter.close:409)  - Closing 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 19 Aug 2014 19:55:37,635 INFO  [conf-file-poller-0] 
> (org.apache.flume.sink.hdfs.BucketWriter.close:428)  - HDFSWriter is already 
> closed: 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
> 19 Aug 2014 19:55:38,064 INFO  [hdfs-s3sink-actions-call-runner-1] 
> (org.apache.flume.sink.hdfs.BucketWriter$8.call:669)  - Renaming 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
>  to 
> s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz



--
This message was sent by Atlassian JIRA
(v6.2#6252)
