[
https://issues.apache.org/jira/browse/FLUME-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064184#comment-14064184
]
Dennis Waldron commented on FLUME-1228:
---------------------------------------
Hi Rob, I haven't been following up on this topic simply because nobody had
asked for additional help.
I see that you are using CDH 4.4, which maps to Hadoop 2.0.0. As I don't run a
full-blown Hadoop installation, I tried to simulate your problem on an AWS EC2
node using the library versions defined by the CDH project
(http://maven-repository.com/artifact/com.cloudera.cdh/cdh-root/4.4.0/pom).
Using an unmodified Flume 1.4 installation with the following additional
libraries:
{noformat}
-rw-rw-r-- 1 ec2-user ec2-user 1756571 Jul 16 21:34 hadoop-core-2.0.0-mr1-cdh4.4.0.jar
-rw-rw-r-- 1 ec2-user ec2-user 2284942 Jul 16 21:34 hadoop-common-2.0.0-cdh4.4.0.jar
-rw-rw-r-- 1 ec2-user ec2-user   46855 Jul 16 21:34 hadoop-auth-2.0.0-cdh4.4.0.jar
-rw-rw-r-- 1 ec2-user ec2-user  305001 Jul 16 21:34 commons-httpclient-3.1.jar
-rw-rw-r-- 1 ec2-user ec2-user  298829 Jul 16 21:34 commons-configuration-1.6.jar
-rw-rw-r-- 1 ec2-user ec2-user  321806 Jul 16 21:34 jets3t-0.6.1.jar
{noformat}
I was able to reproduce the 404 problem:
{noformat}
2014-07-16 21:36:19,158 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO -
org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:219)] Creating
s3n://S3_BUCKET/flume-debug/FlumeData.1405546579015.tmp
2014-07-16 21:36:20,291 (hdfs-hdfs-1-call-runner-0) [WARN -
org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:393)]
Response '/flume-debug%2FFlumeData.1405546579015.tmp' - Unexpected response
code 404, expected 200
2014-07-16 21:36:20,306 (hdfs-hdfs-1-call-runner-0) [WARN -
org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:393)]
Response '/flume-debug%2FFlumeData.1405546579015.tmp_%24folder%24' -
Unexpected response code 404, expected 200
2014-07-16 21:36:20,336 (hdfs-hdfs-1-call-runner-0) [INFO -
org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>(NativeS3FileSystem.java:182)]
OutputStream for key 'flume-
2014-07-16 21:36:20,374 (hdfs-hdfs-1-call-runner-3) [INFO -
org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close(NativeS3FileSystem.java:217)]
OutputStream for key 'flume-debug/FlumeData.1405546579015.tmp' closed. Now
beginning upload
2014-07-16 21:36:20,417 (hdfs-hdfs-1-call-runner-3) [INFO -
org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close(NativeS3FileSystem.java:229)]
OutputStream for key 'flume-debug/FlumeData.1405546579015.tmp' upload complete
2014-07-16 21:36:20,435 (hdfs-hdfs-1-call-runner-4) [INFO -
org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:487)] Renaming
s3n://S3_BUCKET/flume-debug/FlumeData.1405546579015.tmp to
s3n://S3_BUCKET/flume-debug/FlumeData.1405546579015
{noformat}
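For anyone comparing their own agent logs against the excerpt above, here is a minimal sketch that pulls out the request paths jets3t reported a 404 for. The regular expression is an assumption derived only from the warning format shown in this excerpt, and `find_404_keys` is a hypothetical helper name, not part of Flume or jets3t.

```python
import re

# Pattern inferred from the RestS3Service warnings quoted above (an assumption,
# not an official log format specification).
WARN_404 = re.compile(
    r"Response\s+'([^']+)'\s+-\s+Unexpected\s+response\s+code\s+404"
)


def find_404_keys(log_text):
    """Return the request paths that jets3t reported a 404 for."""
    # Mail clients wrap long log lines, so collapse all whitespace runs first.
    flat = " ".join(log_text.split())
    return WARN_404.findall(flat)


sample = """
2014-07-16 21:36:20,291 (hdfs-hdfs-1-call-runner-0) [WARN -
org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:393)]
Response '/flume-debug%2FFlumeData.1405546579015.tmp' - Unexpected response
code 404, expected 200
"""

print(find_404_keys(sample))
```

Running the same function over a log captured after a library change gives a quick way to see whether the warnings have gone away.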
Now by changing the jets3t library from version 0.6.1 to 0.7.1 (as per my
previous recommendation):
{noformat}
-rw-rw-r-- 1 ec2-user ec2-user 377780 Jul 16 21:40 jets3t-0.7.1.jar
{noformat}
The problem is fixed:
{noformat}
2014-07-16 21:39:36,724 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO -
org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:219)] Creating
s3n://S3_BUCKET/flume-debug/FlumeData.1405546776595.tmp
2014-07-16 21:39:38,071 (hdfs-hdfs-1-call-runner-0) [INFO -
org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>(NativeS3FileSystem.java:182)]
OutputStream for key 'flume-debug/FlumeData.1405546776595.tmp' writing to
tempfile '/tmp/hadoop-ec2-user/s3/output-3361552510304141410.tmp'
2014-07-16 21:39:38,110 (hdfs-hdfs-1-call-runner-3) [INFO -
org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close(NativeS3FileSystem.java:217)]
OutputStream for key 'flume-debug/FlumeData.1405546776595.tmp' closed. Now
beginning upload
2014-07-16 21:39:38,159 (hdfs-hdfs-1-call-runner-3) [INFO -
org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close(NativeS3FileSystem.java:229)]
OutputStream for key 'flume-debug/FlumeData.1405546776595.tmp' upload complete
2014-07-16 21:39:38,174 (hdfs-hdfs-1-call-runner-4) [INFO -
org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:487)] Renaming
s3n://S3_BUCKET/flume-debug/FlumeData.1405546776595.tmp to
s3n://S3_BUCKET/flume-debug/FlumeData.1405546776595
{noformat}
Have you tried changing the jets3t version?
Are we talking about the same problem? You mention that 2 out of 3 nodes are
having problems, but this issue should affect all of your nodes.
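To check which jets3t jar each node actually has, a small script along these lines may help. It is only a sketch: it assumes the jar follows the `jets3t-<version>.jar` naming seen in the listings above, uses 0.7.1 as the threshold simply because that is the version that worked in the test described here, and the helper names (`jets3t_versions`, `needs_upgrade`, `scan_lib_dir`) are hypothetical.

```python
import re
from pathlib import Path

# Assumes jars are named like "jets3t-0.6.1.jar", as in the listings above.
JAR_NAME = re.compile(r"^jets3t-(\d+(?:\.\d+)*)\.jar$")


def jets3t_versions(jar_names):
    """Extract jets3t versions as integer tuples from jar file names."""
    versions = []
    for name in jar_names:
        match = JAR_NAME.match(name)
        if match:
            versions.append(tuple(int(p) for p in match.group(1).split(".")))
    return versions


def needs_upgrade(jar_names, minimum=(0, 7, 1)):
    """Return True if any jets3t jar in the list is older than `minimum`."""
    return any(v < minimum for v in jets3t_versions(jar_names))


def scan_lib_dir(lib_dir):
    """List jar file names in a directory (the path is supplied by the caller)."""
    return [p.name for p in Path(lib_dir).glob("*.jar")]


print(needs_upgrade(["jets3t-0.6.1.jar", "hadoop-auth-2.0.0-cdh4.4.0.jar"]))
print(needs_upgrade(["jets3t-0.7.1.jar"]))
```

On a real node, pass the output of `scan_lib_dir` for the Flume lib directory to `needs_upgrade`; comparing the result across the three nodes would show whether they differ in the jets3t version they load.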
> flume-ng fails while writing to S3 sink
> ---------------------------------------
>
> Key: FLUME-1228
> URL: https://issues.apache.org/jira/browse/FLUME-1228
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.2.0, v1.4.0
> Reporter: Prashanth Jonnalagadda
> Assignee: Ashish Paliwal
> Priority: Critical
>
> flume-ng (version 1.2.0) fails while writing to S3 sink since it gets back
> 404 response code. The files with data are created on S3 though.
> Hadoop version used is 0.20.2-cdh3u4
> Followed all the steps documented in the jira -
> https://issues.cloudera.org/browse/FLUME-66
> and also I tried swapping out hadoop-core.jar that comes with CDH, with
> emr-hadoop-core-0.20.jar that comes with EC2 hadoop cluster instance as
> suggested in the following blog post -
> http://eric.lubow.org/2011/system-administration/distributed-flume-setup-with-an-s3-sink/
> but the issue still remains.
> Following errors are seen in the log:
> 2012-05-25 05:04:28,889 WARN httpclient.RestS3Service: Response
> '/flumedata%2FFlumeData.122585423857995.tmp_%24folder%24' - Unexpected
> response code 404, expected 200
> 2012-05-25 05:04:28,964 INFO s3native.NativeS3FileSystem: OutputStream for
> key 'flumedata/FlumeData.122585423857995.tmp' writing to tempfile
> '/tmp/hadoop-root/s3/output-8042215269186280519.tmp'
> 2012-05-25 05:04:28,972 INFO s3native.NativeS3FileSystem: OutputStream for
> key 'flumedata/FlumeData.122585423857995.tmp' closed. Now beginning upload
> 2012-05-25 05:04:29,044 INFO s3native.NativeS3FileSystem: OutputStream for
> key 'flumedata/FlumeData.122585423857995.tmp' upload complete
> 2012-05-25 05:04:29,074 INFO hdfs.BucketWriter: Renaming
> s3n://flume-ng/flumedata/FlumeData.122585423857995.tmp to
> s3n://flume-ng/flumedata/FlumeData.122585423857995
> 2012-05-25 05:04:29,097 WARN httpclient.RestS3Service: Response
> '/flumedata%2FFlumeData.122585423857995' - Unexpected response code 404,
> expected 200
> 2012-05-25 05:04:29,120 WARN httpclient.RestS3Service: Response
> '/flumedata%2FFlumeData.122585423857995_%24folder%24' - Unexpected response
> code 404, expected 200
> 2012-05-25 05:04:29,203 WARN httpclient.RestS3Service: Response '/flumedata'
> - Unexpected response code 404, expected 200
> 2012-05-25 05:04:29,224 WARN httpclient.RestS3Service: Response
> '/flumedata_%24folder%24' - Unexpected response code 404, expected 200
> 2012-05-25 05:04:29,608 INFO hdfs.BucketWriter: Creating
> s3n://flume-ng/flumedata/FlumeData.122585423857996.tmp
> 2012-05-25 05:04:29,720 WARN httpclient.RestS3Service: Response
> '/flumedata%2FFlumeData.122585423857996.tmp' - Unexpected response code 404,
> expected 200
> 2012-05-25 05:04:29,748 WARN httpclient.RestS3Service: Response
> '/flumedata%2FFlumeData.122585423857996.tmp_%24folder%24' - Unexpected
> response code 404, expected 200
> 2012-05-25 05:04:29,791 INFO s3native.NativeS3FileSystem: OutputStream for
> key 'flumedata/FlumeData.122585423857996.tmp' writing to tempfile
> '/tmp/hadoop-root/s3/output-2477068572058013384.tmp'
> 2012-05-25 05:04:29,793 INFO s3native.NativeS3FileSystem: OutputStream for
> key 'flumedata/FlumeData.122585423857996.tmp' closed. Now beginning upload
> 2012-05-25 05:04:29,828 INFO s3native.NativeS3FileSystem: OutputStream for
> key 'flumedata/FlumeData.122585423857996.tmp' upload complete