Hi all,
I have a few large files (4 that are 1.8GB+) I'm trying to copy from
HDFS to S3. My micro EC2 cluster is running Hadoop 0.19.1, and has
one master/two slaves.
I first tried using the hadoop fs -cp command, as in:
hadoop fs -cp output/<dir>/ s3n://<bucket>/<dir>/
This seemed to be working, as I could watch the network traffic spike,
and temp files were being created in S3 (as seen with CyberDuck).
But then it seemed to hang. Nothing happened for 30 minutes, so I
killed the command.
Then I tried using the hadoop distcp command, as in:
hadoop distcp hdfs://<host>:50001/<path>/<dir>/ s3://<access key>:<secret key>@<bucket>/<dir2>/
This failed because my secret key has a '/' in it
(see http://issues.apache.org/jira/browse/HADOOP-3733).
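To illustrate why the '/' is a problem, here is a rough Python sketch. The credentials and bucket name are made up, and split_s3_uri is just a hypothetical helper. Hadoop parses the URI in Java, so this is only an analogy for how a generic URI parser treats the string, not Hadoop's own code.

```python
from urllib.parse import urlsplit


def split_s3_uri(uri):
    """Return (authority, path) as a generic URI parser sees them."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path


# Hypothetical credentials and bucket, for illustration only.
ok = split_s3_uri("s3://AKIDEXAMPLE:abcdef@mybucket/dir2/")
bad = split_s3_uri("s3://AKIDEXAMPLE:abc/def@mybucket/dir2/")

print(ok)   # authority holds both keys and the bucket
print(bad)  # authority is cut off at the '/' inside the secret
```

With the '/' in the secret, the authority part ends early and the rest of the secret plus the bucket name end up in the path. That would explain why the credentials can't be recovered from the URI.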
Then I tried using hadoop distcp with the s3n URI syntax:
hadoop distcp hdfs://<host>:50001/<path>/<dir>/ s3n://<bucket>/<dir2>/
Similar to my first attempt, it seemed to work. Lots of network
activity, temp files being created, and in the terminal I got:
09/05/07 18:36:11 INFO mapred.JobClient: Running job: job_200905071339_0004
09/05/07 18:36:12 INFO mapred.JobClient: map 0% reduce 0%
09/05/07 18:36:30 INFO mapred.JobClient: map 9% reduce 0%
09/05/07 18:36:35 INFO mapred.JobClient: map 14% reduce 0%
09/05/07 18:36:38 INFO mapred.JobClient: map 20% reduce 0%
But again it hung. No network traffic, and eventually it dumped out:
09/05/07 18:52:34 INFO mapred.JobClient: Task Id : attempt_200905071339_0004_m_000001_0, Status : FAILED
Task attempt_200905071339_0004_m_000001_0 failed to report status for 601 seconds. Killing!
09/05/07 18:53:02 INFO mapred.JobClient: Task Id : attempt_200905071339_0004_m_000004_0, Status : FAILED
Task attempt_200905071339_0004_m_000004_0 failed to report status for 602 seconds. Killing!
09/05/07 18:53:06 INFO mapred.JobClient: Task Id : attempt_200905071339_0004_m_000002_0, Status : FAILED
Task attempt_200905071339_0004_m_000002_0 failed to report status for 602 seconds. Killing!
09/05/07 18:53:09 INFO mapred.JobClient: Task Id : attempt_200905071339_0004_m_000003_0, Status : FAILED
Task attempt_200905071339_0004_m_000003_0 failed to report status for 601 seconds. Killing!
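All four timeouts are 601 or 602 seconds, which I assume corresponds to the default mapred.task.timeout of 600000 ms. In case it helps anyone compare similar output, here is a small Python sketch that pulls the attempt IDs and timeout values out of console output like the above. The helper name timed_out_attempts is hypothetical and not part of Hadoop. The sample text is a two-task excerpt of the log, wrapped as it appeared in my terminal.

```python
import re

# Excerpt of the console output above, with the line wrapping left in.
LOG = """\
Task attempt_200905071339_0004_m_000001_0 failed to report status for
601 seconds. Killing!
Task attempt_200905071339_0004_m_000004_0 failed to report status for
602 seconds. Killing!
"""

TIMEOUT_RE = re.compile(
    r"Task (attempt_\w+) failed to report status for (\d+) seconds"
)


def timed_out_attempts(log_text):
    """Return [(attempt_id, seconds), ...] for tasks killed for not reporting."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [(m.group(1), int(m.group(2))) for m in TIMEOUT_RE.finditer(flat)]


for attempt, seconds in timed_out_attempts(LOG):
    print(attempt, seconds)
```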
In the task GUI, I can see the same tasks failing and being
restarted. But the restarted tasks seem to just hang without doing
anything.
Eventually one of the tasks made a bit more progress, but then it
finally died with:
Copy failed: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:647)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:844)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:871)
So - any thoughts on what's going wrong?
Thanks,
-- Ken
--
Ken Krugler
+1 530-210-6378