Hi all,

I have a few large files (four of them are 1.8GB+) that I'm trying to copy from HDFS to S3. My micro EC2 cluster is running Hadoop 0.19.1 and has one master and two slaves.

I first tried using the hadoop fs -cp command, as in:

hadoop fs -cp output/<dir>/ s3n://<bucket>/<dir>/

This seemed to be working, as I could watch the network traffic spike, and temp files were being created in S3 (as seen with Cyberduck).

But then it seemed to hang. Nothing happened for 30 minutes, so I killed the command.

Then I tried using the hadoop distcp command, as in:

hadoop distcp hdfs://<host>:50001/<path>/<dir>/ s3://<public key>:<private key>@<bucket>/<dir2>/

This failed because my secret key has a '/' in it (see http://issues.apache.org/jira/browse/HADOOP-3733).
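
For what it's worth, the general problem with a '/' inside the credentials part of a URI is easy to reproduce outside Hadoop. The sketch below uses Python's urllib.parse rather than Hadoop's own URI handling, so it only illustrates the parsing ambiguity and is not what Hadoop does internally. The access key, secret key, and bucket name are made up.

```python
from urllib.parse import urlsplit

# Hypothetical credentials and bucket, for illustration only.
access_key = "AKIDEXAMPLE"
secret_key = "abc/def"  # note the '/' in the secret
uri = f"s3://{access_key}:{secret_key}@my-bucket/dir2/"

parts = urlsplit(uri)

# The authority section ends at the first '/', so the secret is cut in two
# and the bucket name lands in the path instead of the host position.
print("authority:", parts.netloc)
print("path:     ", parts.path)
```

A generic parser treats everything after the first '/' as the path, so the part of the secret after the slash and the bucket name are no longer where a filesystem client would expect to find them.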

Then I tried using hadoop distcp with the s3n URI syntax:

hadoop distcp hdfs://<host>:50001/<path>/<dir>/ s3n://<bucket>/<dir2>/

Similar to my first attempt, it seemed to work. Lots of network activity, temp files being created, and in the terminal I got:

09/05/07 18:36:11 INFO mapred.JobClient: Running job: job_200905071339_0004
09/05/07 18:36:12 INFO mapred.JobClient:  map 0% reduce 0%
09/05/07 18:36:30 INFO mapred.JobClient:  map 9% reduce 0%
09/05/07 18:36:35 INFO mapred.JobClient:  map 14% reduce 0%
09/05/07 18:36:38 INFO mapred.JobClient:  map 20% reduce 0%

But again it hung. No network traffic, and eventually it dumped out:

09/05/07 18:52:34 INFO mapred.JobClient: Task Id : attempt_200905071339_0004_m_000001_0, Status : FAILED
Task attempt_200905071339_0004_m_000001_0 failed to report status for 601 seconds. Killing!
09/05/07 18:53:02 INFO mapred.JobClient: Task Id : attempt_200905071339_0004_m_000004_0, Status : FAILED
Task attempt_200905071339_0004_m_000004_0 failed to report status for 602 seconds. Killing!
09/05/07 18:53:06 INFO mapred.JobClient: Task Id : attempt_200905071339_0004_m_000002_0, Status : FAILED
Task attempt_200905071339_0004_m_000002_0 failed to report status for 602 seconds. Killing!
09/05/07 18:53:09 INFO mapred.JobClient: Task Id : attempt_200905071339_0004_m_000003_0, Status : FAILED
Task attempt_200905071339_0004_m_000003_0 failed to report status for 601 seconds. Killing!

In the task GUI, I can see the same tasks failing, and being restarted. But the restarted tasks seem to be just hanging w/o doing anything.

Eventually one of the tasks made a bit more progress, but then it finally died with:

Copy failed: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:647)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:844)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:871)

So - any thoughts on what's going wrong?

Thanks,

-- Ken
--
Ken Krugler
+1 530-210-6378
