I have been trying to copy block data from S3 using the hadoop distcp command, but it doesn't work: distcp gets stuck in what looks like an infinite loop. I am building the S3 URI from the AWS access key ID, AWS secret access key, and bucket name, as described in the Hadoop wiki: http://wiki.apache.org/hadoop/AmazonS3
Example: say I have an S3 bucket of block data called "blockDir" and I want to copy its contents into a top-level directory in my local HDFS called "myHadoopDir". From my Hadoop home directory, I run:

bin/hadoop distcp s3://<awsAccessKeyId>:<awsSecretAccessKey>@blockDir/ /myHadoopDir

This causes distcp to hang, and the MapReduce job that copies the data is never started. I am using the s3 URI scheme because the data I am trying to copy is stored in block format. If I instead copy data from an S3 Native FileSystem directory, using the s3n URI scheme, it works correctly:

bin/hadoop distcp s3n://<awsAccessKeyId>:<awsSecretAccessKey>@fileDir/ /myHadoopDir

Does anyone have an idea why the copy with the s3 URI scheme fails?
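For completeness, the wiki page linked above also describes supplying the credentials in conf/core-site.xml instead of embedding them in the URI (which it notes is necessary if the secret key contains a "/" character, since that breaks URI parsing). A sketch, using the fs.s3 property names from that page:

```xml
<!-- conf/core-site.xml: credentials for the s3 (block) filesystem,
     so they need not be embedded in the distcp source URI -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

With the credentials in the configuration, the distcp source URI would simply be s3://blockDir/. I see the same hang with this form as well.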
