On 3 May 2012, at 23:47, Himanshu Vijay wrote:

> Pedro,
>
> Thanks for the response. Unfortunately I am running it on an in-house cluster
> and from there I need to upload to S3.
Hi,

Last night I was thinking about this... what happens if you copy

  s3://region.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar

to your cluster and run

  hadoop jar s3distcp.jar --src hdfs:///path/to/files \
    --dest s3://bucket/path --outputCodec lzo

(or whichever codec you prefer)?

Alternatively, you could run the following Pig or Hive jobs (using output compression):

--- pig ---
local_data = load '/path/to/files' as ( ... );
store local_data into 's3://bucket/path' using ...;

--- hive ---
create external table foo ( ... )
  [row format ... | serde]
  location '/path/to/files';

create external table s3_foo ( ... )
  [row format ... | serde]
  location 's3://bucket/path';

insert overwrite table s3_foo
select * from foo;

Obviously, an equivalent native (Java MapReduce) or Streaming job is trivial to write, too.

Cheers,

Pedro Figueiredo
Skype: pfig.89clouds
http://89clouds.com/ - Big Data Consulting
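
P.S. A minimal Streaming sketch along those lines, assuming Hadoop 1.x mapred.* property names, the stock contrib streaming jar path (adjust for your install), and the bundled gzip codec (the LZO codec class is only there if hadoop-lzo is deployed). On a non-EMR cluster the s3n:// filesystem, with fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey set in core-site.xml, is usually how you write to S3:

--- streaming ---
# map-only identity job: read text from HDFS, write gzip-compressed output to S3
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -D mapred.reduce.tasks=0 \
  -D mapred.output.compress=true \
  -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  -input hdfs:///path/to/files \
  -output s3n://bucket/path \
  -mapper /bin/cat

Bear in mind this only suits line-oriented text: Streaming's tab-based key/value handling can, depending on version, add a trailing tab to lines that contain none, so for a byte-for-byte copy s3distcp is the safer choice.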