Dan, can you share your error? Plain .gz files (not .tar.gz) are natively supported by Hadoop via its GzipCodec, so if you are hitting an error, I believe its cause is something other than the compression.
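For reference, here is a minimal map-only sketch of a job that reads .gz text input; the s3n input path, output path, and class name are placeholders of mine, and it uses the default identity Mapper rather than your Mapper1/TableOutputFormat setup. TextInputFormat asks the CompressionCodecFactory for a codec by file extension, and .gz resolves to GzipCodec, so lines arrive already decompressed and no separate unzip step is needed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    // Placeholder class name; the paths below are made up for illustration.
    public class GzipReadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "gzip-read-sketch");
        job.setJarByClass(GzipReadSketch.class);

        // TextInputFormat transparently decompresses .gz inputs via
        // GzipCodec, chosen by file extension. Note that gzip is not
        // splittable, so each .gz file is processed by a single mapper.
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job,
            new Path("s3n://bucket_name/path/input.gz"));

        // Map-only job with the default identity Mapper: it emits the
        // (LongWritable offset, Text line) pairs produced by the input format.
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/gzip-read-out"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

If a job of this shape fails for you too, the stack trace should point at the actual problem (often S3 credentials or the input path) rather than at the compression.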
On Fri, Jul 20, 2012 at 6:14 AM, Dan Yi <d...@mediosystems.com> wrote:
> I have an MR job that reads files on Amazon S3 and processes the data on
> local HDFS. The files are gzipped text files (.gz). I tried to set up the
> job as below but it won't work. Does anyone know what might be wrong? Do I
> need an extra step to unzip the files first? Thanks.
>
> String S3_LOCATION = "s3n://access_key:private_key@bucket_name";
>
> protected void prepareHadoopJob() throws Exception {
>
>   this.getHadoopJob().setMapperClass(Mapper1.class);
>   this.getHadoopJob().setInputFormatClass(TextInputFormat.class);
>
>   FileInputFormat.addInputPath(this.getHadoopJob(), new Path(S3_LOCATION));
>
>   this.getHadoopJob().setNumReduceTasks(0);
>   this.getHadoopJob().setOutputFormatClass(TableOutputFormat.class);
>   this.getHadoopJob().getConfiguration().set(TableOutputFormat.OUTPUT_TABLE,
>       myTable.getTableName());
>   this.getHadoopJob().setOutputKeyClass(ImmutableBytesWritable.class);
>   this.getHadoopJob().setOutputValueClass(Put.class);
> }
>
> Dan Yi | Software Engineer, Analytics Engineering
> Medio Systems Inc | 701 Pike St. #1500 Seattle, WA 98101
> Predictive Analytics for a Connected World

--
Harsh J