Dan, can you share your error? Plain .gz files (not .tar.gz) are natively supported by Hadoop via its GzipCodec, so if you are hitting an error, I believe its cause is something other than the compression.
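For reference, here is a minimal map-only sketch of a job that reads .gz text input; the s3n input path, output path, and class name are placeholders of mine, and it uses the default identity Mapper rather than your Mapper1/TableOutputFormat setup. TextInputFormat asks the CompressionCodecFactory for a codec by file extension, and .gz resolves to GzipCodec, so lines arrive already decompressed and no separate unzip step is needed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    // Placeholder class name; the paths below are made up for illustration.
    public class GzipReadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "gzip-read-sketch");
        job.setJarByClass(GzipReadSketch.class);

        // TextInputFormat transparently decompresses .gz inputs via
        // GzipCodec, chosen by file extension. Note that gzip is not
        // splittable, so each .gz file is processed by a single mapper.
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job,
            new Path("s3n://bucket_name/path/input.gz"));

        // Map-only job with the default identity Mapper: it emits the
        // (LongWritable offset, Text line) pairs produced by the input format.
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/gzip-read-out"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

If a job of this shape fails for you too, the stack trace should point at the actual problem (often S3 credentials or the input path) rather than at the compression.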
On Fri, Jul 20, 2012 at 6:14 AM, Dan Yi <d...@mediosystems.com> wrote:
> I have an MR job that reads files on Amazon S3 and processes the data on
> local HDFS. The files are gzipped text files (.gz). I tried to set up the
> job as below but it won't work. Does anyone know what might be wrong? Do I
> need an extra step to unzip the files first? Thanks.
>
> String S3_LOCATION = "s3n://access_key:private_key@bucket_name";
>
> protected void prepareHadoopJob() throws Exception {
>
>   this.getHadoopJob().setMapperClass(Mapper1.class);
>   this.getHadoopJob().setInputFormatClass(TextInputFormat.class);
>
>   FileInputFormat.addInputPath(this.getHadoopJob(), new Path(S3_LOCATION));
>
>   this.getHadoopJob().setNumReduceTasks(0);
>   this.getHadoopJob().setOutputFormatClass(TableOutputFormat.class);
>   this.getHadoopJob().getConfiguration().set(TableOutputFormat.OUTPUT_TABLE,
>       myTable.getTableName());
>   this.getHadoopJob().setOutputKeyClass(ImmutableBytesWritable.class);
>   this.getHadoopJob().setOutputValueClass(Put.class);
> }
>
> Dan Yi | Software Engineer, Analytics Engineering
> Medio Systems Inc | 701 Pike St. #1500 Seattle, WA 98101
> Predictive Analytics for a Connected World

--
Harsh J