Steve, glad you got it figured out. Interested to hear how it goes, and of course feel free to post bugs/requests to the github page www.github.com/kevinweil/hadoop-lzo.
Kevin

On Thu, Dec 31, 2009 at 12:21 PM, Steve Kuo <[email protected]> wrote:
> Digging around the new Job api with a rested brain came up with
>
> job.setInputFormatClass(LzoTextInputFormat.class);
>
> that solved the problem.
>
> On Thu, Dec 31, 2009 at 9:53 AM, Steve Kuo <[email protected]> wrote:
>
> > I have followed
> > http://www.cloudera.com/blog/2009/11/17/hadoop-at-twitter-part-1-splittable-lzo-compression/
> > and http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ to build
> > the requisite hadoop-lzo jar and native .so files. (The jar and .so
> > files were built from Kevin Weil's git repository. Thanks Kevin.) I
> > have configured core-site.xml and mapred-site.xml as instructed to
> > enable lzo for both map and reduce output. The creation of the lzo
> > index also worked. The last step was to replace TextInputFormat with
> > LzoTextInputFormat. As I only have
> >
> > FileInputFormat.addInputPath(jobConf, new Path(inputPath));
> >
> > it was replaced with
> >
> > LzoTextInputFormat.addInputPath(job, new Path(inputPath));
> >
> > When I ran my MR job, I noticed that the new code was able to read
> > .lzo input files and decompress them fine. The output was also lzo
> > compressed. However, only one map task was created for each input
> > .lzo file, indicating that input splitting was not done by
> > LzoTextInputFormat but more likely by its parent, such as
> > FileInputFormat. There must be a way to ensure LzoTextInputFormat is
> > used in the map task. How can this be done?
> >
> > Thanks in advance.
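For reference, a minimal driver sketch of the fix discussed above,
assuming the new (org.apache.hadoop.mapreduce) Job API and the
LzoTextInputFormat class that the hadoop-lzo build ships under
com.hadoop.mapreduce; the driver class name, job name, and argument
handling below are placeholders, not part of the original thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import com.hadoop.mapreduce.LzoTextInputFormat;

public class LzoJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "lzo example");
    job.setJarByClass(LzoJobDriver.class);

    // The key call: addInputPath() alone only registers the path. Without
    // this, the job falls back to the default TextInputFormat, and each
    // .lzo file is read whole by a single map task instead of being split
    // at the index points.
    job.setInputFormatClass(LzoTextInputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // set mapper, reducer, and output key/value classes for your job here

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note that LzoTextInputFormat.addInputPath(job, ...) compiles because the
static addInputPath() is inherited from FileInputFormat, but a static call
can only register the path; it never tells the job which input format
class to instantiate, which is why the explicit setInputFormatClass()
call is what actually enables LZO-aware splitting.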
