Harsh, your comment just saved me several hours of aimless, wasted effort. I had added LzoCodec to io.compression.codecs in core-site.xml, but I forgot to add LzopCodec. Now everything works. Thanks for the reply!
Regards,
Ed

2012/1/2 Harsh J <[email protected]>

> Hello Edward,
>
> On Mon, Jan 2, 2012 at 11:04 AM, edward choi <[email protected]> wrote:
> > Hi,
> >
> > I'm having trouble trying to handle lzo compressed files.
> > The input files are compressed by LzopCodec provided by hadoop-lzo package.
> > And I am using Cloudera 3 update 2 version Hadoop.
> >
> > I don't need to split the input file, so there is no need telling me to
> > index the input file and to use LzoTextInputFormat, unless that is the only
> > way to handle lzo-compressed files.
>
> Its OK to use LZO without splitting. There are no issues in doing that.
>
> > I thought all I needed to do was set the job input format as
> > "TextInputFormat" and hadoop will take care of the rest.
> > When I do that, I don't get any error messages but log files tell me that
> > input files are not decompressed at all. Input files are being handled as
> > raw text files.
>
> By 'Input files are being handled as raw text files.' I assume you
> mean that your mappers are receiving garbage (compressed) input,
> without being decoded?
>
> Have you ensured that your io.compression.codecs property in
> core-site.xml carries LzoCodec and LzopCodec canonical classnames, and
> that your MR cluster was restarted with this change added?
>
> > Is there a specific way to read files with lzo extension?
>
> The above config registers ".lzo" look-outs and auto-detection of LZO
> files so you shouldn't need an explicit way.
>
> --
> Harsh J
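
For anyone who finds this thread later, the fix discussed above amounts to roughly the following core-site.xml entry. This is only a sketch, not a copy of the actual cluster config: the fully qualified LZO class names assume the hadoop-lzo package's com.hadoop.compression.lzo namespace, and the three stock codecs listed before them are an assumption about what a typical install already carries. As far as I understand it, LzopCodec is the codec that claims the ".lzo" extension, which would explain why listing LzoCodec alone left the input undecoded.

```xml
<!-- core-site.xml fragment (sketch). Class names are assumed from the
     hadoop-lzo package; verify them against the jar on your cluster. -->
<property>
  <name>io.compression.codecs</name>
  <!-- Comma-separated list with no spaces. Both LZO codecs are listed:
       LzoCodec and LzopCodec. -->
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
```

As Harsh notes, the MR cluster needs a restart after this change. With both codecs registered, a job that uses plain TextInputFormat should decode ".lzo" input without indexing or LzoTextInputFormat, as long as splitting is not needed.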
