Hello Edward,

On Mon, Jan 2, 2012 at 11:04 AM, edward choi <[email protected]> wrote:
> Hi,
>
> I'm having trouble trying to handle lzo compressed files.
> The input files are compressed by LzopCodec provided by hadoop-lzo package.
> And I am using Cloudera 3 update 2 version Hadoop.
>
> I don't need to split the input file, so there is no need telling me to
> index the input file and to use LzoTextInputFormat, unless that is the only
> way to handle lzo-compressed files.

It's OK to use LZO without splitting. There are no issues in doing that.

> I thought all I needed to do was set the job input format as
> "TextInputFormat" and hadoop will take care of the rest.
> When I do that, I don't get any error messages but log files tell me that
> input files are not decompressed at all. Input files are being handled as
> raw text files.

By 'Input files are being handled as raw text files.' I assume you mean that your mappers are receiving garbage (still-compressed) input that was never decoded?

Have you ensured that the io.compression.codecs property in your core-site.xml lists the canonical class names of both LzoCodec and LzopCodec, and that your MR cluster was restarted after this change was added?

> Is there a specific way to read files with lzo extension?

The above config registers the codecs so that files with the ".lzo" extension are detected and decoded automatically, so you shouldn't need an explicit way.

--
Harsh J
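
P.S. For reference, a minimal sketch of the core-site.xml entry described above. It assumes the class names shipped with the hadoop-lzo package (com.hadoop.compression.lzo.LzoCodec and com.hadoop.compression.lzo.LzopCodec); the other codecs in the list are the common defaults and are only an assumption about what your cluster already has, so keep whatever your existing value contains and append the two LZO entries.

```xml
<!-- core-site.xml: register the LZO codecs alongside the existing ones. -->
<!-- Assumption: hadoop-lzo's jar and native libraries are already installed on every node. -->
<property>
  <name>io.compression.codecs</name>
  <!-- The first three entries are assumed defaults; the last two come from hadoop-lzo. -->
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
```

The value must be a single comma-separated line, and the daemons need a restart before the new codecs are picked up.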
