Both Kevin's and Todd's branches now pass my tests. Thanks again Todd. -D
On Thu, Apr 8, 2010 at 10:46 AM, Todd Lipcon <[email protected]> wrote:
> OK, fixed, unit tests passing again. If anyone sees any more problems let
> one of us know!
>
> Thanks
> -Todd
>
> On Thu, Apr 8, 2010 at 10:39 AM, Todd Lipcon <[email protected]> wrote:
>
>> Doh, a couple more silly bugs in there. Don't use that version quite yet -
>> I'll put up a better patch later today. (Thanks to Kevin and Ted Yu for
>> pointing out the additional problems)
>>
>> -Todd
>>
>> On Wed, Apr 7, 2010 at 5:24 PM, Todd Lipcon <[email protected]> wrote:
>>
>>> For Dmitriy and anyone else who has seen this error, I just committed a
>>> fix to my github repository:
>>>
>>> http://github.com/toddlipcon/hadoop-lzo/commit/f3bc3f8d003bb8e24f254b25bca2053f731cdd58
>>>
>>> The problem turned out to be an assumption that InputStream.read() would
>>> return all the bytes that were asked for. This turns out to almost always
>>> be true on local filesystems, but on HDFS it's not true if the read
>>> crosses a block boundary. So, every couple of TB of lzo-compressed data
>>> one might see this error.
>>>
>>> Big thanks to Alex Roetter, who was able to provide a file that exhibited
>>> the bug!
>>>
>>> Thanks
>>> -Todd
>>>
>>> On Tue, Apr 6, 2010 at 10:35 AM, Todd Lipcon <[email protected]> wrote:
>>>
>>>> Hi Alex,
>>>> Unfortunately I wasn't able to reproduce, and the data Dmitriy is
>>>> working with is sensitive.
>>>> Do you have some data you could upload (or send me off list) that
>>>> exhibits the issue?
>>>> -Todd
>>>>
>>>> On Tue, Apr 6, 2010 at 9:50 AM, Alex Roetter <[email protected]> wrote:
>>>> >
>>>> > Todd Lipcon <t...@...> writes:
>>>> >
>>>> > > Hey Dmitriy,
>>>> > >
>>>> > > This is very interesting (and worrisome in a way!) I'll try to take
>>>> > > a look this afternoon.
>>>> > >
>>>> > > -Todd
>>>> >
>>>> > Hi Todd,
>>>> >
>>>> > I wanted to see if you made any progress on this front. I'm seeing a
>>>> > very similar error trying to run an MR job (Hadoop 0.20.1) over a
>>>> > bunch of LZOP-compressed / indexed files (using Kevin Weil's package),
>>>> > and I have one map task that always fails in what looks like the same
>>>> > place as described in the previous post. I haven't yet done the
>>>> > experimentation mentioned above (isolating the input file
>>>> > corresponding to the failed map task, decompressing / recompressing
>>>> > it, testing it while operating directly on local disk instead of
>>>> > HDFS, etc.).
>>>> >
>>>> > However, since I am crashing in exactly the same place, it seems
>>>> > likely this is related, so I thought I'd check on your work in the
>>>> > meantime.
>>>> >
>>>> > FYI, my stack trace is below:
>>>> >
>>>> > 2010-04-05 18:15:16,895 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.InternalError: lzo1x_decompress_safe returned:
>>>> >     at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
>>>> >     at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303)
>>>> >     at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:104)
>>>> >     at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223)
>>>> >     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>>>> >     at java.io.InputStream.read(InputStream.java:85)
>>>> >     at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
>>>> >     at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
>>>> >     at com.hadoop.mapreduce.LzoLineRecordReader.nextKeyValue(LzoLineRecordReader.java:126)
>>>> >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>>> >     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>>> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>>> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>> >
>>>> > Any update much appreciated,
>>>> > Alex
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
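[Editor's note: the root cause Todd describes — InputStream.read(byte[], int, int) returning fewer bytes than requested — is easy to reproduce outside Hadoop. Below is a minimal, hypothetical sketch (not the actual hadoop-lzo patch): a ChunkedStream that short-reads like HDFS can at a block boundary, and a read-in-a-loop helper showing the general fix pattern.]

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that returns at most 3 bytes per read() call,
// mimicking a short read such as HDFS crossing a block boundary.
class ChunkedStream extends InputStream {
    private final byte[] data;
    private int pos = 0;

    ChunkedStream(byte[] data) { this.data = data; }

    @Override
    public int read() {
        return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (pos >= data.length) return -1;
        // Deliver at most 3 bytes, even if more were requested.
        int n = Math.min(Math.min(len, 3), data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }
}

public class ShortReadDemo {
    // Loop until 'len' bytes arrive or EOF is hit -- the fix pattern
    // for code that wrongly assumed one read() fills the buffer.
    static int readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) break; // EOF
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] src = "hello world".getBytes(); // 11 bytes

        // A single read() call may come up short:
        byte[] buf1 = new byte[src.length];
        int n1 = new ChunkedStream(src).read(buf1, 0, buf1.length);
        System.out.println("single read returned " + n1);   // 3, not 11

        // Looping until the buffer is full gets everything:
        byte[] buf2 = new byte[src.length];
        int n2 = readFully(new ChunkedStream(src), buf2, 0, buf2.length);
        System.out.println("readFully returned " + n2);     // 11
    }
}
```

The same contract applies to any InputStream: only read() of a single byte blocks for exactly one byte; the array variants may return early, so callers that need an exact count must loop.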
