I understood the comments on the JIRA ticket to say that the hadoop-lzo 0.4.8 jar from Gerrit has the fix for HIVE-2395<https://issues.apache.org/jira/browse/HIVE-2395>. I wasn't able to find a good, already-built version of 0.4.8 (I found this one, but there appear to be some issues with it: http://hadoop-gpl-packing.googlecode.com/svn-history/r18/trunk/src/main/resources/lib/hadoop-lzo-0.4.8.jar), and hadoop-lzo-0.4.13.jar (http://hadoop-gpl-packing.googlecode.com/svn-history/r39/trunk/hadoop/src/main/resources/lib/hadoop-lzo-0.4.13.jar) doesn't contain the fix. Is there a version of the jar built with the HIVE-2395 fix? I thought I would ask before building it myself.
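In the meantime, a rough way to check whether a candidate jar already skips the index files might be to look for an ".index" string inside the DeprecatedLzo* classes. This is only a heuristic (the jar name below is just an example, and it assumes the strings utility is installed); a match hints at, but doesn't prove, the HIVE-2395-related behavior:

  # Heuristic only: the jar name is an example; a ".index" literal in the
  # DeprecatedLzo* classes hints that they skip .lzo.index files.
  jar tf hadoop-lzo-0.4.13.jar | grep DeprecatedLzo
  unzip -p hadoop-lzo-0.4.13.jar 'com/hadoop/mapred/DeprecatedLzo*.class' \
      | strings | grep -i index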
Lastly, I didn't mention before that this issue appears in only one of our two environments, both running cdh3u1. I've done a number of comparisons between the environments and am still unable to find a dissimilarity that might be causing the 'No LZO codec found' error. So it would surprise me if we required the fix in one environment and not in the other -- but that may just show my lack of understanding about hadoop. :-) A couple of follow-up checks along these lines are sketched at the end of this message.

Jessica

On Wed, Oct 5, 2011 at 4:27 PM, Jessica Owensby <[email protected]> wrote:

> Great. Thanks! Will give that a try.
> Jessica
>
> On Wed, Oct 5, 2011 at 4:22 PM, Joey Echeverria <[email protected]> wrote:
>
>> It sounds like you're hitting this:
>>
>> https://issues.apache.org/jira/browse/HIVE-2395
>>
>> You might need to patch your version of DeprecatedLzoLineRecordReader
>> to ignore the .lzo.index files.
>>
>> -Joey
>>
>> On Wed, Oct 5, 2011 at 4:13 PM, Jessica Owensby
>> <[email protected]> wrote:
>> > Alex,
>> > The task trackers have been restarted many times across the cluster
>> > since this issue was first seen.
>> >
>> > Hmmm, I hadn't tried to explicitly add the lzo jar to my classpath in
>> > the hive shell, but I just tried it and got the same errors.
>> >
>> > Do you see
>> > /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar in the child classpath
>> > when the task is executed (use 'ps aux' on the node)?
>> >
>> > While the job wasn't running, I did this and got back the tasktracker
>> > process: ps aux | grep java | grep lzo.
>> > Do I have to run this while the task is running on that node?
>> >
>> > Joey,
>> > Yes, the lzo files are indexed. They are indexed using the following
>> > command:
>> >
>> > hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-20110217.jar
>> > com.hadoop.compression.lzo.LzoIndexer /user/hive/warehouse/foo/bar.lzo
>> >
>> > Jessica
>> >
>> > On Wed, Oct 5, 2011 at 3:52 PM, Joey Echeverria <[email protected]> wrote:
>> >> Are your LZO files indexed?
>> >>
>> >> -Joey
>> >>
>> >> On Wed, Oct 5, 2011 at 3:35 PM, Jessica Owensby
>> >> <[email protected]> wrote:
>> >>> Hi Joey,
>> >>> Thanks. I forgot to say that; yes, the LzoCodec class is listed in
>> >>> core-site.xml under the io.compression.codecs property:
>> >>>
>> >>> <property>
>> >>>   <name>io.compression.codecs</name>
>> >>>   <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
>> >>> </property>
>> >>>
>> >>> I also added the mapred.child.env property to mapred-site.xml:
>> >>>
>> >>> <property>
>> >>>   <name>mapred.child.env</name>
>> >>>   <value>JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib</value>
>> >>> </property>
>> >>>
>> >>> per these instructions:
>> >>> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
>> >>>
>> >>> After making each of these changes I have restarted the cluster --
>> >>> just to be sure that the new changes were being picked up.
>> >>>
>> >>> Jessica
>> >>
>> >> --
>> >> Joseph Echeverria
>> >> Cloudera, Inc.
>> >> 443.305.9434
>> >
>> > Adding back the email history:
>> >
>> > Hello Everyone,
>> > I've been having an issue in a hadoop environment (running cdh3u1)
>> > where any table declared in Hive with the
>> > "STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat""
>> > directive produces the following errors when running any query against it.
>> >
>> > For instance, running "select count(*) from foo;" gives the following error:
>> >
>> > java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>> >   at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:306)
>> >   at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.next(Hadoop20SShims.java:209)
>> >   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
>> >   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
>> >   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>> >   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
>> >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>> >   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>> >   at java.security.AccessController.doPrivileged(Native Method)
>> >   at javax.security.auth.Subject.doAs(Subject.java:396)
>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>> >   at org.apache.hadoop.mapred.Child.main(Child.java:264)
>> > Caused by: java.lang.reflect.InvocationTargetException
>> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>> >   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>> >   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>> >   at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:292)
>> >   ... 11 more
>> > Caused by: java.io.IOException: No LZO codec found, cannot run.
>> >   at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:53)
>> >   at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:128)
>> >   at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
>> >   ... 16 more
>> >
>> > java.io.IOException: cannot find class
>> > com.hadoop.mapred.DeprecatedLzoTextInputFormat
>> >   at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:406)
>> >   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
>> >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>> >   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>> >   at java.security.AccessController.doPrivileged(Native Method)
>> >   at javax.security.auth.Subject.doAs(Subject.java:396)
>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>> >   at org.apache.hadoop.mapred.Child.main(Child.java:264)
>> >
>> > My thought is that hadoop-lzo-20110217.jar is not available on the
>> > hadoop classpath. However, the hadoop classpath command shows that
>> > /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar is in the classpath.
>> > Additionally, across the cluster, hadoop-lzo-20110217.jar is present
>> > on each machine under /usr/lib/hadoop-0.20/lib/.
>> >
>> > The hadoop-core-0.20.2-cdh3u1.jar is also on my hadoop classpath.
>> >
>> > What else can I investigate to confirm that the lzo jar is on my
>> > classpath? Or is this error indicative of another issue?
>> >
>> > Jessica
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>
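As mentioned at the top of this message, here is a sketch of the follow-up checks along those lines. The warehouse path and jar name are the ones from this thread; the grep patterns are just guesses, so adjust them for your layout:

  # 1) Look for .lzo.index files sitting next to the .lzo files in the table
  #    directory. With an unpatched DeprecatedLzoTextInputFormat under
  #    CombineHiveInputFormat, an index file can reach the record reader,
  #    which is consistent with the "No LZO codec found, cannot run." trace.
  hadoop fs -ls /user/hive/warehouse/foo | grep '\.lzo'

  # 2) While a map task from the failing query is actually running on a
  #    worker node, check the child JVM directly for the lzo jar and a
  #    native library path ([C] keeps grep from matching itself):
  ps aux | grep '[C]hild' | tr ' ' '\n' | grep -E 'hadoop-lzo|java\.library\.path'

The second check only tells us anything while a task is executing, which is presumably why running ps aux between jobs only turned up the tasktracker process.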
