Hi Jessica,

Sorry for the delay. I don't know of a pre-built version of the LZO
libraries that has the fix, and I couldn't quite tell which source
versions might include it. The easiest thing to do would be to pull the
source from GitHub, make any needed changes, and build it locally:

https://github.com/kevinweil/hadoop-lzo
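
Roughly, assuming the standard ant build in that repo (untested, and the
exact targets may differ a bit depending on which branch you grab):

# needs a JDK, ant, and the lzo2 development headers on the build box
git clone https://github.com/kevinweil/hadoop-lzo.git
cd hadoop-lzo
# apply the HIVE-2395-related change here if it isn't already in the branch
ant clean compile-native tar
# then copy the jar from build/ onto /usr/lib/hadoop-0.20/lib/ on every node
# and restart the tasktrackers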

-Joey

On Mon, Oct 10, 2011 at 7:54 PM, Jessica Owensby
<[email protected]> wrote:
> I understood the comments in the JIRA ticket to say that hadoop-lzo
> 0.4.8.jar from gerrit had the fix for
> HIVE-2395<https://issues.apache.org/jira/browse/HIVE-2395>.
>  I wasn't able to find a good, already-built copy of 0.4.8 (I found
> this, but there appear to be some issues with it:
> http://hadoop-gpl-packing.googlecode.com/svn-history/r18/trunk/src/main/resources/lib/hadoop-lzo-0.4.8.jar).
> And hadoop-lzo-0.4.13.jar (
> http://hadoop-gpl-packing.googlecode.com/svn-history/r39/trunk/hadoop/src/main/resources/lib/hadoop-lzo-0.4.13.jar)
> doesn't contain the fix.  Is there a version of the jar built with the
> HIVE-2395 fix?  I thought I would ask before I build it myself.
>
> Lastly, I didn't mention before that this issue appears in only one of our 2
> environments - both running cdh3u1.  I've done a number of comparisons
> between the environments and am still unable to find a difference that
> might be causing the 'No LZO codec found' error.  So, it
> would surprise me if we required the fix in one environment and did not in
> another -- but that may just show my lack of understanding about hadoop. :-)
>
> Jessica
>
> On Wed, Oct 5, 2011 at 4:27 PM, Jessica Owensby
> <[email protected]>wrote:
>
>> Great.  Thanks!  Will give that a try.
>> Jessica
>>
>>
>> On Wed, Oct 5, 2011 at 4:22 PM, Joey Echeverria <[email protected]> wrote:
>>
>>> It sounds like you're hitting this:
>>>
>>> https://issues.apache.org/jira/browse/HIVE-2395
>>>
>>> You might need to patch your version of DeprecatedLzoLineRecordReader
>>> to ignore the .lzo.index files.
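>>>
>>> A quick way to confirm that's what you're hitting (the path below is
>>> just the example directory from your indexing command, so adjust it to
>>> the real table location):
>>>
>>> # if .lzo.index files sit next to the .lzo files, the unpatched reader
>>> # ends up being handed them as inputs and fails
>>> hadoop fs -ls /user/hive/warehouse/foo | grep '\.lzo\.index$'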
>>>
>>> -Joey
>>>
>>> On Wed, Oct 5, 2011 at 4:13 PM, Jessica Owensby
>>> <[email protected]> wrote:
>>> > Alex,
>>> > The task trackers have been restarted many times across the cluster
>>> > since this issue was first seen.
>>> >
>>> > Hmmm, I hadn't tried to explicitly add the lzo jar to my classpath in
>>> > the hive shell, but I just tried it and got the same errors.
>>> >
>>> > "Do you see /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar in the
>>> > child classpath when the task is executed (use 'ps aux' on the node)?"
>>> >
>>> >
>>> > While the job wasn't running, I did this and I got back the tasktracker
>>> > process:  ps aux | grep java | grep lzo.
>>> > Do I have to run this while the task is running on that node?
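>>> >
>>> > (For reference, a rough version of that check, assuming it is run on a
>>> > worker node while the query's map tasks are active; the child JVMs show
>>> > up as org.apache.hadoop.mapred.Child, as in the stack traces below:)
>>> >
>>> > # dump the child task JVM's classpath entries and look for the lzo jar
>>> > ps aux | grep '[m]apred.Child' | tr ':' '\n' | grep -i lzo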
>>> >
>>> > Joey,
>>> > Yes, the lzo files are indexed.  They are indexed using the following
>>> > command:
>>> >
>>> > hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-20110217.jar
>>> > com.hadoop.compression.lzo.LzoIndexer /user/hive/warehouse/foo/bar.lzo
>>> >
>>> > Jessica
>>> >
>>> > On Wed, Oct 5, 2011 at 3:52 PM, Joey Echeverria <[email protected]> wrote:
>>> >> Are your LZO files indexed?
>>> >>
>>> >> -Joey
>>> >>
>>> >> On Wed, Oct 5, 2011 at 3:35 PM, Jessica Owensby
>>> >> <[email protected]> wrote:
>>> >>> Hi Joey,
>>> >>> Thanks. I forgot to say that; yes, the LzoCodec class is listed in
>>> >>> core-site.xml under the io.compression.codecs property:
>>> >>>
>>> >>> <property>
>>> >>>  <name>io.compression.codecs</name>
>>> >>>  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
>>> >>> </property>
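>>> >>>
>>> >>> (As a sanity check that those codec classes are actually inside the
>>> >>> jar on this node, something along these lines works; the jar path is
>>> >>> the one from my original message:)
>>> >>>
>>> >>> # list the classes in the hadoop-lzo jar and look for the two codecs
>>> >>> jar tf /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar | grep -E 'Lzop?Codec'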
>>> >>>
>>> >>> I also added the mapred.child.env property to mapred site:
>>> >>>
>>> >>>  <property>
>>> >>>    <name>mapred.child.env</name>
>>> >>>    <value>JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib</value>
>>> >>>  </property>
>>> >>>
>>> >>> per these instructions:
>>> >>>
>>> >>> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
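>>> >>>
>>> >>> (And to check the native side that JAVA_LIBRARY_PATH is meant to pick
>>> >>> up; the library names and locations below are the usual ones from a
>>> >>> hadoop-lzo build, so treat this as a sketch:)
>>> >>>
>>> >>> # hadoop-lzo's native glue is normally libgplcompression.*, and it
>>> >>> # also needs the system liblzo2 to be installed on the node
>>> >>> ls /usr/lib/hadoop-0.20/lib/libgplcompression* \
>>> >>>    /usr/lib/hadoop-0.20/lib/native/*/libgplcompression* 2>/dev/null
>>> >>> ldconfig -p | grep liblzo2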
>>> >>>
>>> >>> After making each of these changes I have restarted the cluster --
>>> >>> just to be sure that the new changes were being picked up.
>>> >>>
>>> >>> Jessica
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Joseph Echeverria
>>> >> Cloudera, Inc.
>>> >> 443.305.9434
>>> >>
>>> >
>>> >
>>> > Adding back the email history:
>>> >
>>> > Hello Everyone,
>>> > I've been having an issue in a hadoop environment (running cdh3u1)
>>> > where any table declared in Hive with the STORED AS INPUTFORMAT
>>> > 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' directive produces
>>> > the following errors when running any query against it.
>>> >
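>>> > (For reference, the table declarations look roughly like this; the
>>> > column list is just an illustration, the relevant part is the
>>> > INPUTFORMAT line:)
>>> >
>>> > # illustrative DDL only
>>> > hive -e "CREATE TABLE foo (line STRING)
>>> >   STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
>>> >   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';"
>>> >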
>>> > For instance, running "select count(*) from foo;" gives the following error:
>>> >
>>> > java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>>> >      at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:306)
>>> >      at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.next(Hadoop20SShims.java:209)
>>> >      at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
>>> >      at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
>>> >      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>> >      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
>>> >      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>>> >      at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>> >      at java.security.AccessController.doPrivileged(Native Method)
>>> >      at javax.security.auth.Subject.doAs(Subject.java:396)
>>> >      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>>> >      at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>> > Caused by: java.lang.reflect.InvocationTargetException
>>> >      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>> >      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>> >      at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>> >      at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:292)
>>> >      ... 11 more
>>> > Caused by: java.io.IOException: No LZO codec found, cannot run.
>>> >      at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:53)
>>> >      at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:128)
>>> >      at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
>>> >      ... 16 more
>>> >
>>> > java.io.IOException: cannot find class com.hadoop.mapred.DeprecatedLzoTextInputFormat
>>> >      at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:406)
>>> >      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
>>> >      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>>> >      at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>> >      at java.security.AccessController.doPrivileged(Native Method)
>>> >      at javax.security.auth.Subject.doAs(Subject.java:396)
>>> >      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>>> >      at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>> >
>>> > My thought is that the hadoop-lzo-20110217.jar is not available on the
>>> > hadoop classpath.  However, the hadoop classpath command shows that
>>> > /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar is in the classpath.
>>> > Additionally, across the cluster on each machine, the
>>> > hadoop-lzo-20110217.jar is present under /usr/lib/hadoop-0.20/lib/.
>>> >
>>> > The hadoop-core-0.20.2-cdh3u1.jar is also available on my hadoop classpath.
>>> >
>>> > What else can I investigate to confirm that the lzo jar is on my
>>> > classpath?  Or is this error indicative of another issue?
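>>> >
>>> > (For reference, the checks I've been running on each node are along
>>> > these lines:)
>>> >
>>> > # confirm the jar is on the client-side classpath and present on disk
>>> > hadoop classpath | tr ':' '\n' | grep -i lzo
>>> > ls -l /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar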
>>> >
>>> > Jessica
>>> >
>>>
>>>
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434
>>>
>>
>>
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
