Hi Jessica,

Sorry for the delay. I don't know of a pre-built version of the LZO libraries that has the fix. I also couldn't quite tell which source versions might have it. The easiest thing to do would be to pull the source from GitHub, make any changes, and build it locally:
https://github.com/kevinweil/hadoop-lzo

-Joey

On Mon, Oct 10, 2011 at 7:54 PM, Jessica Owensby <[email protected]> wrote:
> I understood the comments in the JIRA ticket to say that hadoop-lzo
> 0.4.8.jar from gerrit had the fix for
> HIVE-2395 <https://issues.apache.org/jira/browse/HIVE-2395>.
> I wasn't able to find a good version of 0.4.8 already built (I found
> this, but there appear to be some issues with it:
> http://hadoop-gpl-packing.googlecode.com/svn-history/r18/trunk/src/main/resources/lib/hadoop-lzo-0.4.8.jar).
> And hadoop-lzo-0.4.13.jar (
> http://hadoop-gpl-packing.googlecode.com/svn-history/r39/trunk/hadoop/src/main/resources/lib/hadoop-lzo-0.4.13.jar)
> doesn't contain the fix. Is there a version of the jar built with the
> HIVE-2395 fix? I thought I would ask before I build it myself.
>
> Lastly, I didn't mention before that this issue appears in only one of our 2
> environments - both running cdh3u1. I've done a number of comparisons
> between the environments and am still unable to find a dissimilarity that
> might be resulting in the 'No LZO codec found' error. So, it
> would surprise me if we required the fix in one environment and did not in
> another -- but that may just show my lack of understanding about hadoop. :-)
>
> Jessica
>
> On Wed, Oct 5, 2011 at 4:27 PM, Jessica Owensby
> <[email protected]> wrote:
>
>> Great. Thanks! Will give that a try.
>> Jessica
>>
>>
>> On Wed, Oct 5, 2011 at 4:22 PM, Joey Echeverria <[email protected]> wrote:
>>
>>> It sounds like you're hitting this:
>>>
>>> https://issues.apache.org/jira/browse/HIVE-2395
>>>
>>> You might need to patch your version of DeprecatedLzoLineRecordReader
>>> to ignore the .lzo.index files.
>>>
>>> -Joey
>>>
>>> On Wed, Oct 5, 2011 at 4:13 PM, Jessica Owensby
>>> <[email protected]> wrote:
>>> > Alex,
>>> > The task trackers have been restarted many times across the cluster since
>>> > this issue was first seen.
>>> >
>>> > Hmmm, I hadn't tried to explicitly add the lzo jar to my classpath in the
>>> > hive shell, but I just tried it and got the same errors.
>>> >
>>> > Do you see
>>> >
>>> > /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar in the child classpath when
>>> >
>>> > the task is executed (use 'ps aux' on the node)?
>>> >
>>> >
>>> > While the job wasn't running, I did this and I got back the tasktracker
>>> > process: ps aux | grep java | grep lzo.
>>> > Do I have to run this while the task is running on that node?
>>> >
>>> > Joey,
>>> > Yes, the lzo files are indexed. They are indexed using the following
>>> > command:
>>> >
>>> > hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-20110217.jar
>>> > com.hadoop.compression.lzo.LzoIndexer /user/hive/warehouse/foo/bar.lzo
>>> >
>>> > Jessica
>>> >
>>> > On Wed, Oct 5, 2011 at 3:52 PM, Joey Echeverria <[email protected]> wrote:
>>> >> Are your LZO files indexed?
>>> >>
>>> >> -Joey
>>> >>
>>> >> On Wed, Oct 5, 2011 at 3:35 PM, Jessica Owensby
>>> >> <[email protected]> wrote:
>>> >>> Hi Joey,
>>> >>> Thanks.
>>> >>> I forgot to say that; yes, the lzocodec class is listed in
>>> >>> core-site.xml under the io.compression.codecs property:
>>> >>>
>>> >>> <property>
>>> >>> <name>io.compression.codecs</name>
>>> >>> <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
>>> >>> </property>
>>> >>>
>>> >>> I also added the mapred.child.env property to mapred-site.xml:
>>> >>>
>>> >>> <property>
>>> >>> <name>mapred.child.env</name>
>>> >>> <value>JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib</value>
>>> >>> </property>
>>> >>>
>>> >>> per these instructions:
>>> >>> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
>>> >>>
>>> >>> After making each of these changes I have restarted the cluster --
>>> >>> just to be sure that the new changes were being picked up.
>>> >>>
>>> >>> Jessica
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Joseph Echeverria
>>> >> Cloudera, Inc.
>>> >> 443.305.9434
>>> >>
>>> >
>>> >
>>> > Adding back the email history:
>>> >
>>> > Hello Everyone,
>>> > I've been having an issue in a hadoop environment (running cdh3u1)
>>> > where any table declared in hive
>>> > with the "STORED AS INPUTFORMAT
>>> > "com.hadoop.mapred.DeprecatedLzoTextInputFormat"" directive has the
>>> > following errors when running any query against it.
>>> >
>>> > For instance, running "select count(*) from foo;" gives the following error:
>>> >
>>> > java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>>> >     at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:306)
>>> >     at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.next(Hadoop20SShims.java:209)
>>> >     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
>>> >     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
>>> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
>>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>>> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>> >     at java.security.AccessController.doPrivileged(Native Method)
>>> >     at javax.security.auth.Subject.doAs(Subject.java:396)
>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>>> >     at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>> > Caused by: java.lang.reflect.InvocationTargetException
>>> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>> >     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>> >     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>> >     at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:292)
>>> >     ... 11 more
>>> > Caused by: java.io.IOException: No LZO codec found, cannot run.
>>> >     at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:53)
>>> >     at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:128)
>>> >     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
>>> >     ... 16 more
>>> >
>>> > java.io.IOException: cannot find class
>>> > com.hadoop.mapred.DeprecatedLzoTextInputFormat
>>> >     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:406)
>>> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
>>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>>> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>> >     at java.security.AccessController.doPrivileged(Native Method)
>>> >     at javax.security.auth.Subject.doAs(Subject.java:396)
>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>>> >     at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>> >
>>> > My thought is that the hadoop-lzo-20110217.jar is not available on the
>>> > hadoop classpath. However, the hadoop classpath command shows that
>>> > /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar is in the classpath.
>>> > Additionally, across the cluster on each machine, the
>>> > hadoop-lzo-20110217.jar is present under /usr/lib/hadoop-0.20/lib/.
>>> >
>>> > The hadoop-core-0.20.2-cdh3u1.jar is also available on my hadoop classpath.
>>> >
>>> > What else can I investigate to confirm that the lzo jar is on my
>>> > classpath? Or is this error indicative of another issue?
>>> >
>>> > Jessica
>>> >
>>>
>>>
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434
>>>
>>
>>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
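
For reference, the idea behind the suggestion above (have the reader ignore the .lzo.index files) can be sketched without any Hadoop dependency. This is only an illustration under stated assumptions: the class LzoIndexFilter and its methods are hypothetical names, not part of hadoop-lzo or Hive, and the actual HIVE-2395 change may well look different. The sketch assumes that the index written by LzoIndexer sits next to its data file (bar.lzo and bar.lzo.index), and that a reader choosing its codec from the file name would find none for the index file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of hadoop-lzo or Hive.
public class LzoIndexFilter {
    // Suffix of the side files mentioned in the thread.
    static final String INDEX_SUFFIX = ".lzo.index";

    // True when the path names an LZO index file rather than LZO data.
    static boolean isIndexFile(String path) {
        return path.endsWith(INDEX_SUFFIX);
    }

    // Returns the input paths with index files removed, order preserved.
    static List<String> withoutIndexFiles(List<String> paths) {
        List<String> kept = new ArrayList<String>();
        for (String p : paths) {
            if (!isIndexFile(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
            "/user/hive/warehouse/foo/bar.lzo",
            "/user/hive/warehouse/foo/bar.lzo.index");
        // Only the data file should remain.
        System.out.println(withoutIndexFiles(inputs));
    }
}
```

In a patched reader the same check would presumably run before the codec lookup, so that an index file is skipped instead of raising "No LZO codec found, cannot run."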
