Alex,

The task trackers have been restarted many times across the cluster since this issue was first seen.
Hmmm, I hadn't tried to explicitly add the lzo jar to my classpath in the hive shell, but I just tried it and got the same errors.

Do you see /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar in the child classpath when the task is executed (use 'ps aux' on the node)?

While the job wasn't running, I did this and got back the tasktracker process:

  ps aux | grep java | grep lzo

Do I have to run this while the task is running on that node?

Joey,

Yes, the lzo files are indexed. They were indexed using the following command:

  hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-20110217.jar com.hadoop.compression.lzo.LzoIndexer /user/hive/warehouse/foo/bar.lzo

Jessica

On Wed, Oct 5, 2011 at 3:52 PM, Joey Echeverria <[email protected]> wrote:
> Are your LZO files indexed?
>
> -Joey
>
> On Wed, Oct 5, 2011 at 3:35 PM, Jessica Owensby
> <[email protected]> wrote:
>> Hi Joey,
>> Thanks. I forgot to say that; yes, the LzoCodec class is listed in
>> core-site.xml under the io.compression.codecs property:
>>
>> <property>
>>   <name>io.compression.codecs</name>
>>   <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
>> </property>
>>
>> I also added the mapred.child.env property to mapred-site.xml:
>>
>> <property>
>>   <name>mapred.child.env</name>
>>   <value>JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib</value>
>> </property>
>>
>> per these instructions:
>> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
>>
>> After making each of these changes I restarted the cluster, just to be
>> sure that the new settings were being picked up.
>>
>> Jessica
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434

Adding back the email history:

Hello Everyone,

I've been having an issue in a hadoop environment (running cdh3u1) where any table declared in hive with the STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' directive produces the following errors when running any query against it.
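For reference, such a table is declared roughly along these lines; the column list and the output format shown here are placeholders for illustration (the usual pairing with the deprecated LZO input format), not the actual schema:

  CREATE TABLE foo (col1 STRING)
  STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';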
For instance, running "select count(*) from foo;" gives the following error:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:306)
        at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.next(Hadoop20SShims.java:209)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:292)
        ... 11 more
Caused by: java.io.IOException: No LZO codec found, cannot run.
        at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:53)
        at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:128)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
        ... 16 more

java.io.IOException: cannot find class com.hadoop.mapred.DeprecatedLzoTextInputFormat
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:406)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)

My thought is that the hadoop-lzo-20110217.jar is not available on the hadoop classpath. However, the hadoop classpath command shows that /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar is in the classpath. Additionally, across the cluster on each machine, the hadoop-lzo-20110217.jar is present under /usr/lib/hadoop-0.20/lib/. The hadoop-core-0.20.2-cdh3u1.jar is also available on my hadoop classpath. What else can I investigate to confirm that the lzo jar is on my classpath? Or is this error indicative of another issue?

Jessica
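One concrete form of the child-classpath check discussed above, in case it is useful. This is only a sketch: it assumes Linux task nodes and has to be run on a node while a query against the LZO table is actually executing, since the child JVMs only exist for the lifetime of the task.

  # list the map/reduce child JVMs launched by the tasktracker
  ps aux | grep org.apache.hadoop.mapred.Child | grep -v grep

  # dump the full command line (including -classpath) of one child JVM and
  # look for the lzo jar; replace <PID> with a pid from the line above
  tr '\0' '\n' < /proc/<PID>/cmdline | grep -i lzo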
