Thank you! Pig is now successfully finding the LZO libraries.

I created a pig-env.sh file in $PIG_HOME/conf (it didn't already exist), then added the line:

export PIG_OPTS="$PIG_OPTS -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64"

~Ed

On Wed, Sep 22, 2010 at 6:36 PM, Gerrit van Vuuren <gvanvuu...@specificmedia.com> wrote:

> Hi,
>
> You also need to add the java.library.path to pig opts in the following file:
> $PIG_HOME/bin/pig
>
> E.g.:
> PIG_OPTS="$PIG_OPTS -Djava.library.path=/opt/hadoop/lib/native/Linux-amd64"
>
> cheers.
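For completeness: rather than editing $PIG_HOME/bin/pig itself as Gerrit suggests, I put that line in a new $PIG_HOME/conf/pig-env.sh, which my copy of bin/pig picks up at startup. The whole file is just the one line; a minimal sketch, assuming your native libraries live where mine do (adjust the path to wherever libgplcompression.so actually sits on your machines):

    # $PIG_HOME/conf/pig-env.sh
    # Point the JVM at the directory containing libgplcompression.so and
    # libhadoop.so; Linux-amd64 is the directory name on my 64-bit install.
    export PIG_OPTS="$PIG_OPTS -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64"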
>
> ----- Original Message -----
> From: pig <hadoopn...@gmail.com>
> To: pig-user@hadoop.apache.org <pig-user@hadoop.apache.org>
> Sent: Wed Sep 22 23:25:58 2010
> Subject: Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?
>
> Hi Dmitriy,
>
> Using the REGISTER pig keyword got rid of the missing class error. Thank you!
>
> I still have the error regarding the missing lzo codec.
>
> I followed all the steps outlined by Gerrit, and LZO works without any problems when I'm using it in Java-based map-reduce programs (including outputting compressed lzo files). However, for some reason I still have the problem with Pig. I added the hadoop-kevinweil-gpl-compression.jar to my $PIG_HOME/lib directory on all nodes and on the machine I'm running pig from. The native libraries are also in the correct location in the hadoop/lib/native/Linux-amd64 folder (libgplcompression.so and libhadoop.so.1.0.0).
>
> I'm assuming that pig will pick up the JAVA_LIBRARY_PATH variable set in hadoop-env.sh. Is that correct? Thank you!
>
> ~Ed
>
> On Wed, Sep 22, 2010 at 5:44 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>
> > By register I mean the pig register keyword.
> >
> > So, in addition to
> >
> > REGISTER elephant-bird-1.0.jar
> >
> > you should also
> >
> > REGISTER /usr/lib/elephant-pig/lib/google-collections-1.0.jar
> >
> > and possibly the rest of the jars in that directory. Might be simpler to jar them up together and just register a single jar.
> >
> > -D
> >
> > On Wed, Sep 22, 2010 at 1:47 PM, pig <hadoopn...@gmail.com> wrote:
> >
> > > I added the jars to all my nodes in /usr/lib/elephant-pig/lib
> > >
> > > I then modified hadoop-env.sh on all nodes so that it includes the entry
> > >
> > > export PIG_CLASSPATH=/usr/lib/elephant-pig/lib/*:$PIG_CLASSPATH
> > >
> > > I start up the grunt shell and first paste the line:
> > >
> > > REGISTER elephant-bird-1.0.jar
> > >
> > > This has no problems. Then I add the line:
> > >
> > > A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');
> > >
> > > At this point the following error prints to screen:
> > >
> > > --------------------
> > > [main] ERROR com.hadoop.compression.lzo.GPLNativeCodeLoader - Could not load native gpl library
> > > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> > > ...
> > > [main] ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop
> > > --------------------
> > >
> > > No log entry is generated and the grunt shell continues to work. (LZO works fine when I run Java-based map-reduce programs.) I then add the final 2 lines of the pig script:
> > >
> > > B = LIMIT A 100;
> > > DUMP B;
> > >
> > > The program starts to execute and fails. The nodes running the mappers give the error java.lang.ClassNotFoundException: com.google.common.collect.Maps and fail. (This was the same error I was getting before in my pig log files.) The class not found exception no longer shows up in my pig log file; in its place is a more generic RuntimeException.
> > >
> > > On all nodes I also tried
> > >
> > > export PIG_CLASSPATH=/usr/lib/elephant-pig/lib:$PIG_CLASSPATH
> > >
> > > (without the *), and I also tried modifying JAVA_LIBRARY_PATH to include the location of the elephant-pig jar files.
> > >
> > > I'm using the Cloudera distro of Hadoop 0.20.2, if that might somehow be causing problems. When you said I might need to "register" the jar files, what does that mean exactly? Thanks again for all your assistance and prompt responses.
> > >
> > > ~Ed
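To save the next reader some trial and error, here is the shape of the script that got past the ClassNotFoundException for me once the lib jars were registered explicitly. This is a sketch: the '|' delimiter and the paths are from my setup, and google-collections may not be the only lib/ jar your job needs.

    REGISTER elephant-bird-1.0.jar;
    REGISTER /usr/lib/elephant-pig/lib/google-collections-1.0.jar;
    -- add REGISTER lines for the remaining jars in elephant-bird's lib/
    -- if the mappers throw further ClassNotFoundExceptions
    A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');
    B = LIMIT A 100;
    DUMP B;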
> > > On Wed, Sep 22, 2010 at 3:46 PM, pig <hadoopn...@gmail.com> wrote:
> > >
> > > > Ah, I didn't realize I needed to put the jars on all the nodes, since the error is being thrown before the pig script actually executes (it's throwing the error in the parsing stage). I assumed that since the pig script hadn't executed yet, it wasn't doing anything with the Hadoop nodes.
> > > >
> > > > I will try adding PIG_CLASSPATH to my hadoop-env.sh and will then put the jar files on all the slave nodes. Hopefully that will solve the problem.
> > > >
> > > > ~Ed
> > > >
> > > > On Wed, Sep 22, 2010 at 3:28 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > > >
> > > > > try PIG_CLASSPATH
> > > > >
> > > > > Oh, and you might need to explicitly register them.. sorry, forgot. We just have them on the hadoop classpath on the nodes themselves, so we don't have to do that, but you might if you are starting fresh.
> > > > >
> > > > > -D
> > > > >
> > > > > On Wed, Sep 22, 2010 at 12:01 PM, pig <hadoopn...@gmail.com> wrote:
> > > > >
> > > > > > [foo]$ echo $CLASSPATH
> > > > > > :/usr/lib/elephant-bird/lib/*
> > > > > >
> > > > > > This has been set for both user foo and hadoop, but I still get the same error. Is this the correct environment variable to be setting?
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > > > ~Ed
> > > > > >
> > > > > > On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > > > > >
> > > > > > > elephant-bird/lib/* (the * is important)
> > > > > > >
> > > > > > > On Wed, Sep 22, 2010 at 11:42 AM, pig <hadoopn...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Well, I thought that would be a simple enough fix, but no luck so far.
> > > > > > > >
> > > > > > > > I've added the elephant-bird/lib directory (which I made world readable and executable) to the CLASSPATH, JAVA_LIBRARY_PATH and HADOOP_CLASSPATH as both the user running grunt and the hadoop user (sort of a shotgun approach).
> > > > > > > >
> > > > > > > > I still get the error where it complains about no gplcompression, and in the log it has an error where it can't find com.google.common.collect.Maps.
> > > > > > > >
> > > > > > > > Are these two separate problems, or is it one problem that is causing two different errors? Thank you for the help!
> > > > > > > >
> > > > > > > > ~Ed
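In hindsight, the answer to that question is yes: they were two separate problems. The com.google.common.collect.Maps error is a jar missing from the classpath (fixed by registering elephant-bird's lib/ jars), while the gplcompression error is a missing native library (fixed by the java.library.path setting above). A quick sanity check for the native half, using the paths from my cluster:

    # The directory passed via -Djava.library.path must contain both
    # libraries (your copy may hold more files than these two).
    ls /usr/lib/hadoop/lib/native/Linux-amd64
    # libgplcompression.so  libhadoop.so.1.0.0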
> > > > > > > > On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > You need the jars in elephant-bird's lib/ on your classpath to run Elephant-Bird.
> > > > > > > > >
> > > > > > > > > On Wed, Sep 22, 2010 at 10:35 AM, pig <hadoopn...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thank you for pointing out the 0.7 branch. I'm giving the 0.7 branch a shot and have run into a problem when trying to run the following test pig script:
> > > > > > > > > >
> > > > > > > > > > REGISTER elephant-bird-1.0.jar
> > > > > > > > > > A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
> > > > > > > > > > B = LIMIT A 100;
> > > > > > > > > > DUMP B;
> > > > > > > > > >
> > > > > > > > > > When I try to run this I get the following error:
> > > > > > > > > >
> > > > > > > > > > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> > > > > > > > > > ....
> > > > > > > > > > ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop
> > > > > > > > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. could not instantiate 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
> > > > > > > > > >
> > > > > > > > > > Looking at the log file, it gives the following:
> > > > > > > > > >
> > > > > > > > > > java.lang.RuntimeException: could not instantiate 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
> > > > > > > > > > ...
> > > > > > > > > > Caused by: java.lang.reflect.InvocationTargetException
> > > > > > > > > > ...
> > > > > > > > > > Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
> > > > > > > > > > ...
> > > > > > > > > > Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Maps
> > > > > > > > > >
> > > > > > > > > > What is confusing me is that LZO compression and decompression work fine when I'm running a normal Java-based map-reduce program, so I feel as though the libraries have to be in the right place with the right settings for java.library.path. Otherwise, how would normal Java map-reduce work? Is there some other location where I need to set JAVA_LIBRARY_PATH for pig to pick it up? My understanding was that it would get this from hadoop-env.sh. Is the missing com.google.common.collect.Maps class the real problem here? Thank you for any help!
> > > > > > > > > >
> > > > > > > > > > ~Ed
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Ed,
> > > > > > > > > > > Elephant-bird only works with 0.6 at the moment. There's a branch for 0.7 that I haven't tested: http://github.com/hirohanin/elephant-bird/
> > > > > > > > > > > Try it, let me know if it works.
> > > > > > > > > > >
> > > > > > > > > > > -D
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > > I have a small cluster up and running with LZO compressed files in it. I'm using the lzo compression libraries available at http://github.com/kevinweil/hadoop-lzo (thank you for maintaining this!)
> > > > > > > > > > > >
> > > > > > > > > > > > So far everything works fine when I write regular map-reduce jobs. I can read in lzo files and write out lzo files without any problem.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm also using Pig 0.7, and it appears to be able to read LZO files out of the box using the default LoadFunc (PigStorage). However, I am currently testing a large LZO file (20GB) which I indexed using the LzoIndexer, and Pig does not appear to be making use of the indexes. The pig scripts that I've run so far only get 3 mappers when processing the 20GB file. My understanding was that there should be 1 map for each block (256MB blocks), so about 80 mappers for the 20GB lzo file. Does Pig 0.7 support indexed lzo files with the default load function?
> > > > > > > > > > > >
> > > > > > > > > > > > If not, I was looking at elephant-bird and noticed it is only compatible with Pig 0.6 and not 0.7+. Is that accurate? What would be the recommended solution for processing indexed lzo files using Pig 0.7?
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you for any assistance!
> > > > > > > > > > > >
> > > > > > > > > > > > ~Ed
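P.S. I haven't tried it myself, but Dmitriy's earlier suggestion of jarring the dependencies up together, so that a single REGISTER covers everything, would look roughly like this (untested sketch; the lib path is from my setup and the /tmp locations are arbitrary):

    # Unpack every jar from elephant-bird's lib/ into one directory, then
    # repack the tree as a single jar. The output jar is written one level
    # up so it doesn't get swept into itself.
    mkdir /tmp/eb-combined && cd /tmp/eb-combined
    for j in /usr/lib/elephant-pig/lib/*.jar; do jar xf "$j"; done
    jar cf ../elephant-bird-with-deps.jar .

After that, a single REGISTER /tmp/elephant-bird-with-deps.jar; in grunt should pull in all the dependencies at once. (Unpacking merges the jars' META-INF directories, which is usually harmless for plain library jars.)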