Ah, I didn't realize I needed to put the jars on all the nodes, since the error is thrown before the pig script actually executes (it fails in the parsing stage). I assumed that since the pig script hadn't executed yet, it wasn't doing anything with the Hadoop nodes.
I will try adding PIG_CLASSPATH to my hadoop-env.sh and will then put the jar files on all the slave nodes. Hopefully that will solve the problem.

~Ed

On Wed, Sep 22, 2010 at 3:28 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> try PIG_CLASSPATH
>
> Oh and you might need to explicitly register them.. sorry, forgot. We just
> have them on the hadoop classpath on the nodes themselves, so we don't
> have to do that, but you might if you are starting fresh.
>
> -D
>
> On Wed, Sep 22, 2010 at 12:01 PM, pig <hadoopn...@gmail.com> wrote:
>
> > [foo]$ echo $CLASSPATH
> > :/usr/lib/elephant-bird/lib/*
> >
> > This has been set for both user foo and hadoop but I still get the same
> > error. Is this the correct environment variable to be setting?
> >
> > Thank you!
> >
> > ~Ed
> >
> > On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> > wrote:
> >
> > > elephant-bird/lib/* (the * is important)
> > >
> > > On Wed, Sep 22, 2010 at 11:42 AM, pig <hadoopn...@gmail.com> wrote:
> > >
> > > > Well I thought that would be a simple enough fix but no luck so far.
> > > >
> > > > I've added the elephant-bird/lib directory (which I made world
> > > > readable and executable) to the CLASSPATH, JAVA_LIBRARY_PATH and
> > > > HADOOP_CLASSPATH as both the user running grunt and the hadoop user
> > > > (sort of a shotgun approach).
> > > >
> > > > I still get the error where it complains about no gplcompression,
> > > > and in the log it has an error where it can't find
> > > > com.google.common.collect.Maps
> > > >
> > > > Are these two separate problems, or is it one problem that is
> > > > causing two different errors? Thank you for the help!
> > > >
> > > > ~Ed
> > > >
> > > > On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> > > > wrote:
> > > >
> > > > > You need the jars in elephant-bird's lib/ on your classpath to run
> > > > > Elephant-Bird.
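For anyone hitting the same error, the environment change being discussed looks roughly like the sketch below. The directory is the one mentioned earlier in the thread; adjust it to wherever the elephant-bird jars actually live on your machines.

```shell
# Sketch only: add elephant-bird's dependency jars to the classpath Pig and
# Hadoop see. The trailing /* wildcard is expanded by the JVM (Java 6+
# classpath wildcard), not by the shell, so keep it quoted here.
export PIG_CLASSPATH="/usr/lib/elephant-bird/lib/*:$PIG_CLASSPATH"
export HADOOP_CLASSPATH="/usr/lib/elephant-bird/lib/*:$HADOOP_CLASSPATH"
```

As Dmitriy notes, this needs to be visible on the worker nodes too (e.g. via hadoop-env.sh), not just on the machine running grunt.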
> > > > > On Wed, Sep 22, 2010 at 10:35 AM, pig <hadoopn...@gmail.com> wrote:
> > > > >
> > > > > > Thank you for pointing out the 0.7 branch. I'm giving the 0.7
> > > > > > branch a shot and have run into a problem when trying to run the
> > > > > > following test pig script:
> > > > > >
> > > > > > REGISTER elephant-bird-1.0.jar
> > > > > > A = LOAD '/user/foo/input' USING
> > > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
> > > > > > B = LIMIT A 100;
> > > > > > DUMP B;
> > > > > >
> > > > > > When I try to run this I get the following error:
> > > > > >
> > > > > > java.lang.UnsatisfiedLinkError: no gplcompression in
> > > > > > java.library.path
> > > > > > ....
> > > > > > ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load
> > > > > > native-lzo without native-hadoop
> > > > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected
> > > > > > internal error. could not instantiate
> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with
> > > > > > arguments '[ ]'
> > > > > >
> > > > > > Looking at the log file it gives the following:
> > > > > >
> > > > > > java.lang.RuntimeException: could not instantiate
> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with
> > > > > > arguments '[ ]'
> > > > > > ...
> > > > > > Caused by: java.lang.reflect.InvocationTargetException
> > > > > > ...
> > > > > > Caused by: java.lang.NoClassDefFoundError:
> > > > > > com/google/common/collect/Maps
> > > > > > ...
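The NoClassDefFoundError above suggests the google-collections jar that elephant-bird depends on was never registered. One hedged sketch of a fix is to register the dependency jars explicitly in the script; the jar name and path below are assumptions for this setup, so use whatever actually sits in elephant-bird's lib/ directory.

```shell
# Sketch: rewrite the test script so it registers elephant-bird's
# dependencies as well as elephant-bird itself. The google-collect jar
# name/path is an assumption -- check your elephant-bird/lib directory.
cat > test.pig <<'EOF'
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar;
REGISTER elephant-bird-1.0.jar;
A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
B = LIMIT A 100;
DUMP B;
EOF
```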
> > > > > > Caused by: java.lang.ClassNotFoundException:
> > > > > > com.google.common.collect.Maps
> > > > > >
> > > > > > What is confusing me is that LZO compression and decompression
> > > > > > work fine when I'm running a normal java based map-reduce
> > > > > > program, so I feel as though the libraries have to be in the
> > > > > > right place with the right settings for java.library.path.
> > > > > > Otherwise how would normal java map-reduce work? Is there some
> > > > > > other location I need to set JAVA_LIBRARY_PATH for pig to pick
> > > > > > it up? My understanding was that it would get this from
> > > > > > hadoop-env.sh. Are the missing com.google.common.collect.Maps
> > > > > > the real problem here? Thank you for any help!
> > > > > >
> > > > > > ~Ed
> > > > > >
> > > > > > On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy
> > > > > > <dvrya...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Ed,
> > > > > > > Elephant-bird only works with 0.6 at the moment. There's a
> > > > > > > branch for 0.7 that I haven't tested:
> > > > > > > http://github.com/hirohanin/elephant-bird/
> > > > > > > Try it, let me know if it works.
> > > > > > >
> > > > > > > -D
> > > > > > >
> > > > > > > On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > I have a small cluster up and running with LZO compressed
> > > > > > > > files in it. I'm using the lzo compression libraries
> > > > > > > > available at http://github.com/kevinweil/hadoop-lzo (thank
> > > > > > > > you for maintaining this!)
> > > > > > > >
> > > > > > > > So far everything works fine when I write regular map-reduce
> > > > > > > > jobs. I can read in lzo files and write out lzo files
> > > > > > > > without any problem.
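On the java.library.path question above: the UnsatisfiedLinkError comes from the JVM that launches grunt, not from the task JVMs, so a plain map-reduce job working does not prove that grunt's JVM can see libgplcompression. A sketch of one thing to try, with an assumed install path:

```shell
# Assumed path -- use the directory that actually contains
# libgplcompression.so on the machine where you run grunt.
NATIVE_DIR=/usr/lib/hadoop/lib/native/Linux-amd64-64

# bin/pig should forward PIG_OPTS to the JVM it starts, so this tells
# grunt's JVM where the native LZO library lives.
export PIG_OPTS="-Djava.library.path=$NATIVE_DIR $PIG_OPTS"
```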
> > > > > > > > I'm also using Pig 0.7 and it appears to be able to read
> > > > > > > > LZO files out of the box using the default LoadFunc
> > > > > > > > (PigStorage). However, I am currently testing a large LZO
> > > > > > > > file (20GB) which I indexed using the LzoIndexer, and Pig
> > > > > > > > does not appear to be making use of the indexes. The pig
> > > > > > > > scripts that I've run so far only have 3 mappers when
> > > > > > > > processing the 20GB file. My understanding was that there
> > > > > > > > should be 1 map for each block (256MB blocks), so about 80
> > > > > > > > mappers when processing the 20GB lzo file. Does Pig 0.7
> > > > > > > > support indexed lzo files with the default load function?
> > > > > > > >
> > > > > > > > If not, I was looking at elephant-bird and noticed it is
> > > > > > > > only compatible with Pig 0.6 and not 0.7+. Is that accurate?
> > > > > > > > What would be the recommended solution for processing
> > > > > > > > indexed lzo files using Pig 0.7?
> > > > > > > >
> > > > > > > > Thank you for any assistance!
> > > > > > > >
> > > > > > > > ~Ed
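The "about 80 mappers" expectation in the message above checks out arithmetically:

```shell
# A 20 GB file cut into 256 MB blocks -> expected number of map tasks
# if the .lzo index were actually being used for splitting.
FILE_MB=$((20 * 1024))   # 20 GB expressed in MB
BLOCK_MB=256
echo $((FILE_MB / BLOCK_MB))   # prints 80
```

Seeing only 3 mappers instead is what prompts the question of whether PigStorage honors the LzoIndexer index at all.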