Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Gerrit van Vuuren Wed, 22 Sep 2010 15:37:04 -0700

Hi,

You also need to add java.library.path to PIG_OPTS in
$PIG_HOME/bin/pig.

E.g.:
PIG_OPTS="$PIG_OPTS -Djava.library.path=/opt/hadoop/lib/native/Linux-amd64"

cheers.
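For illustration, a minimal shell sketch of the same step. It assumes a POSIX shell and assumes your copy of bin/pig appends to a PIG_OPTS value inherited from the environment (check your script). The function name add_lzo_native_path is hypothetical. It only appends the path when both native libraries named in the errors in this thread are present, since on Linux the JVM resolves the library names "hadoop" and "gplcompression" to libhadoop.so and libgplcompression.so.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig or hadoop-lzo): append
# -Djava.library.path=<dir> to PIG_OPTS, but only if <dir> contains both
# native libraries. On Linux the JVM maps library name "X" to "libX.so".
add_lzo_native_path() {
    native_dir="$1"
    for lib in libhadoop.so libgplcompression.so; do
        if [ ! -e "$native_dir/$lib" ]; then
            echo "missing $native_dir/$lib" >&2
            return 1
        fi
    done
    # Add a separating space only when PIG_OPTS already has content.
    PIG_OPTS="${PIG_OPTS:+$PIG_OPTS }-Djava.library.path=$native_dir"
    export PIG_OPTS
}

# Example (directory from Gerrit's example; adjust for your install):
#   add_lzo_native_path /opt/hadoop/lib/native/Linux-amd64 && pig myscript.pig
```

A directory that holds only a versioned file such as libhadoop.so.1.0.0 fails this check on purpose, because System.loadLibrary("hadoop") looks for the unversioned libhadoop.so name.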



----- Original Message -----
From: pig <hadoopn...@gmail.com>
To: pig-user@hadoop.apache.org <pig-user@hadoop.apache.org>
Sent: Wed Sep 22 23:25:58 2010
Subject: Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Hi Dmitriy,

Using the REGISTER pig keyword got rid of the missing class error.  Thank
you!

I still get the error about the missing LZO codec.

I followed all the steps outlined by Gerrit, and LZO works without any
problems when I use it in Java-based map-reduce programs (including
outputting compressed LZO files). However, for some reason I still have the
problem with Pig. I added hadoop-kevinweil-gpl-compression.jar to the
$PIG_HOME/lib directory on all nodes and on the machine I'm running Pig from.
The native libraries are also in the correct location in the
hadoop/lib/native/Linux-amd64 folder (libgplcompression.so and
libhadoop.so.1.0.0).

I'm assuming that pig will pick up the JAVA_LIBRARY_PATH variable set in
hadoop-env.sh.  Is that correct?  Thank you!

~Ed

On Wed, Sep 22, 2010 at 5:44 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> By register I mean the pig register keyword.
>
> So, in addition to
>
> REGISTER elephant-bird-1.0.jar
>
> you should also
>
> REGISTER /usr/lib/elephant-pig/lib/google-collections-1.0.jar
>
> and possibly the rest of the jars in that directory. Might be simpler to
> jar them up together and just register a single jar.
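Registering a whole directory by hand gets tedious. As an alternative to building one combined jar, here is a minimal sketch, assuming a POSIX shell, that prints a REGISTER statement for every jar in a directory so the lines can be placed at the top of a script. The function name register_jars is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit a Pig REGISTER statement for every *.jar
# in the given directory, in the shell's sorted glob order.
register_jars() {
    for jar in "$1"/*.jar; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$jar" ] || continue
        printf 'REGISTER %s;\n' "$jar"
    done
}

# Example (directory from this thread):
#   register_jars /usr/lib/elephant-pig/lib > header.pig
#   cat header.pig myscript.pig > combined.pig
```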
>
>
> -D
>
> On Wed, Sep 22, 2010 at 1:47 PM, pig <hadoopn...@gmail.com> wrote:
>
> > I added the jars to all my nodes in /usr/lib/elephant-pig/lib
> >
> > I then modified hadoop-env.sh for all nodes so that it includes the entry
> >
> >     export PIG_CLASSPATH=/usr/lib/elephant-pig/lib/*:$PIG_CLASSPATH
> >
> > I start up the grunt shell and first paste the line:
> >
> >     REGISTER elephant-bird-1.0.jar
> >
> > This has no problems.  Then I add the line:
> >
> >     A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');
> >
> > At this point the following error prints to screen:
> >
> > --------------------
> > [main] ERROR com.hadoop.compression.lzo.GPLNativeCodeLoader - Could not load native gpl library
> > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> > ...
> > [main] ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop
> > --------------------
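The UnsatisfiedLinkError line is the JVM reporting that none of the directories on java.library.path contains libgplcompression.so. A rough sketch of that lookup, assuming a POSIX shell, can help check a candidate path before launching Pig. The function name find_native_lib is hypothetical, and it only approximates the JVM's search.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory in a colon-separated,
# java.library.path-style string that contains lib<name>.so. This roughly
# mirrors the lookup behind "no <name> in java.library.path".
# Returns 1 if no directory matches.
find_native_lib() {
    name="$1"
    search_path="$2"
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -e "$dir/lib$name.so" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example:
#   find_native_lib gplcompression "/usr/lib:/opt/hadoop/lib/native/Linux-amd64"
```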
> >
> > No log entry is generated and the grunt shell continues to work. (LZO works
> > fine when I run Java-based map-reduce programs.) I then add the final 2
> > lines of the pig script:
> >
> >     B=LIMIT A 100;
> >     DUMP B;
> >
> > The program starts to execute and fails. The nodes running the mapper give
> > the error java.lang.ClassNotFoundException: com.google.common.collect.Maps
> > and fail. (This was the same error I was getting before in my pig log
> > files.) The class not found exception no longer shows up in my pig log
> > file. In its place is a more generic RuntimeException.
> >
> > On all nodes I also tried
> >
> >     export PIG_CLASSPATH=/usr/lib/elephant-pig/lib:$PIG_CLASSPATH
> >
> > (without the *)
> >
> > and I also tried modifying JAVA_LIBRARY_PATH to include the location of the
> > elephant-pig jar files.
> >
> > I'm using the Cloudera distro of Hadoop 0.20.2, in case that might somehow
> > be causing problems. When you said I might need to "register" the jar
> > files, what does that mean exactly? Thanks again for all your assistance
> > and prompt responses.
> >
> > ~Ed
> >
> > On Wed, Sep 22, 2010 at 3:46 PM, pig <hadoopn...@gmail.com> wrote:
> >
> > > Ah,
> > >
> > > I didn't realize I needed to put the jars on all the nodes, since the
> > > error is being thrown before the pig script actually executes (it's
> > > throwing the error in the parsing stage). I assumed that since the pig
> > > script hasn't executed yet, it wasn't doing anything with the Hadoop nodes.
> > >
> > > I will try adding PIG_CLASSPATH to my hadoop-env.sh and will then put the
> > > jar files on all the slave nodes. Hopefully that will solve the problem.
> > >
> > > ~Ed
> > >
> > >
> > > On Wed, Sep 22, 2010 at 3:28 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > >
> > >> try PIG_CLASSPATH
> > >>
> > >> Oh, and you might need to explicitly register them... sorry, forgot. We
> > >> just have them on the hadoop classpath on the nodes themselves, so we
> > >> don't have to do that, but you might if you are starting fresh.
> > >>
> > >> -D
> > >>
> > >> On Wed, Sep 22, 2010 at 12:01 PM, pig <hadoopn...@gmail.com> wrote:
> > >>
> > >> > [foo]$ echo $CLASSPATH
> > >> > :/usr/lib/elephant-bird/lib/*
> > >> >
> > >> > This has been set for both user foo and hadoop but I still get the same
> > >> > error. Is this the correct environment variable to be setting?
> > >> >
> > >> > Thank you!
> > >> >
> > >> > ~Ed
> > >> >
> > >> >
> > >> > On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > >> >
> > >> > > elephant-bird/lib/* (the * is important)
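The trailing `*` matters because the Java launcher (Java 6 and later) expands a classpath entry ending in `/*` to every jar file in that directory, while a bare directory entry only exposes loose class files, not the jars inside it. If you would rather not rely on that expansion, here is a sketch, assuming a POSIX shell, that builds an explicit colon-separated list. The function name jar_classpath is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join every *.jar in a directory into one
# colon-separated classpath string (an explicit alternative to "dir/*").
jar_classpath() {
    result=""
    for jar in "$1"/*.jar; do
        [ -e "$jar" ] || continue      # no jars: the glob stayed literal
        result="${result:+$result:}$jar"
    done
    printf '%s\n' "$result"
}

# Example (directory from this thread); keeps existing PIG_CLASSPATH entries:
#   PIG_CLASSPATH="$(jar_classpath /usr/lib/elephant-bird/lib)${PIG_CLASSPATH:+:$PIG_CLASSPATH}"
#   export PIG_CLASSPATH
```

PIG_CLASSPATH is read by the bin/pig launcher, so it affects the client JVM that parses and submits the script. Map tasks get their classes from jars Pig ships with the job (the REGISTERed ones) or from the Hadoop classpath on each node, which fits the mapper-side ClassNotFoundException reported elsewhere in this thread.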
> > >> > >
> > >> > > On Wed, Sep 22, 2010 at 11:42 AM, pig <hadoopn...@gmail.com> wrote:
> > >> > >
> > >> > > > Well, I thought that would be a simple enough fix, but no luck so far.
> > >> > > >
> > >> > > > I've added the elephant-bird/lib directory (which I made world
> > >> > > > readable and executable) to the CLASSPATH, JAVA_LIBRARY_PATH and
> > >> > > > HADOOP_CLASSPATH as both the user running grunt and the hadoop user
> > >> > > > (sort of a shotgun approach).
> > >> > > >
> > >> > > > I still get the error where it complains about no gplcompression, and
> > >> > > > in the log there is an error where it can't find
> > >> > > > com.google.common.collect.Maps.
> > >> > > >
> > >> > > > Are these two separate problems, or is it one problem that is causing
> > >> > > > two different errors? Thank you for the help!
> > >> > > >
> > >> > > > ~Ed
> > >> > > >
> > >> > > > On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > >> > > >
> > >> > > > > You need the jars in elephant-bird's lib/ on your classpath to run
> > >> > > > > Elephant-Bird.
> > >> > > > >
> > >> > > > > On Wed, Sep 22, 2010 at 10:35 AM, pig <hadoopn...@gmail.com> wrote:
> > >> > > > >
> > >> > > > > > Thank you for pointing out the 0.7 branch. I'm giving the 0.7 branch
> > >> > > > > > a shot and have run into a problem when trying to run the following
> > >> > > > > > test pig script:
> > >> > > > > >
> > >> > > > > > REGISTER elephant-bird-1.0.jar
> > >> > > > > > A = LOAD '/user/foo/input' USING
> > >> > > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
> > >> > > > > > B = LIMIT A 100;
> > >> > > > > > DUMP B;
> > >> > > > > >
> > >> > > > > > When I try to run this I get the following error:
> > >> > > > > >
> > >> > > > > > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> > >> > > > > > ...
> > >> > > > > > ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop
> > >> > > > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. could not instantiate 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
> > >> > > > > >
> > >> > > > > > Looking at the log file it gives the following:
> > >> > > > > >
> > >> > > > > > java.lang.RuntimeException: could not instantiate 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
> > >> > > > > > ...
> > >> > > > > > Caused by: java.lang.reflect.InvocationTargetException
> > >> > > > > > ...
> > >> > > > > > Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
> > >> > > > > > ...
> > >> > > > > > Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Maps
> > >> > > > > >
> > >> > > > > > What is confusing me is that LZO compression and decompression work
> > >> > > > > > fine when I'm running a normal Java-based map-reduce program, so I
> > >> > > > > > feel as though the libraries have to be in the right place with the
> > >> > > > > > right settings for java.library.path. Otherwise, how would normal
> > >> > > > > > Java map-reduce work? Is there some other location I need to set
> > >> > > > > > JAVA_LIBRARY_PATH for Pig to pick it up? My understanding was that
> > >> > > > > > it would get this from hadoop-env.sh. Is the missing
> > >> > > > > > com.google.common.collect.Maps class the real problem here? Thank
> > >> > > > > > you for any help!
> > >> > > > > >
> > >> > > > > > ~Ed
> > >> > > > > >
> > >> > > > > > On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > Hi Ed,
> > >> > > > > > > Elephant-bird only works with 0.6 at the moment. There's a branch for
> > >> > > > > > > 0.7 that I haven't tested: http://github.com/hirohanin/elephant-bird/
> > >> > > > > > > Try it, let me know if it works.
> > >> > > > > > >
> > >> > > > > > > -D
> > >> > > > > > >
> > >> > > > > > > On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com> wrote:
> > >> > > > > > >
> > >> > > > > > > > Hello,
> > >> > > > > > > >
> > >> > > > > > > > I have a small cluster up and running with LZO compressed files in
> > >> > > > > > > > it. I'm using the lzo compression libraries available at
> > >> > > > > > > > http://github.com/kevinweil/hadoop-lzo (thank you for maintaining
> > >> > > > > > > > this!)
> > >> > > > > > > >
> > >> > > > > > > > So far everything works fine when I write regular map-reduce jobs. I
> > >> > > > > > > > can read in lzo files and write out lzo files without any problem.
> > >> > > > > > > >
> > >> > > > > > > > I'm also using Pig 0.7 and it appears to be able to read LZO files
> > >> > > > > > > > out of the box using the default LoadFunc (PigStorage). However, I am
> > >> > > > > > > > currently testing a large LZO file (20GB) which I indexed using the
> > >> > > > > > > > LzoIndexer, and Pig does not appear to be making use of the indexes.
> > >> > > > > > > > The pig scripts that I've run so far only have 3 mappers when
> > >> > > > > > > > processing the 20GB file. My understanding was that there should be
> > >> > > > > > > > 1 map for each block (256MB blocks), so about 80 mappers when
> > >> > > > > > > > processing the 20GB lzo file. Does Pig 0.7 support indexed lzo files
> > >> > > > > > > > with the default load function?
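The arithmetic behind the figure of about 80: 20GB is 20 * 1024 = 20480 MB, and 20480 / 256 = 80 blocks. A tiny sketch of that calculation, rounding up for a partial last block. The function name expected_splits is hypothetical, and one map per block is only what to expect when the input format can actually split the file, which for LZO means making use of the LzoIndexer's index files.

```shell
#!/bin/sh
# Hypothetical helper: number of HDFS blocks (and so, at one split per
# block, map tasks) for a file, rounding up for a partial last block.
# Sizes are in MB; integer shell arithmetic only.
expected_splits() {
    file_mb="$1"
    block_mb="$2"
    echo $(( (file_mb + block_mb - 1) / block_mb ))
}

expected_splits $((20 * 1024)) 256    # 20GB file, 256MB blocks: prints 80
```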
> > >> > > > > > > >
> > >> > > > > > > > If not, I was looking at elephant-bird and noticed it is only
> > >> > > > > > > > compatible with Pig 0.6 and not 0.7+. Is that accurate? What would be
> > >> > > > > > > > the recommended solution for processing indexed lzo files using Pig
> > >> > > > > > > > 0.7?
> > >> > > > > > > >
> > >> > > > > > > > Thank you for any assistance!
> > >> > > > > > > >
> > >> > > > > > > > ~Ed
