Ah, I didn't realize I needed to put the jars on all the nodes, since the error is thrown before the pig script actually executes (it fails in the parsing stage). I assumed that since the pig script hadn't executed yet, it wasn't doing anything with the Hadoop nodes.
I will try adding PIG_CLASSPATH to my hadoop-env.sh and will then put the jar files on all the slave nodes. Hopefully that will solve the problem.

~Ed

On Wed, Sep 22, 2010 at 3:28 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> try PIG_CLASSPATH
>
> Oh and you might need to explicitly register them.. sorry, forgot. We just
> have them on the hadoop classpath on the nodes themselves, so we don't
> have to do that, but you might if you are starting fresh.
>
> -D
>
> On Wed, Sep 22, 2010 at 12:01 PM, pig <hadoopn...@gmail.com> wrote:
>
> > [foo]$ echo $CLASSPATH
> > :/usr/lib/elephant-bird/lib/*
> >
> > This has been set for both user foo and hadoop but I still get the same
> > error. Is this the correct environment variable to be setting?
> >
> > Thank you!
> >
> > ~Ed
> >
> > On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> > wrote:
> >
> > > elephant-bird/lib/* (the * is important)
> > >
> > > On Wed, Sep 22, 2010 at 11:42 AM, pig <hadoopn...@gmail.com> wrote:
> > >
> > > > Well I thought that would be a simple enough fix but no luck so far.
> > > >
> > > > I've added the elephant-bird/lib directory (which I made world
> > > > readable and executable) to the CLASSPATH, JAVA_LIBRARY_PATH and
> > > > HADOOP_CLASSPATH as both the user running grunt and the hadoop user
> > > > (sort of a shotgun approach).
> > > >
> > > > I still get the error where it complains about no gplcompression,
> > > > and in the log it has an error where it can't find
> > > > com.google.common.collect.Maps
> > > >
> > > > Are these two separate problems, or is it one problem that is
> > > > causing two different errors? Thank you for the help!
> > > >
> > > > ~Ed
> > > >
> > > > On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> > > > wrote:
> > > >
> > > > > You need the jars in elephant-bird's lib/ on your classpath to run
> > > > > Elephant-Bird.
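For anyone hitting the same error, the environment change being discussed looks roughly like the sketch below. The directory is the one mentioned earlier in the thread; adjust it to wherever the elephant-bird jars actually live on your machines.

```shell
# Sketch only: add elephant-bird's dependency jars to the classpath Pig and
# Hadoop see. The trailing /* wildcard is expanded by the JVM (Java 6+
# classpath wildcard), not by the shell, so keep it quoted here.
export PIG_CLASSPATH="/usr/lib/elephant-bird/lib/*:$PIG_CLASSPATH"
export HADOOP_CLASSPATH="/usr/lib/elephant-bird/lib/*:$HADOOP_CLASSPATH"
```

As Dmitriy notes, this needs to be visible on the worker nodes too (e.g. via hadoop-env.sh), not just on the machine running grunt.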
> > > > > On Wed, Sep 22, 2010 at 10:35 AM, pig <hadoopn...@gmail.com> wrote:
> > > > >
> > > > > > Thank you for pointing out the 0.7 branch. I'm giving the 0.7
> > > > > > branch a shot and have run into a problem when trying to run the
> > > > > > following test pig script:
> > > > > >
> > > > > > REGISTER elephant-bird-1.0.jar
> > > > > > A = LOAD '/user/foo/input' USING
> > > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
> > > > > > B = LIMIT A 100;
> > > > > > DUMP B;
> > > > > >
> > > > > > When I try to run this I get the following error:
> > > > > >
> > > > > > java.lang.UnsatisfiedLinkError: no gplcompression in
> > > > > > java.library.path
> > > > > > ....
> > > > > > ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load
> > > > > > native-lzo without native-hadoop
> > > > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected
> > > > > > internal error. could not instantiate
> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with
> > > > > > arguments '[ ]'
> > > > > >
> > > > > > Looking at the log file it gives the following:
> > > > > >
> > > > > > java.lang.RuntimeException: could not instantiate
> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with
> > > > > > arguments '[ ]'
> > > > > > ...
> > > > > > Caused by: java.lang.reflect.InvocationTargetException
> > > > > > ...
> > > > > > Caused by: java.lang.NoClassDefFoundError:
> > > > > > com/google/common/collect/Maps
> > > > > > ...
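The NoClassDefFoundError above suggests the google-collections jar that elephant-bird depends on was never registered. One hedged sketch of a fix is to register the dependency jars explicitly in the script; the jar name and path below are assumptions for this setup, so use whatever actually sits in elephant-bird's lib/ directory.

```shell
# Sketch: rewrite the test script so it registers elephant-bird's
# dependencies as well as elephant-bird itself. The google-collect jar
# name/path is an assumption -- check your elephant-bird/lib directory.
cat > test.pig <<'EOF'
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar;
REGISTER elephant-bird-1.0.jar;
A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
B = LIMIT A 100;
DUMP B;
EOF
```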
> > > > > > Caused by: java.lang.ClassNotFoundException:
> > > > > > com.google.common.collect.Maps
> > > > > >
> > > > > > What is confusing me is that LZO compression and decompression
> > > > > > work fine when I'm running a normal java based map-reduce
> > > > > > program, so I feel as though the libraries have to be in the
> > > > > > right place with the right settings for java.library.path.
> > > > > > Otherwise how would normal java map-reduce work? Is there some
> > > > > > other location I need to set JAVA_LIBRARY_PATH for pig to pick
> > > > > > it up? My understanding was that it would get this from
> > > > > > hadoop-env.sh. Are the missing com.google.common.collect.Maps
> > > > > > the real problem here? Thank you for any help!
> > > > > >
> > > > > > ~Ed
> > > > > >
> > > > > > On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy
> > > > > > <dvrya...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Ed,
> > > > > > > Elephant-bird only works with 0.6 at the moment. There's a
> > > > > > > branch for 0.7 that I haven't tested:
> > > > > > > http://github.com/hirohanin/elephant-bird/
> > > > > > > Try it, let me know if it works.
> > > > > > >
> > > > > > > -D
> > > > > > >
> > > > > > > On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > I have a small cluster up and running with LZO compressed
> > > > > > > > files in it. I'm using the lzo compression libraries
> > > > > > > > available at http://github.com/kevinweil/hadoop-lzo (thank
> > > > > > > > you for maintaining this!)
> > > > > > > >
> > > > > > > > So far everything works fine when I write regular map-reduce
> > > > > > > > jobs. I can read in lzo files and write out lzo files
> > > > > > > > without any problem.
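On the java.library.path question above: the UnsatisfiedLinkError comes from the JVM that launches grunt, not from the task JVMs, so a plain map-reduce job working does not prove that grunt's JVM can see libgplcompression. A sketch of one thing to try, with an assumed install path:

```shell
# Assumed path -- use the directory that actually contains
# libgplcompression.so on the machine where you run grunt.
NATIVE_DIR=/usr/lib/hadoop/lib/native/Linux-amd64-64

# bin/pig should forward PIG_OPTS to the JVM it starts, so this tells
# grunt's JVM where the native LZO library lives.
export PIG_OPTS="-Djava.library.path=$NATIVE_DIR $PIG_OPTS"
```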
> > > > > > > > I'm also using Pig 0.7 and it appears to be able to read
> > > > > > > > LZO files out of the box using the default LoadFunc
> > > > > > > > (PigStorage). However, I am currently testing a large LZO
> > > > > > > > file (20GB) which I indexed using the LzoIndexer, and Pig
> > > > > > > > does not appear to be making use of the indexes. The pig
> > > > > > > > scripts that I've run so far only have 3 mappers when
> > > > > > > > processing the 20GB file. My understanding was that there
> > > > > > > > should be 1 map for each block (256MB blocks), so about 80
> > > > > > > > mappers when processing the 20GB lzo file. Does Pig 0.7
> > > > > > > > support indexed lzo files with the default load function?
> > > > > > > >
> > > > > > > > If not, I was looking at elephant-bird and noticed it is
> > > > > > > > only compatible with Pig 0.6 and not 0.7+. Is that accurate?
> > > > > > > > What would be the recommended solution for processing
> > > > > > > > indexed lzo files using Pig 0.7?
> > > > > > > >
> > > > > > > > Thank you for any assistance!
> > > > > > > >
> > > > > > > > ~Ed
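The "about 80 mappers" expectation in the message above checks out arithmetically:

```shell
# A 20 GB file cut into 256 MB blocks -> expected number of map tasks
# if the .lzo index were actually being used for splitting.
FILE_MB=$((20 * 1024))   # 20 GB expressed in MB
BLOCK_MB=256
echo $((FILE_MB / BLOCK_MB))   # prints 80
```

Seeing only 3 mappers instead is what prompts the question of whether PigStorage honors the LzoIndexer index at all.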