Thank you for pointing out the 0.7 branch.  I'm giving it a shot and have
run into a problem when trying to run the following test Pig script:

REGISTER elephant-bird-1.0.jar
A = LOAD '/user/foo/input' USING
B = LIMIT A 100;

When I try to run this I get the following error:

java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without
ERROR - ERROR 2999: Unexpected internal
error.  could not instantiate
'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[

Looking at the log file it gives the following:

java.lang.RuntimeException: could not instantiate
'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
Caused by: java.lang.ClassNotFoundException:

What confuses me is that LZO compression and decompression work fine
when I'm running a normal Java-based MapReduce program, so I feel as though
the libraries have to be in the right place, with the right settings for
java.library.path.  Otherwise, how would normal Java MapReduce work?  Is
there some other location where I need to set JAVA_LIBRARY_PATH for Pig to pick it
up?  My understanding was that it would get this from  Are
the missing the real problem here?  Thank you
for any help!
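
In case it helps narrow this down, here is a rough sketch of a standalone
check that could be run in a JVM started with the same classpath and
java.library.path settings that Pig uses.  The class name LoaderEnvCheck and
its two helper methods are my own invention for illustration, not part of Pig
or elephant-bird; the library and class names it probes are the ones from the
errors above.

```java
// Hypothetical diagnostic: names are illustrative, not from Pig or elephant-bird.
class LoaderEnvCheck {

    // Returns true if the named class can be located on the current classpath.
    public static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, LoaderEnvCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns true if the named native library can be loaded via java.library.path.
    public static boolean isNativeLibraryAvailable(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Print the two search paths the JVM was actually started with.
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        System.out.println("java.class.path   = " + System.getProperty("java.class.path"));

        // Library name taken from the UnsatisfiedLinkError above.
        System.out.println("gplcompression loadable: "
                + isNativeLibraryAvailable("gplcompression"));

        // Class name taken from the NoClassDefFoundError above.
        System.out.println("com.google.common.collect.Maps on classpath: "
                + isClassAvailable("com.google.common.collect.Maps"));
    }
}
```

If the first check fails only under Pig's settings, that would point at
java.library.path; if the second fails, it would point at a jar missing from
the classpath rather than at the native libraries.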


On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy wrote:

> Hi Ed,
> Elephant-bird only works with 0.6 at the moment. There's a branch for 0.7
> that I haven't tested:
> Try it, let me know if it works.
> -D
> On Tue, Sep 21, 2010 at 2:22 PM, pig wrote:
> > Hello,
> >
> > I have a small cluster up and running with LZO compressed files in it.
> > I'm using the lzo compression libraries available at
> > (thank you for maintaining this!)
> >
> > So far everything works fine when I write regular map-reduce jobs.  I can
> > read in lzo files and write out lzo files without any problem.
> >
> > I'm also using Pig 0.7 and it appears to be able to read LZO files out of
> > the box using the default LoadFunc (PigStorage).  However, I am currently
> > testing a large LZO file (20GB) which I indexed using the LzoIndexer and
> > Pig does not appear to be making use of the indexes.  The pig scripts that
> > I've run so far only have 3 mappers when processing the 20GB file.  My
> > understanding was that there should be 1 map for each block (256MB blocks)
> > so about 80 mappers when processing the 20GB lzo file.  Does Pig 0.7
> > support indexed lzo files with the default load function?
> >
> > If not, I was looking at elephant-bird and noticed it is only compatible
> > with Pig 0.6 and not 0.7+.  Is that accurate?  What would be the
> > recommended solution for processing indexed LZO files using Pig 0.7?
> >
> > Thank you for any assistance!
> >
> > ~Ed
> >
