Thank you for pointing out the 0.7 branch.  I'm giving it a shot and have run
into a problem when trying to run the following test Pig script:

REGISTER elephant-bird-1.0.jar
A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
B = LIMIT A 100;
DUMP B;

When I try to run this, I get the following error:

java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
 ....
ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without
native-hadoop
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal
error.  could not instantiate
'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[
]'

Looking at the log file, I see the following:

java.lang.RuntimeException: could not instantiate
'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[
]'
...
Caused by: java.lang.reflect.InvocationTargetException
...
Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
...
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Maps

What is confusing me is that LZO compression and decompression work fine when
I run a normal Java map-reduce program, so the native libraries must be in the
right place and java.library.path must be set correctly; otherwise, how would
a normal Java map-reduce job work?  Is there some other place I need to set
JAVA_LIBRARY_PATH for Pig to pick it up?  My understanding was that Pig would
get this from hadoop-env.sh.  Or is the missing com.google.common.collect.Maps
class the real problem here?  Thank you for any help!
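
For what it's worth, com.google.common.collect.Maps looks like it comes from
the Google Collections (Guava) library, which I believe elephant-bird uses
internally, so my next guess is to REGISTER that jar (and the hadoop-lzo jar)
explicitly instead of relying on the cluster classpath.  The sketch below is
just what I was planning to try next; the jar names and versions are
placeholders for whatever copies I have locally, so please correct me if this
is the wrong approach:

-- Guesses at the dependency jars; names/versions are placeholders, not verified.
REGISTER google-collections-1.0.jar
REGISTER hadoop-lzo-0.4.4.jar
REGISTER elephant-bird-1.0.jar

-- The rest of the script is unchanged.
A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
B = LIMIT A 100;
DUMP B;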

~Ed

On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> Hi Ed,
> Elephant-bird only works with 0.6 at the moment. There's a branch for 0.7
> that I haven't tested: http://github.com/hirohanin/elephant-bird/
> Try it, let me know if it works.
>
> -D
>
> On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com> wrote:
>
> > Hello,
> >
> > I have a small cluster up and running with LZO compressed files in it.  I'm
> > using the lzo compression libraries available at
> > http://github.com/kevinweil/hadoop-lzo (thank you for maintaining this!)
> >
> > So far everything works fine when I write regular map-reduce jobs.  I can
> > read in lzo files and write out lzo files without any problem.
> >
> > I'm also using Pig 0.7 and it appears to be able to read LZO files out of
> > the box using the default LoadFunc (PigStorage).  However, I am currently
> > testing a large LZO file (20GB) which I indexed using the LzoIndexer, and
> > Pig does not appear to be making use of the indexes.  The pig scripts that
> > I've run so far only have 3 mappers when processing the 20GB file.  My
> > understanding was that there should be 1 map for each block (256MB blocks),
> > so about 80 mappers when processing the 20GB lzo file.  Does Pig 0.7
> > support indexed lzo files with the default load function?
> >
> > If not, I was looking at elephant-bird and noticed it is only compatible
> > with Pig 0.6 and not 0.7+.  Is that accurate?  What would be the
> > recommended solution for processing indexed lzo files using Pig 0.7?
> >
> > Thank you for any assistance!
> >
> > ~Ed
> >
>
