Thank you for pointing out the 0.7 branch. I'm giving it a shot and have run into a problem when trying to run the following test Pig script:

REGISTER elephant-bird-1.0.jar;
A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
B = LIMIT A 100;
DUMP B;

When I try to run this I get the following error:

java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
....
ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. could not instantiate 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'

The log file shows the following:

java.lang.RuntimeException: could not instantiate 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
...
Caused by: java.lang.reflect.InvocationTargetException
...
Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
...
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Maps

What confuses me is that LZO compression and decompression work fine when I run a normal Java MapReduce job, so the native libraries must already be in the right place with the right java.library.path settings; otherwise, how would normal Java MapReduce work? Is there some other location where I need to set JAVA_LIBRARY_PATH for Pig to pick it up? My understanding was that it would get this from hadoop-env.sh. Or is the missing com.google.common.collect.Maps class the real problem here?
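To make this concrete, here is roughly how I've been trying to invoke things, with placeholder paths standing in for my install. I'm assuming bin/pig forwards PIG_OPTS to the JVM and prepends PIG_CLASSPATH to the classpath; please correct me if that assumption is wrong:

# Point the JVM at the hadoop-lzo native libraries (placeholder path).
export PIG_OPTS="-Djava.library.path=/usr/local/hadoop/lib/native/Linux-amd64-64"

# Put Guava (which provides com.google.common.collect.Maps) on the
# front-end classpath, in case that missing class is the real issue.
export PIG_CLASSPATH=/path/to/guava.jar:$PIG_CLASSPATH

# test_lzo.pig is just the script above saved to a file.
pig -x mapreduce test_lzo.pig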
Thank you for any help!

~Ed

On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> Hi Ed,
> Elephant-bird only works with 0.6 at the moment. There's a branch for 0.7
> that I haven't tested: http://github.com/hirohanin/elephant-bird/
> Try it, let me know if it works.
>
> -D
>
> On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com> wrote:
>
> > Hello,
> >
> > I have a small cluster up and running with LZO-compressed files in it.
> > I'm using the LZO compression libraries available at
> > http://github.com/kevinweil/hadoop-lzo (thank you for maintaining this!)
> >
> > So far everything works fine when I write regular map-reduce jobs. I can
> > read in LZO files and write out LZO files without any problem.
> >
> > I'm also using Pig 0.7, and it appears to be able to read LZO files out
> > of the box using the default LoadFunc (PigStorage). However, I am
> > currently testing a large LZO file (20GB) which I indexed using the
> > LzoIndexer, and Pig does not appear to be making use of the indexes. The
> > Pig scripts I've run so far only get 3 mappers when processing the 20GB
> > file. My understanding was that there should be one map for each block
> > (256MB blocks), so about 80 mappers when processing the 20GB LZO file.
> > Does Pig 0.7 support indexed LZO files with the default load function?
> >
> > If not, I was looking at elephant-bird and noticed it is only compatible
> > with Pig 0.6, not 0.7+. Is that accurate? What would be the recommended
> > solution for processing indexed LZO files with Pig 0.7?
> >
> > Thank you for any assistance!
> >
> > ~Ed
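P.S. In case it matters for the indexing question in the quoted mail above: the LzoIndexer run I mentioned was along these lines (the jar path is a placeholder for whatever your hadoop-lzo build produces):

# Index every .lzo file under the input directory.
hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer /user/foo/input

That writes a .index file alongside each .lzo file, which is what I expected Pig to pick up for splitting.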