Hi Ed,

Elephant-bird only works with Pig 0.6 at the moment. There's a branch for 0.7 that I haven't tested: http://github.com/hirohanin/elephant-bird/ Try it and let me know if it works.
-D

On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com> wrote:
> Hello,
>
> I have a small cluster up and running with LZO-compressed files in it. I'm
> using the LZO compression libraries available at
> http://github.com/kevinweil/hadoop-lzo (thank you for maintaining this!)
>
> So far everything works fine when I write regular map-reduce jobs. I can
> read LZO files and write LZO files without any problem.
>
> I'm also using Pig 0.7, and it appears to be able to read LZO files out of
> the box using the default LoadFunc (PigStorage). However, I am currently
> testing a large LZO file (20GB) which I indexed using the LzoIndexer, and
> Pig does not appear to be making use of the indexes. The Pig scripts I've
> run so far only use 3 mappers when processing the 20GB file. My
> understanding was that there should be 1 map for each block (256MB blocks),
> so about 80 mappers when processing the 20GB LZO file. Does Pig 0.7 support
> indexed LZO files with the default load function?
>
> If not, I was looking at elephant-bird and noticed it is only compatible
> with Pig 0.6, not 0.7+. Is that accurate? What would be the recommended
> solution for processing indexed LZO files with Pig 0.7?
>
> Thank you for any assistance!
>
> ~Ed
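For what it's worth, the ~80-mapper figure in the question follows directly from the file and block sizes: with one split per HDFS block, a 20GB file on 256MB blocks should yield about 80 map tasks. A quick sanity check of that arithmetic (plain Python, not tied to any Hadoop API):

```python
# Expected number of map tasks when a splittable (indexed LZO) file
# is divided on HDFS block boundaries: one split per block.
def expected_mappers(file_size_bytes, block_size_bytes):
    """Ceiling division: a partial trailing block still gets its own mapper."""
    return -(-file_size_bytes // block_size_bytes)

GB = 1024 ** 3
MB = 1024 ** 2

# The 20GB file with 256MB blocks described above.
print(expected_mappers(20 * GB, 256 * MB))  # → 80
```

Seeing only 3 mappers on such a file is the usual symptom of the loader treating the .lzo file as unsplittable and ignoring the index, which is exactly what elephant-bird's LZO loaders are meant to fix.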