pig-user  

bzip2/LZO Compressed input

Johannes Rußek
Wed, 17 Mar 2010 08:15:02 -0700

Hello everybody,
I'm trying to use Pig with compressed input files.
I have a bunch of Apache log files, 1-2 GB each, which bzip2 compresses down to 30-40 MB. I tried to simply load the .bz2 file, but it only "kind of" worked: it seems that Pig loaded only a fraction of the file and processed that. With the uncompressed file I ended up with ~3500 lines of output, but with the .bz2 input file I got only ten.
Does this make any sense to you?
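
To rule out a damaged or incomplete archive, the line count of the .bz2 file can be compared with that of the uncompressed log outside of Pig. Below is a minimal Python sketch of such a check. The function name count_lines and the file names in the usage comment are hypothetical placeholders, not anything provided by Pig.

```python
import bz2


def count_lines(path, compressed=False):
    """Count the lines in a plain or bzip2-compressed file."""
    # bz2.open decompresses on the fly; the plain open reads the file as-is.
    opener = bz2.open if compressed else open
    with opener(path, "rb") as handle:
        # Iterating a binary file object yields one line per step.
        return sum(1 for _ in handle)


# Hypothetical usage (file names are assumptions):
#   count_lines("access.log")
#   count_lines("access.log.bz2", compressed=True)
```

If both counts match, the archive itself is intact and the difference would have to come from how the input is read during the load.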
I've also tried using .lzo files, but Pig wouldn't read them at all, so I figure I have to install some LZO classes for that.
Any hints where I can find them and how to integrate them?
Thanks and best regards,
Johannes