pig-user  

Re: bzip2/LZO Compressed input

Johannes Rußek
Wed, 17 Mar 2010 09:21:30 -0700

Hello Dmitriy!
Sure, I can wait.
I would love to give it a test run, though :)
No hurry. Thanks and regards,
Johannes


On 17.03.2010 16:34, Dmitriy Ryaboy wrote:
Johannes,
If you can wait a week or two, we (Twitter) are about to open-source all of
our LZO+Protobuf+Pig stuff. Just documentation left to do :-).

-Dmitriy

On Wed, Mar 17, 2010 at 8:14 AM, Johannes Rußek <johannes.rus...@io-consulting.net> wrote:

Hello everybody,
I'm trying to use Pig with compressed input files.
I have a bunch of Apache log files, 1-2 GB each, which bzip2 compresses
down to 30-40 MB.
I tried to simply load the .bz2 file, but it only "kind of" worked. It
seems that Pig loaded and processed only a fraction of the file.
With the uncompressed file I ended up with ~3500 lines of output,
but with the .bz2 input file I got only ten.
Does this make any sense to you?
I've also tried using .lzo files, but Pig wouldn't read them at all, so
I figure I have to install some LZO classes for that.
Any hints on where I can find them and how to integrate them?
Thanks and best regards,
Johannes
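
A quick way to check the symptom described above independently of Pig is to count the lines in the .bz2 file directly and compare that with the uncompressed file. The following is only a minimal sketch in Python (not something used in this thread). The helper names `count_lines` and `count_first_stream_lines` are made up for illustration. The second helper rests on an assumption: that the .bz2 file consists of several concatenated bzip2 streams and that the reader stopped after the first one. Nothing in the thread confirms that this is the cause here.

```python
import bz2


def count_lines(path, compressed):
    """Count lines in a plain or bzip2-compressed file.

    bz2.open reads every stream of a multi-stream .bz2 file,
    so this reflects the full content of the archive.
    """
    opener = bz2.open if compressed else open
    with opener(path, "rb") as handle:
        return sum(1 for _ in handle)


def count_first_stream_lines(path):
    """Count newline characters in the FIRST bzip2 stream only.

    Hypothetical diagnostic: it mimics a reader that stops at the
    end of the first stream instead of continuing to the next one.
    """
    decompressor = bz2.BZ2Decompressor()
    total = 0
    with open(path, "rb") as handle:
        # BZ2Decompressor sets .eof once the first stream has ended.
        while not decompressor.eof:
            chunk = handle.read(1 << 20)
            if not chunk:
                break
            total += decompressor.decompress(chunk).count(b"\n")
    return total
```

If `count_lines` on the .bz2 file matches the uncompressed file but `count_first_stream_lines` returns far fewer lines, the archive contains more than one stream, which would be consistent with a reader that sees only part of the input.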