pig-user  

Re: bzip2/LZO Compressed input

Dmitriy Ryaboy
Wed, 17 Mar 2010 08:35:19 -0700

Johannes,
If you can wait a week or two, we (Twitter) are about to open-source all of
our LZO+Protobuf+Pig stuff. Just documentation left to do :-).

-Dmitriy

On Wed, Mar 17, 2010 at 8:14 AM, Johannes Rußek <
johannes.rus...@io-consulting.net> wrote:

> Hello everybody,
> I'm trying to use Pig with compressed input files.
> I have a bunch of 1-2 GB Apache log files which are compressed down to
> 30-40 MB with bzip2.
> I tried to simply load the .bz2 file, but it only "kind of" worked. It
> seems that Pig loaded only a fraction of the file and processed that.
> With the uncompressed file I ended up with ~3500 lines of output,
> but with the .bz2 input file I got only ten.
> Does this make any sense to you?
> I've also tried using .lzo files, but Pig wouldn't read them at all, so
> I figure I have to install some LZO classes for that.
> Any hints on where I can find them and how to integrate them?
> Thanks and best regards,
> Johannes
>
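
One way to narrow down the mismatch described above is to check, outside
of Pig, whether the .bz2 file really decompresses to the same number of
lines as the uncompressed log. The following is only a minimal sketch
using Python's standard bz2 module. The function names and file names are
hypothetical and do not come from the thread. It says nothing about how
Pig itself reads the file. It only helps separate a truncated or damaged
archive from a problem on the loading side.

```python
import bz2


def count_lines(path):
    """Count lines in a plain-text file or a .bz2-compressed file."""
    # Assumption: compressed inputs are recognised purely by the .bz2 suffix.
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as handle:
        return sum(1 for _ in handle)


def compare_line_counts(plain_path, compressed_path):
    """Return (plain_count, compressed_count, counts_match)."""
    plain = count_lines(plain_path)
    compressed = count_lines(compressed_path)
    return plain, compressed, plain == compressed


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_bz2.py access.log access.log.bz2
    if len(sys.argv) == 3:
        print(compare_line_counts(sys.argv[1], sys.argv[2]))
```

If the two counts agree, the archive is complete and the missing records
are more likely lost when the file is loaded. If they differ, the .bz2
file itself is the first thing to re-create.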