Dmitriy Ryaboy
Wed, 17 Mar 2010 08:35:19 -0700
Johannes,

If you can wait a week or two, we (Twitter) are about to open-source all of our LZO+Protobuf+Pig stuff. Just documentation left to do :-).
-Dmitriy

On Wed, Mar 17, 2010 at 8:14 AM, Johannes Rußek <johannes.rus...@io-consulting.net> wrote:

> Hello everybody,
> I'm trying to use Pig with compressed input files.
> I have a bunch of 1-2GB Apache log files which are compressed down to
> 30-40MB using bzip2.
> I tried to simply load the .bz2 file, but it only "kind of" worked. It
> seems that it only loaded a fraction of the file and processed that.
> When I took the uncompressed file, I ended up with ~3500 lines of output,
> but when I used the .bz2 input file, I had ten.
> Does this make any sense to you?
> I've also tried using .lzo files, but Pig wouldn't read them in at all, so
> I figure I have to install some LZO classes for that.
> Any hints where I can find them and how to integrate them?
> Thanks and best regards,
> Johannes