On Tue, Jul 28, 2009 at 11:02 AM, Edward Capriolo<[email protected]> wrote:
> On Tue, Jul 28, 2009 at 2:22 AM, Zheng Shao<[email protected]> wrote:
>> Yes, we do compress all tables.
>>
>> Zheng
>>
>> On Mon, Jul 27, 2009 at 11:08 PM, Saurabh Nanda<[email protected]> wrote:
>>>
>>>> In our setup, we didn't change io.seqfile.compress.blocksize (1MB) and it's still fairly good.
>>>> You are free to try 100MB for a better compression ratio, but I would recommend keeping the default setting to minimize the chance of hitting unknown bugs.
>>>
>>> Makes sense. Better compression brought a count(1) query down from 100+ sec to 40 sec. The ETL phase now takes 510 sec, as opposed to 700 sec earlier.
>>>
>>> Do you also compress all tables, not just the raw ones? Would you recommend it?
>>>
>>> Saurabh.
>>> --
>>> http://nandz.blogspot.com
>>> http://foodieforlife.blogspot.com
>>
>> --
>> Yours,
>> Zheng
>
> Saurabh,
>
> Thank you for the wiki page on this. Keep up the good work, and please post all your findings about compression. Many people (including me) will benefit from an explanation of the different types of compression available and the trade-offs of the different codecs and options. I am really excited, as I have (shamefully) had some large tables with multiple text files building up, and the thought of smaller data and faster queries gives me goosebumps.
>
> Edward
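For anyone following along, here is a minimal sketch of the kind of Hive session settings being discussed above. The property names are standard Hadoop/Hive options of that era, but the specific values and the table names in the comments are illustrative assumptions, not taken from this thread:

    -- Sketch: write query/ETL output as block-compressed SequenceFiles.
    SET hive.exec.compress.output=true;
    SET io.seqfile.compression.type=BLOCK;
    -- 1000000 bytes is the 1MB default mentioned above; raising it (e.g. toward
    -- 100MB) may improve the compression ratio at the cost of more memory per block.
    SET io.seqfile.compress.blocksize=1000000;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
    -- A compressed table is then populated as usual (hypothetical table names):
    -- INSERT OVERWRITE TABLE logs_compressed SELECT * FROM logs_raw;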
On a related note:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: SequenceFile doesn't work with GzipCodec without native-hadoop code!

:( I have a 0.18.3 (Cloudera) system in production: hadoop-native-0.18.3-7.cloudera.CH0_3.i386.rpm

Is there any Java-based codec I could use that does not require external native libraries?
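Not an answer given in the thread, but as a hedged illustration of one possible route: Hadoop's DefaultCodec is generally understood to fall back to the JDK's built-in zlib (java.util.zip) when the native hadoop library is not loaded, so switching the output codec may sidestep the GzipCodec/native requirement. Whether this holds on hadoop 0.18.3/CDH is an assumption worth verifying:

    -- Assumption: DefaultCodec works without libhadoop native code by using the
    -- pure-Java zlib implementation; verify on hadoop 0.18.3 before relying on it.
    SET hive.exec.compress.output=true;
    SET io.seqfile.compression.type=BLOCK;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;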
