On Tue, Jul 28, 2009 at 11:02 AM, Edward Capriolo<[email protected]> wrote:
> On Tue, Jul 28, 2009 at 2:22 AM, Zheng Shao<[email protected]> wrote:
>> Yes, we do compress all tables.
>>
>> Zheng
>>
>> On Mon, Jul 27, 2009 at 11:08 PM, Saurabh Nanda<[email protected]> wrote:
>>>
>>>> In our setup, we didn't change io.seqfile.compress.blocksize (1MB) and it's still fairly good.
>>>> You are free to try 100MB for a better compression ratio, but I would recommend keeping the default setting to minimize the chance of hitting unknown bugs.
>>>
>>> Makes sense. Better compression brought a count(1) query down from 100+ sec to 40 sec. The ETL phase now takes 510 sec, as opposed to 700 sec earlier.
>>>
>>> Do you also compress all tables, not just the raw ones? Would you recommend it?
>>>
>>> Saurabh.
>>> --
>>> http://nandz.blogspot.com
>>> http://foodieforlife.blogspot.com
>>
>> --
>> Yours,
>> Zheng
>
> Saurabh,
>
> Thank you for the wiki page on this. Keep up the good work, and please post all your findings about compression. Many people (including me) will benefit from an explanation of the different types of compression available and the trade-offs of the different codecs and options. I am really excited, as I have (shamefully) had some large tables with multiple text files building up, and the thought of smaller data and faster queries gives me goosebumps.
>
> Edward
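For anyone following along, here is a minimal sketch of the kind of Hive session settings being discussed above. The property names are standard Hadoop/Hive options of that era, but the specific values and the table names in the comments are illustrative assumptions, not taken from this thread:

    -- Sketch: write query/ETL output as block-compressed SequenceFiles.
    SET hive.exec.compress.output=true;
    SET io.seqfile.compression.type=BLOCK;
    -- 1000000 bytes is the 1MB default mentioned above; raising it (e.g. toward
    -- 100MB) may improve the compression ratio at the cost of more memory per block.
    SET io.seqfile.compress.blocksize=1000000;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
    -- A compressed table is then populated as usual (hypothetical table names):
    -- INSERT OVERWRITE TABLE logs_compressed SELECT * FROM logs_raw;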
On a related note:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: SequenceFile doesn't work with GzipCodec without native-hadoop code!

:( I have a 0.18.3 (Cloudera) system in production: hadoop-native-0.18.3-7.cloudera.CH0_3.i386.rpm

Is there any Java-based codec I could use that does not require external native libraries?
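Not an answer given in the thread, but as a hedged illustration of one possible route: Hadoop's DefaultCodec is generally understood to fall back to the JDK's built-in zlib (java.util.zip) when the native hadoop library is not loaded, so switching the output codec may sidestep the GzipCodec/native requirement. Whether this holds on hadoop 0.18.3/CDH is an assumption worth verifying:

    -- Assumption: DefaultCodec works without libhadoop native code by using the
    -- pure-Java zlib implementation; verify on hadoop 0.18.3 before relying on it.
    SET hive.exec.compress.output=true;
    SET io.seqfile.compression.type=BLOCK;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;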
