We can just put it in hive-site.xml.
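For reference, the entry would look something like this (a sketch; `hive.exec.compress.output` is the property Adam mentioned, and the codec line is optional and shown here only as an example):

```xml
<!-- hive-site.xml: compress the output of Hive queries -->
<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
</property>
```

With this in hive-site.xml there is no need to SET it on every run.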

The block storage is pretty efficient since the compression codec is
native. It should be close to what you get with the command line tools
like "bzip2" and "gzip".
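If you do want a knob to turn, block compression for SequenceFile output is controlled by the standard Hadoop job properties, along these lines (a sketch; property names are from the 0.18/0.19-era Hadoop configuration, so verify against your cluster):

```xml
<!-- Use BLOCK compression for SequenceFile output -->
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<!-- Pick the codec; GzipCodec is available on 0.18 -->
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```

BLOCK compresses runs of records together, which generally gets a better ratio than RECORD-level compression.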

Zheng

On Mon, Jan 25, 2010 at 11:40 AM, Adam J. O'Donnell <[email protected]> wrote:
> Damn, thanks for the tip, Carl.  I forgot the version of Hadoop Amazon is 
> running is a little old.
>
> Any ideas on getting table compression on output to work?  For example, do I 
> have to specify exec.compress.output on every run, or should I put that into 
> my hive-site.xml?  I assume that isn't stored in the metadata store, is it?  
> Also, how efficient is the block storage?  Is there a knob I can adjust on 
> that?
>
> Thanks!
>
> On Jan 25, 2010, at 10:58 AM, Carl Steinbach wrote:
>
>> Hi Adam,
>>
>> Hive actually relies on the underlying Hadoop implementation for compression 
>> support, i.e. whether or not Hive can support bz2 compressed files depends 
>> on whether or not the Hadoop cluster the files are stored in supports the 
>> bzip2 compression codec. Support for bzip2 was added in Hadoop 0.19, and it 
>> looks like Amazon's EMR is running a variant of Hadoop 0.18.3, which 
>> supports gzip but not bzip2.
>>
>> There is a discussion of these issues on the Amazon EMR help forum here:
>> http://developer.amazonwebservices.com/connect/thread.jspa?messageID=145636
>>
>> Thanks.
>>
>> Carl
>>
>> On Mon, Jan 25, 2010 at 10:38 AM, Adam J. O'Donnell <[email protected]> wrote:
>> All:
>>
>> I have some questions regarding Hive that I hope you can help me with.  I 
>> haven't had too much luck with the documentation on these, so any tips would 
>> be much appreciated.
>>
>> I initially posted these on the Amazon Elastic MapReduce message board, 
>> since some are S3 related, but I have gotten no love there.
>>
>>  - Can you create an external table that covers .bz2 files? For example, if 
>> I push a bunch of log files to a directory that are .bz2 compressed, can I 
>> directly select rows from the external table? If not, what is the best way 
>> of loading the .bz2 into a temporary Hive table such that I can do 
>> wildcarding?
>>
>> All of these files are in subdirectories, with the directory names serving 
>> as partition names.
>>
>>  - Is there some trick to storing compressed Hive tables that isn't clearly 
>> documented?  I tried the recipe in the Hive tutorial but didn't have much 
>> luck. Anyone here have any success?  This is using Hive 0.4.0 in Amazon's 
>> cloud.
>>
>>  - Has anyone tried to compress the tables via .bz2 instead?  Is there an 
>> easy way of stream compressing it when using the s3 interfaces?
>>
>>  - What is more efficient: storing the tables in S3 as s3:// or s3n://?
>>
>> Thanks for your help!
>>
>> Adam
>>
>>
>
> --
> Adam J. O'Donnell, Ph.D.
> Immunet Corporation
> Cell: +1 (267) 251-0070
>
>



-- 
Yours,
Zheng
