Damn, thanks for the tip, Carl. I forgot that the version of Hadoop Amazon is running is a little old.
Any ideas on getting table compression on output to work? For example, do I have to specify hive.exec.compress.output on every run, or should I put that into my hive-site.xml? I assume that isn't stored in the metadata store, is it? Also, how efficient is the block storage? Is there a knob I can adjust on that? Thanks!

On Jan 25, 2010, at 10:58 AM, Carl Steinbach wrote:

> Hi Adam,
>
> Hive actually relies on the underlying Hadoop implementation for compression
> support, i.e. whether or not Hive can support bz2 compressed files depends on
> whether or not the Hadoop cluster the files are stored in supports the bzip2
> compression codec. Support for bzip2 was added in Hadoop 0.19, and it looks
> like Amazon's EMR is running a variant of Hadoop 0.18.3, which supports gzip
> but not bzip2.
>
> There is a discussion of these issues on the Amazon EMR help forum here:
> http://developer.amazonwebservices.com/connect/thread.jspa?messageID=145636
>
> Thanks.
>
> Carl
>
> On Mon, Jan 25, 2010 at 10:38 AM, Adam J. O'Donnell <[email protected]> wrote:
>
> All:
>
> I have some questions regarding Hive that I hope you can help me with. I
> haven't had too much luck with the documentation on these, so any tips would
> be much appreciated.
>
> I initially posted these on the Amazon Elastic MapReduce message board, since
> some are S3 related, but I have gotten no love there.
>
> - Can you create an external table that covers .bz2 files? For example, if I
> push a bunch of log files to a directory that are .bz2 compressed, can I
> directly select rows from the external table? If not, what is the best way of
> loading the .bz2 into a temporary Hive table such that I can do wildcarding?
>
> All of these files are in subdirectories, with the directory names serving as
> partition names.
>
> - Is there some trick to storing compressed Hive tables that isn't clearly
> documented? I tried the recipe in the Hive tutorial but didn't have much
> luck. Anyone here have any success? This is using Hive 0.4.0 in Amazon's
> cloud.
>
> - Has anyone tried to compress the tables via .bz2 instead? Is there an
> easy way of stream compressing it when using the s3 interfaces?
>
> - What is more efficient: storing the tables in S3 as s3:// or s3n://?
>
> Thanks for your help!
>
> Adam
>
> --
> Adam J. O'Donnell, Ph.D.
> Immunet Corporation
> Cell: +1 (267) 251-0070
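For anyone finding this thread later, the output-compression settings being discussed can be set per-session with SET, or persisted as <property> entries in hive-site.xml so they apply to every run. A minimal sketch, assuming Hive 0.4-era property names (the table names here are made up); BLOCK-level compression only applies when the output is stored as SequenceFiles:

```sql
-- Compress the final job output; the codec itself is a Hadoop-level
-- setting, so it is limited to what the cluster supports (gzip on
-- Hadoop 0.18.x, bzip2 only from 0.19 on).
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
-- For SequenceFile output, compress whole blocks rather than
-- individual records (this is the "block storage" knob):
SET mapred.output.compression.type=BLOCK;
-- Optionally also compress data between intermediate map/reduce stages:
SET hive.exec.compress.intermediate=true;

-- Hypothetical example: rewrite a raw table into a compressed one.
INSERT OVERWRITE TABLE logs_compressed
SELECT * FROM logs_raw;
```

To make these the default instead of repeating the SET commands, each one goes into hive-site.xml as a <property> with matching <name> and <value> elements; they are client-side configuration and, as suspected above, are not recorded in the metastore.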
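On the external-table question: a TEXTFILE table reads its files through Hadoop's codec layer, so compressed files sitting in partition directories can be queried directly as long as the cluster has the codec (gzip on Hadoop 0.18.x; .bz2 would need 0.19+). A sketch under those assumptions, with invented names and S3 paths, and directory names doubling as partition values as described above:

```sql
-- Hypothetical layout: s3n://mybucket/logs/dt=2010-01-24/*.gz
CREATE EXTERNAL TABLE logs (
  ts STRING,
  msg STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 's3n://mybucket/logs/';

-- Partitions are not discovered automatically; each directory has to be
-- registered with the metastore before its rows are visible:
ALTER TABLE logs ADD PARTITION (dt='2010-01-24')
  LOCATION 's3n://mybucket/logs/dt=2010-01-24/';

SELECT COUNT(1) FROM logs WHERE dt = '2010-01-24';
```

No explicit load step is needed; dropping new compressed files into a registered partition directory makes them queryable on the next SELECT.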
