Hi Adam,

Hive actually relies on the underlying Hadoop implementation for compression support, i.e. whether Hive can read bz2-compressed files depends on whether the Hadoop cluster the files are stored in supports the bzip2 compression codec. Support for bzip2 was added in Hadoop 0.19, and it looks like Amazon's EMR is running a variant of Hadoop 0.18.3, which supports gzip but not bzip2.
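For gzip, things should work out of the box: Hadoop advertises its codecs in the io.compression.codecs property of the cluster configuration, and Hive picks the decompression codec from the file extension, so an external table can point directly at a directory of gzip'd logs. A minimal sketch (the table, column, and bucket names here are made up for illustration):

```sql
-- Hypothetical example: an external table over gzip-compressed log files,
-- with subdirectory names mapped to a partition column. Hive chooses the
-- codec from the file extension (.gz here), so no table-level compression
-- setting is needed for reading.
CREATE EXTERNAL TABLE logs (
  ts STRING,
  message STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://my-bucket/logs/';

-- Each partition directory has to be registered before it is queryable:
ALTER TABLE logs ADD PARTITION (dt='2010-01-25')
  LOCATION 's3n://my-bucket/logs/dt=2010-01-25/';
```

The same DDL would cover .bz2 files on a cluster whose Hadoop registers the bzip2 codec, which is why the Hadoop version is the deciding factor here.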
There is a discussion of these issues on the Amazon EMR help forum here:
http://developer.amazonwebservices.com/connect/thread.jspa?messageID=145636

Thanks.

Carl

On Mon, Jan 25, 2010 at 10:38 AM, Adam J. O'Donnell <[email protected]> wrote:

> All:
>
> I have some questions regarding hive that I hope you can help me with. I
> haven't had too much luck with the documentation on these, so any tips would
> be much appreciated.
>
> I initially posted these on the amazon elastic mapreduce message board,
> since some are S3 related, but I have gotten no love there.
>
> - Can you create an external table that covers .bz2 files? For example, if
> I push a bunch of log files to a directory that are .bz2 compressed, can I
> directly select rows from the external table? If not, what is the best way
> of loading the .bz2 into a temporary Hive table such that I can do
> wildcarding?
>
> All of these files are in subdirectories, with the directory names serving
> as partition names.
>
> - Is there some trick to storing compressed Hive tables that isn't clearly
> documented? I tried the recipe in the Hive tutorial but didn't have much
> luck. Anyone here have any success? This is using Hive 0.4.0 in amazon's
> cloud.
>
> - Has anyone tried to compress the tables via .bz2 instead? Is there an
> easy way of stream compressing it when using the s3 interfaces?
>
> - What is more efficient: storing the tables in S3 as s3:// or s3n://?
>
> Thanks for your help!
>
> Adam
