bz2 files with external tables, s3 usage, compressed tables?

Adam J. O'Donnell Mon, 25 Jan 2010 10:39:13 -0800

All:

I have some questions regarding Hive that I hope you can help me with. I 
haven't had much luck with the documentation on these, so any tips would be 
much appreciated.


I initially posted these on the Amazon Elastic MapReduce message board, since 
some are S3-related, but I have gotten no love there.

 - Can you create an external table that covers .bz2 files? For example, if I 
push a bunch of .bz2-compressed log files to a directory, can I select rows 
directly from the external table? If not, what is the best way of loading the 
.bz2 files into a temporary Hive table so that I can use wildcards?

All of these files are in subdirectories, with the directory names serving as 
partition names.
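To make the layout concrete, here is a rough local sketch (not Hive, and not how Hive reads the files) of the data I'd want the external table to expose: every line of every .bz2 file, tagged with the partition values taken from its directory path. It assumes Hive-style key=value directory names such as dt=2010-01-25, which is an assumption about my layout; the function names parse_partition and read_bz2_rows are hypothetical.

```python
import bz2
import os
from typing import Dict, Iterator, Tuple


def parse_partition(rel_dir: str) -> Dict[str, str]:
    """Turn a relative path like 'dt=2010-01-25/host=web1' into a dict.

    Assumes (hypothetically) Hive-style key=value directory names;
    segments without '=' are ignored."""
    parts: Dict[str, str] = {}
    for segment in rel_dir.split(os.sep):
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = value
    return parts


def read_bz2_rows(root: str) -> Iterator[Tuple[Dict[str, str], str]]:
    """Yield (partition_values, line) for each line of each .bz2 file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        partition = parse_partition(os.path.relpath(dirpath, root))
        for name in sorted(filenames):
            if not name.endswith(".bz2"):
                continue  # skip anything that isn't bzip2-compressed
            # bz2.open in text mode decompresses the stream transparently
            with bz2.open(os.path.join(dirpath, name), "rt") as fh:
                for line in fh:
                    yield partition, line.rstrip("\n")
```

In other words, what I'm hoping for is that an external table over the root directory behaves like read_bz2_rows, with no manual decompression step.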

 - Is there some trick to storing compressed Hive tables that isn't clearly 
documented? I tried the recipe in the Hive tutorial but didn't have much luck. 
Has anyone here had any success? This is using Hive 0.4.0 in Amazon's cloud.

 - Has anyone tried compressing the tables with bzip2 instead? Is there an easy 
way to stream-compress them when using the S3 interfaces?
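By "stream compressing" I mean something like the following minimal Python sketch, which compresses data chunk by chunk with bz2.BZ2Compressor so the whole table never has to sit uncompressed on local disk. The function name bz2_stream is hypothetical, and handing the compressed pieces to an S3 upload is an assumption that is not shown here.

```python
import bz2
from typing import Iterable, Iterator


def bz2_stream(chunks: Iterable[bytes], level: int = 9) -> Iterator[bytes]:
    """Incrementally bzip2-compress byte chunks, yielding compressed pieces.

    Only the compressor's internal block buffer is held in memory, so the
    output could be passed to an uploader piece by piece (upload not shown)."""
    compressor = bz2.BZ2Compressor(level)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # often b"" until a full compression block has accumulated
            yield out
    yield compressor.flush()  # remaining data plus the end-of-stream marker
```

The concatenated output is one ordinary .bz2 stream, so anything that reads .bz2 files should be able to decompress it.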

 - Which is more efficient for storing the tables in S3: s3:// or s3n://?

Thanks for your help!

Adam
