Please excuse my ignorance, but can I import gzip-compressed files directly
as Hive tables? I have a separate gzip file for each day's weblog data. Right
now I am gunzipping them and then importing into a raw table. Can I import
the gzipped files directly into Hive?

Saurabh.

On Wed, Jul 22, 2009 at 1:07 AM, Ashish Thusoo <[email protected]> wrote:

> I don't think these are splittable. Compression on SequenceFiles is
> splittable across SequenceFile blocks.
>
> Ashish
>
> -----Original Message-----
> From: Bill Craig [mailto:[email protected]]
> Sent: Tuesday, July 21, 2009 8:06 AM
> To: [email protected]
> Subject: bz2 Splits.
>
> I loaded 5 files of bzip2-compressed data into a table in Hive. Three are
> small test files containing 10,000 records. Two are large, ~8 GB compressed.
> When I run a query against the table, I see three tasks that complete almost
> immediately and two tasks that run for a very long time. It appears to me
> that Hive/Hadoop is not splitting the *.bz2 input. I have seen some
> old mails about this, but could not find any resolution for this problem. I
> compressed the files using the Apache bz2 jar, and the files are named
> *.bz2. I am using Hadoop 0.19.1 r745977.
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
