Please excuse my ignorance, but can I import gzip-compressed files directly as Hive tables? I have a separate gzip file for each day's weblog data. Right now I am gunzipping them and then importing them into a raw table. Can I import the gzipped files directly into Hive?
Saurabh.

On Wed, Jul 22, 2009 at 1:07 AM, Ashish Thusoo <[email protected]> wrote:
> I don't think these are splittable. Compression on sequencefiles is
> splittable across sequencefile blocks.
>
> Ashish
>
> -----Original Message-----
> From: Bill Craig [mailto:[email protected]]
> Sent: Tuesday, July 21, 2009 8:06 AM
> To: [email protected]
> Subject: bz2 Splits.
>
> I loaded 5 files of bzip2 compressed data into a table in Hive. Three are
> small test files containing 10,000 records. Two were large ~8Gb compressed.
> When I run a query against the table I see three tasks that complete almost
> immediately and two tasks that run for a very long time. It appears to me
> that Hive/Hadoop is not splitting the input of the *.bz2. I have seen some
> old mails about this, but could not find any resolution for this problem. I
> compressed the files using the Apache bz2 jar, the files are named *.bz2. I
> am using Hadoop 0.19.1 r745977

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
