Hi Saurabh,

If you want to load data (in compressed/uncompressed text format) into a
table, you have to define the table as "stored as textfile" instead of
"stored as sequencefile".
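For example (a sketch reusing the table name and path from your message;
only the storage clause changes, and it is untested here):

  -- same DDL, but stored as textfile so plain or gzipped text can be loaded
  create table compressed_raw(line string) partitioned by(dt string)
  row format delimited fields terminated by '\t' lines terminated by '\n'
  stored as textfile;

  -- the load just copies the .gz file into the partition directory;
  -- Hadoop's gzip codec decompresses it at query time
  load data local inpath
  '/tmp/weblogs/20090602000000-172.16.1.40-access.log.gz'
  into table compressed_raw partition(dt='2009-06-01');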
Can you try again and let us know?

Zheng

On Sat, Jul 25, 2009 at 3:05 AM, Saurabh Nanda <[email protected]> wrote:
> I tried the following and ran into an error message:
>
> create table compressed_raw(line string) partitioned by(dt string)
> row format delimited fields terminated by '\t' lines terminated by '\n'
> stored as sequencefile;
>
> hive> load data local inpath
> '/tmp/weblogs/20090602000000-172.16.1.40-access.log.gz' into table
> compressed_raw partition(dt='2009-06-01');
> Copying data from file:/tmp/weblogs/20090602000000-172.16.1.40-access.log.gz
> Loading data to table compressed_raw partition {dt=2009-06-01}
> Failed with exception Cannot load text files into a table stored as
> SequenceFile.
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
> I guess this is what the following thread is talking about --
> http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/200903.mbox/%[email protected]%3e
>
> To sum up the discussion there, do I have to first import into a textfile
> table, set hive.exec.compress.output to true, and then insert into a
> sequencefile table? If that's the case, I don't understand why I have to
> explicitly set hive.exec.compress.output. Shouldn't the fact that the
> target is a sequencefile table achieve the desired result?
>
> I'm on hadoop-0.18.3 & hive-0.3.0.
>
> PS: More details on the Wiki around compressed storage would be really
> appreciated.
>
> Saurabh.
>
> On Fri, Jul 24, 2009 at 10:02 PM, Neal Richter <[email protected]> wrote:
>>
>> gz files work fine. We're attaching daily directories of gzipped logs
>> in S3 as Hive table partitions.
>>
>> It's best to have your log rotator do hourly rotation to create lots of
>> gz files for better mapping, or you could use zcat, split, and gzip to
>> divide the data into smaller chunks if you really have only one gz file
>> per partition.
>>
>> On Fri, Jul 24, 2009 at 9:48 AM, <[email protected]> wrote:
>> > Haven't checked gzip out yet, but Hive is happy with .bz2 files. The
>> > documentation on this is spotty. It seems that any Hadoop-supported
>> > compression will work. The issue with .gz files is that they are not
>> > splittable: one map will process an entire file, so if your .gz files
>> > are large and you have more map capacity than files, you will not be
>> > able to make use of it.
>> >
>> > On Jul 24, 2009 10:09am, Saurabh Nanda <[email protected]> wrote:
>> >> Please excuse my ignorance, but can I import gzip-compressed files
>> >> directly as Hive tables? I have separate gzip files for each day's
>> >> weblog data. Right now I am gunzipping them and then importing into
>> >> a raw table. Can I import the gzipped files directly into Hive?
>> >>
>> >> Saurabh.
>> >>
>> >> On Wed, Jul 22, 2009 at 1:07 AM, Ashish Thusoo
>> >> <[email protected]> wrote:
>> >>
>> >> I don't think these are splittable. Compression on sequencefiles is
>> >> splittable across sequencefile blocks.
>> >>
>> >> Ashish
>> >>
>> >> -----Original Message-----
>> >> From: Bill Craig [mailto:[email protected]]
>> >> Sent: Tuesday, July 21, 2009 8:06 AM
>> >> To: [email protected]
>> >> Subject: bz2 Splits.
>> >>
>> >> I loaded 5 files of bzip2-compressed data into a table in Hive. Three
>> >> are small test files containing 10,000 records. Two were large, ~8GB
>> >> compressed.
>> >> When I run a query against the table, I see three tasks that complete
>> >> almost immediately and two tasks that run for a very long time. It
>> >> appears to me that Hive/Hadoop is not splitting the input of the
>> >> *.bz2 files. I have seen some old mails about this, but could not
>> >> find any resolution for the problem. I compressed the files using
>> >> the Apache bz2 jar; the files are named *.bz2. I am using Hadoop
>> >> 0.19.1 r745977.
>> >>
>> >> --
>> >> http://nandz.blogspot.com
>> >> http://foodieforlife.blogspot.com
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
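On the two-step flow you summarized, it would look roughly like this (a
sketch only, not tested on hadoop-0.18.3/hive-0.3.0; compressed_seq is a
placeholder name):

  -- target table: same data, stored as a sequencefile
  create table compressed_seq(line string) partitioned by(dt string)
  stored as sequencefile;

  -- compression is a property of the job that writes the data, not of the
  -- table definition, which is why it has to be set explicitly
  set hive.exec.compress.output=true;
  set mapred.output.compression.type=BLOCK;  -- block-level, so splits work

  -- rewrite the text partition into the sequencefile table
  insert overwrite table compressed_seq partition(dt='2009-06-01')
  select line from compressed_raw where dt='2009-06-01';

As Ashish noted above, compression inside a sequencefile is splittable
across its blocks, so this also works around the non-splittable .gz input
for later queries.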
--
Yours,
Zheng