I have a similar issue with bz2 files. I have the Hadoop directories:

/ip/data/           : containing unzipped text files (foo1.txt, foo2.txt)
/ip/datacompressed/ : containing the same files bzipped (foo1.bz2, foo2.bz2)
CREATE EXTERNAL TABLE tx_log(id1 string, id2 string, id3 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'
LOCATION '/ip/datacompressed/';

SELECT * FROM tx_log LIMIT 1;

The command works fine with LOCATION '/ip/data/' but doesn't work with
LOCATION '/ip/datacompressed/'. Any pointers? I thought (like Pig) Hive
automatically detects .bz2 extensions and applies the appropriate
decompression. Am I wrong?

-Prasen

On Thu, Feb 18, 2010 at 3:04 AM, Zheng Shao <[email protected]> wrote:
> I just corrected the wiki page. It will also be a good idea to support
> case-insensitive boolean values in the code.
>
> Zheng
>
> On Wed, Feb 17, 2010 at 9:27 AM, Brent Miller <[email protected]> wrote:
> > Thanks Adam, that works for me as well. It seems that the property
> > hive.exec.compress.output is case sensitive, and when it is set to TRUE
> > (as it is on the compressed storage page on the wiki) it is ignored by
> > Hive.
> >
> > -Brent
> >
> > On Tue, Feb 16, 2010 at 4:24 PM, Adam O'Donnell <[email protected]> wrote:
> >> Adding these to my hive-site.xml file worked fine:
> >>
> >> <property>
> >>   <name>hive.exec.compress.output</name>
> >>   <value>true</value>
> >>   <description>Compress output</description>
> >> </property>
> >>
> >> <property>
> >>   <name>mapred.output.compression.type</name>
> >>   <value>BLOCK</value>
> >>   <description>Block compression</description>
> >> </property>
> >>
> >> On Tue, Feb 16, 2010 at 1:43 PM, Brent Miller <[email protected]> wrote:
> >> > Hello, I've seen issues similar to this one come up once or twice
> >> > before, but I haven't ever seen a solution to the problem that I'm
> >> > having. I was following the Compressed Storage page on the Hive Wiki
> >> > (http://wiki.apache.org/hadoop/CompressedStorage) and realized that
> >> > the sequence files that are created in the warehouse directory are
> >> > actually uncompressed and larger than the originals.
> >> > For example, I have a table 'test1' whose input data looks something
> >> > like:
> >> > 0,1369962224,2010/02/01,00:00:00.101,0C030301,4,0000BD43
> >> > 0,1369962225,2010/02/01,00:00:00.101,0C030501,4,66268E43
> >> > 0,1369962226,2010/02/01,00:00:00.101,0C030701,4,041F3341
> >> > ...
> >> > And after creating a second table 'test1_comp' with the STORED AS
> >> > SEQUENCEFILE directive and the compression options SET as described
> >> > in the wiki, I can look at the resultant sequence files and see that
> >> > they're just plain (uncompressed) text:
> >> > SEQ "org.apache.hadoop.io.BytesWritable org.apache.hadoop.io.Text+�c�!Y�M ��Z^��=
> >> > 80,1369962224,2010/02/01,00:00:00.101,0C030301,4,0000BD43=
> >> > 80,1369962225,2010/02/01,00:00:00.101,0C030501,4,66268E43=
> >> > 80,1369962226,2010/02/01,00:00:00.101,0C030701,4,041F3341=
> >> > 80,1369962227,2010/02/01,00:00:00.101,0C030901,4,11360141=
> >> > ...
> >> > I've tried messing around with different
> >> > org.apache.hadoop.io.compress.* options, but the sequence files
> >> > always come out uncompressed. Has anybody ever seen this or know a
> >> > way to keep the data compressed? Since the input text is so uniform,
> >> > we get huge space savings from compression and would like to store
> >> > the data this way if possible. I'm using Hadoop 0.20.1 and Hive that
> >> > I checked out from SVN about a week ago.
> >> > Thanks,
> >> > Brent
> >>
> >> --
> >> Adam J. O'Donnell, Ph.D.
> >> Immunet Corporation
> >> Cell: +1 (267) 251-0070
>
> --
> Yours,
> Zheng
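
For reference, the session-level equivalent of Adam's hive-site.xml entries,
with the boolean lowercased as Zheng's wiki correction requires, looks roughly
like this. A sketch only: the thread never shows test1's real schema, so the
column list below is a placeholder.

  -- Must be lowercase 'true'; 'TRUE' was silently ignored at the time.
  SET hive.exec.compress.output=true;
  SET mapred.output.compression.type=BLOCK;

  -- Placeholder schema; match it to the actual columns of test1.
  CREATE TABLE test1_comp (line STRING) STORED AS SEQUENCEFILE;

  -- The MapReduce job writing this output is what produces the
  -- block-compressed SequenceFiles in the warehouse directory.
  INSERT OVERWRITE TABLE test1_comp SELECT * FROM test1;

BLOCK compression compresses runs of records together rather than each record
individually, which is where the large savings on uniform text like Brent's
come from.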

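On the bz2 question: Hadoop's TextInputFormat (which Hive uses for plain
delimited tables) chooses a decompression codec by file extension from the
codecs registered under io.compression.codecs, so Prasen's expectation is
reasonable. A first check from the Hive CLI, assuming a stock Hadoop 0.20
setup (which lists BZip2Codec by default), is whether the codec is actually
registered:

  -- Print the registered codec list; org.apache.hadoop.io.compress.BZip2Codec
  -- must appear for .bz2 files to be decompressed transparently.
  SET io.compression.codecs;

If BZip2Codec is missing, adding it to io.compression.codecs in core-site.xml
should make the external table over /ip/datacompressed/ readable.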