Files in table raw_compressed start with this header:
SEQ|"org.apache.hadoop.io.BytesWritable|org.apache.hadoop.io.Text||'org.apache.hadoop.io.compress.GzipCodec
Files in table raw start with this header:
SEQ|"org.apache.hadoop.io.BytesWritable|org.apache.hadoop.io.Text
File size for raw_compressed: 250 MB
File size for raw: 2150 MB
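
Only the raw_compressed header names a codec, which already suggests that raw is being written uncompressed. As a cross-check, a minimal sketch like the one below could pull the compression flags out of the first bytes of a file. It assumes the version 6 SequenceFile header layout (magic, version byte, key class, value class, two boolean flags, then the codec class name when compressed) and that every class-name length fits in a single byte. SeqHeaderPeek, Header, parse and sample are made-up names, and sample() builds synthetic header bytes for the demo, not data from these tables.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SeqHeaderPeek {
    /** Fields read from the start of a SequenceFile (assumed version 6 layout). */
    static final class Header {
        int version;
        String keyClass;
        String valueClass;
        boolean compressed;
        boolean blockCompressed;
        String codecClass; // stays null when the compressed flag is off
    }

    // Reads a length-prefixed string. Assumption: the length fits in one
    // byte (0..127), which is true for the class names in the headers above.
    static String readShortString(ByteBuffer in) {
        int len = in.get();
        if (len < 0) {
            throw new IllegalArgumentException("multi-byte length not handled in this sketch");
        }
        byte[] buf = new byte[len];
        in.get(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    static Header parse(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        if (in.get() != 'S' || in.get() != 'E' || in.get() != 'Q') {
            throw new IllegalArgumentException("not a SequenceFile");
        }
        Header h = new Header();
        h.version = in.get();
        h.keyClass = readShortString(in);
        h.valueClass = readShortString(in);
        h.compressed = in.get() != 0;
        h.blockCompressed = in.get() != 0;
        if (h.compressed) {
            h.codecClass = readShortString(in);
        }
        return h;
    }

    private static void putString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b.length);        // one-byte length prefix
        out.write(b, 0, b.length);
    }

    // Synthetic header for the demo; pass null for an uncompressed file.
    // The block-compression flag is set to false here as an assumption.
    static byte[] sample(String codec) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S'); out.write('E'); out.write('Q');
        out.write(6);                           // version byte
        putString(out, "org.apache.hadoop.io.BytesWritable");
        putString(out, "org.apache.hadoop.io.Text");
        out.write(codec != null ? 1 : 0);       // compressed flag
        out.write(0);                           // block-compressed flag
        if (codec != null) {
            putString(out, codec);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (String codec : new String[] {"org.apache.hadoop.io.compress.GzipCodec", null}) {
            Header h = parse(sample(codec));
            System.out.println("compressed=" + h.compressed + " codec=" + h.codecClass);
        }
    }
}
```

To try this on a real file, the first few hundred bytes copied out of HDFS could be passed to parse() in place of sample().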
Should I add the following line right after
"boolean isCompressed = conf.getCompressed();"?
    LOG.info("Compression config is: " + isCompressed);
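
For concreteness, the placement I have in mind looks roughly like the sketch below. It is not the actual FileSinkOperator code: SinkConf is a hypothetical stand-in for whatever object conf refers to, and java.util.logging stands in for the operator's own LOG field so that the snippet runs on its own.

```java
import java.util.logging.Logger;

public class CompressionLogSketch {
    // Stand-in for the operator's LOG field; Hive's own logger type may differ.
    private static final Logger LOG =
        Logger.getLogger(CompressionLogSketch.class.getName());

    // Hypothetical stand-in for the descriptor object that conf refers to.
    static final class SinkConf {
        private final boolean compressed;
        SinkConf(boolean compressed) { this.compressed = compressed; }
        boolean getCompressed() { return compressed; }
    }

    // Mirrors the two lines under discussion and returns the logged text.
    static String logCompression(SinkConf conf) {
        boolean isCompressed = conf.getCompressed();
        String msg = "Compression config is: " + isCompressed;
        LOG.info(msg);
        return msg;
    }

    public static void main(String[] args) {
        logCompression(new SinkConf(true));
    }
}
```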
Saurabh.
On Fri, Aug 14, 2009 at 9:51 AM, Zheng Shao <[email protected]> wrote:
> What is the average file size in table raw?
>
> Can you put a log line in FileSinkOperator.java:107 ? That will tell
> us whether compression is turned on or not.
>
> Zheng
>
> On Thu, Aug 13, 2009 at 9:06 PM, Saurabh Nanda <[email protected]> wrote:
> >
> >> hive.exec.compress.output=true is the correct option. Can you post the
> >> "insert" command that you run which produced non-compressed results?
> >> Is the output in TextFileFormat or SequenceFileFormat?
> >
> > Here's the query. raw_compressed is a SequenceFile table with raw lines.
> > raw is a SequenceFile table with separate columns for each data field.
> >
> > from raw_compressed
> > insert overwrite table raw partition (dt='2009-04-02')
> > select transform(line) using 'parse_logs.rb' as ip_address, aid, uid,
> > ts, method, uri, response, referer, user_agent, cookies, ptime
> >
> > Saurabh.
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
> >
>
>
>
> --
> Yours,
> Zheng
>
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com