Zheng,
Thanks for your quick reply. However, my job still runs with only 1 mapper
on a 300 MB bz2 file.
I added the following to my core-site.xml:
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
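A quick way to double-check that the codec list is really what Hadoop will read is to parse the property and look for the BZip2 class name. The snippet below is only a sketch in Python: it embeds the property from this message in a `<configuration>` root (the wrapper is an assumption about the rest of the file), and `configured_codecs` is a hypothetical helper name, not part of Hadoop or Hive.

```python
import xml.etree.ElementTree as ET

# The property from the message, wrapped in the <configuration> root that
# core-site.xml normally has (the wrapper is an assumption about the file).
CORE_SITE = """<configuration>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
</configuration>"""


def configured_codecs(xml_text):
    """Return the codec class names listed under io.compression.codecs."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "io.compression.codecs":
            value = prop.findtext("value") or ""
            # Split the comma-separated list and drop stray whitespace.
            return [c.strip() for c in value.split(",") if c.strip()]
    return []


codecs = configured_codecs(CORE_SITE)
print("org.apache.hadoop.io.compress.BZip2Codec" in codecs)  # True
```

To check a real file instead of the embedded string, the same function can be given the file's contents, for example `configured_codecs(open(path).read())` with `path` pointing at the cluster's core-site.xml.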
My table definition:
create table test_bzip2
(
col1 string,
.
.
col20 string
)
row format delimited
fields terminated by '\t'
stored as textfile;
I ran a simple grouping/count query; the following is the query's plan:
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        test_bzip2
          TableScan
            alias: test_bzip2
            Select Operator
              expressions:
                    expr: siteid
                    type: string
              outputColumnNames: siteid
              Reduce Output Operator
                key expressions:
                      expr: siteid
                      type: string
                sort order: +
                Map-reduce partition columns:
                      expr: siteid
                      type: string
                tag: -1
                value expressions:
                      expr: 1
                      type: int
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: count(VALUE._col0)
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: string
          mode: complete
          outputColumnNames: _col0, _col1
          Select Operator
            expressions:
                  expr: _col0
                  type: string
                  expr: _col1
                  type: bigint
            outputColumnNames: _col0, _col1
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: -1
I just verified that bz2 splitting works on my cluster using a simple Pig
script: the M/R job launched by the Pig script runs with 3 mappers.
What should I check next? The job configuration?
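The two mapper counts reported here (3 under Pig, 1 under Hive) are what one would expect if the same file is treated as splittable in one case and as non-splittable in the other. The Python sketch below mimics the split arithmetic of Hadoop's old-API `FileInputFormat.getSplits()`. The 128 MB block size and the default of 2 requested map tasks are assumptions, not values taken from this thread, and Pig's loader may compute its splits differently; `count_splits` is a hypothetical helper name.

```python
# Sketch of the split arithmetic in Hadoop's old-API FileInputFormat.getSplits().
# Block size and requested map count are assumptions, not values from this thread.
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def count_splits(file_size, block_size, splittable,
                 requested_maps=2, min_split_size=1):
    """Return how many input splits (mappers) one non-empty file would produce."""
    if not splittable:
        return 1  # the whole file goes to a single mapper
    goal_size = file_size // max(requested_maps, 1)
    split_size = max(min_split_size, min(goal_size, block_size))
    splits = 0
    remaining = file_size
    # Carve off full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


print(count_splits(300 * MB, 128 * MB, splittable=True))   # 3
print(count_splits(300 * MB, 128 * MB, splittable=False))  # 1
```

Under these assumptions, a single mapper for a 300 MB file suggests that the input format used by the Hive job is not treating the .bz2 file as splittable, so the job's input format and split-size settings are reasonable things to compare between the Pig and Hive runs.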
- Youngwoo
2010/4/22 Zheng Shao <[email protected]>
> It should be supported automatically. You don't need to do anything
> except add the bzip2 codec to io.compression.codecs in the Hadoop
> configuration file (core-site.xml).
>
> Zheng
>
> On Wed, Apr 21, 2010 at 10:15 PM, 김영우 <[email protected]> wrote:
> > Hi,
> >
> > HADOOP-4012, https://issues.apache.org/jira/browse/HADOOP-4012, has been
> > committed, and CDH3 supports bzip2 splitting.
> > I'm wondering whether Hive supports input splitting for bzip2-compressed
> > text files (*.bz2). If not, should I implement a custom SerDe for
> > bzip2-compressed files?
> >
> > Thanks,
> > Youngwoo
> >
>
>
>
> --
> Yours,
> Zheng
> http://www.linkedin.com/in/zshao
>