Zheng,

It's 'org.apache.hadoop.hive.ql.io.HiveInputFormat'. I don't know whether MAPREDUCE-830 is in CDH3; I could not find any clues either way.
Thanks for your help.

- Youngwoo

2010/4/22 Zheng Shao <[email protected]>

> Can you take a look at the "job.xml" link in your map-reduce job
> created by Hive and let me know the mapred.input.format.class?
> Is it HiveInputFormat or CombineHiveInputFormat?
>
> It should work if you set it to
> org.apache.hadoop.hive.ql.io.HiveInputFormat
>
> Also, can you verify if
> https://issues.apache.org/jira/browse/MAPREDUCE-830 is in your hadoop
> distribution or not?
>
> Zheng
>
> On Wed, Apr 21, 2010 at 11:31 PM, 김영우 <[email protected]> wrote:
> > Zheng,
> >
> > Thanks for your quick reply, but there is only 1 mapper for my job with
> > a 300 MB bz2 file.
> >
> > I added the following in my core-site.xml:
> >
> > <property>
> >   <name>io.compression.codecs</name>
> >   <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
> > </property>
> >
> > My table definition:
> >
> > create table test_bzip2
> > (
> >   co1 string,
> >   .
> >   .
> >   col20 string
> > )
> > row format delimited
> > fields terminated by '\t'
> > stored as textfile;
> >
> > A simple grouping/count query, and the following is the query's plan:
> >
> > STAGE PLANS:
> >   Stage: Stage-1
> >     Map Reduce
> >       Alias -> Map Operator Tree:
> >         test_bzip2
> >           TableScan
> >             alias: test_bzip2
> >             Select Operator
> >               expressions:
> >                 expr: siteid
> >                 type: string
> >               outputColumnNames: siteid
> >               Reduce Output Operator
> >                 key expressions:
> >                   expr: siteid
> >                   type: string
> >                 sort order: +
> >                 Map-reduce partition columns:
> >                   expr: siteid
> >                   type: string
> >                 tag: -1
> >                 value expressions:
> >                   expr: 1
> >                   type: int
> >       Reduce Operator Tree:
> >         Group By Operator
> >           aggregations:
> >             expr: count(VALUE._col0)
> >           bucketGroup: false
> >           keys:
> >             expr: KEY._col0
> >             type: string
> >           mode: complete
> >           outputColumnNames: _col0, _col1
> >           Select Operator
> >             expressions:
> >               expr: _col0
> >               type: string
> >               expr: _col1
> >               type: bigint
> >             outputColumnNames: _col0, _col1
> >             File Output Operator
> >               compressed: false
> >               GlobalTableId: 0
> >               table:
> >                 input format: org.apache.hadoop.mapred.TextInputFormat
> >                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> >
> >   Stage: Stage-0
> >     Fetch Operator
> >       limit: -1
> >
> >
> > I just verified that bz2 splitting works in my cluster using a simple pig
> > script. The pig script makes 3 mappers for the M/R job.
> >
> > What should I check further? Job config info?
> >
> > - Youngwoo
> >
> > 2010/4/22 Zheng Shao <[email protected]>
> >>
> >> It should be automatically supported. You don't need to do anything
> >> except adding the bzip2 codec in io.compression.codecs in hadoop
> >> configuration files (core-site.xml)
> >>
> >> Zheng
> >>
> >> On Wed, Apr 21, 2010 at 10:15 PM, 김영우 <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > HADOOP-4012, https://issues.apache.org/jira/browse/HADOOP-4012 has been
> >> > committed, and CDH3 supports bzip2 splitting.
> >> > I'm wondering if Hive supports input splitting for bzip2 compressed text
> >> > files (*.bz2). If not, should I implement a custom SerDe for bzip2
> >> > compressed files?
> >> >
> >> > Thanks,
> >> > Youngwoo
> >> >
> >>
> >>
> >>
> >> --
> >> Yours,
> >> Zheng
> >> http://www.linkedin.com/in/zshao
> >
> >
> >
>
> --
> Yours,
> Zheng
> http://www.linkedin.com/in/zshao
>
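
For anyone repeating the job.xml check discussed in this thread, here is a minimal Python sketch of one way to do it. It assumes the job configuration uses the usual Hadoop layout of <property> elements with <name> and <value> children. The function names read_hadoop_conf and summarize, and the embedded SAMPLE_JOB_XML, are hypothetical and exist only for illustration; the sample reuses the property values quoted above, not a real job.xml.

```python
import xml.etree.ElementTree as ET

BZIP2_CODEC = "org.apache.hadoop.io.compress.BZip2Codec"


def read_hadoop_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def summarize(conf):
    """Pick out the two settings the thread asks about."""
    raw = conf.get("io.compression.codecs", "")
    codecs = [c.strip() for c in raw.split(",") if c.strip()]
    return {
        "input_format": conf.get("mapred.input.format.class"),
        "bzip2_registered": BZIP2_CODEC in codecs,
    }


# Illustrative stand-in for a job.xml (assumption: values copied from the thread).
SAMPLE_JOB_XML = """<configuration>
  <property>
    <name>mapred.input.format.class</name>
    <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(summarize(read_hadoop_conf(SAMPLE_JOB_XML)))
```

To use it on a real job, save the contents behind the "job.xml" link to a file and pass its text to read_hadoop_conf. The sketch only reports what the configuration says; it cannot tell whether the Hadoop build underneath actually splits bz2 input.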
