RE: Hive jobs only run with 1 map task

Namit Jain Tue, 23 Feb 2010 11:21:32 -0800

What is the size of the input data for the query?

Since you are using CombineHiveInputFormat, multiple files can be read by a 
single mapper.
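
If that is what is happening here, two session-level settings may be worth
trying. This is only a sketch, under the assumption that the Hadoop 0.20
CombineFileInputFormat honours mapred.max.split.size and that, with no cap
set, it is free to pack many small files into one split. The 64 MB value is
purely illustrative. Note also that mapred.map.tasks is only a hint to the
input format, so setting it does not by itself force more mappers.

```
-- Assumption: cap the size of a combined split (64 MB is illustrative only).
set mapred.max.split.size=67108864;

-- Alternative: use the non-combining input format for this session.
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
```

Either line can be run at the hive> prompt before the query. Comparing the
number of map tasks before and after should show whether split combining is
the cause.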




-namit

-----Original Message-----
From: Tim Sell [mailto:[email protected]] 
Sent: Tuesday, February 23, 2010 11:14 AM
To: [email protected]
Subject: Re: Hive jobs only run with 1 map task

If it helps: looking at the job conf in the MapReduce logs, I noticed
mapred.input.format.class=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat


On 23 February 2010 19:11, Tim Sell <[email protected]> wrote:
> Is hive.input.format set on the table? I'm not sure how to pull that
> out again. I know they are stored as text though.
> I should mention they do actually parse/process correctly.
>
> Here are all the set parameters
>
> hive> set;
> silent=off
> javax.jdo.option.ConnectionUserName=hive
> hive.exec.reducers.bytes.per.reducer=100000000
> hive.mapred.local.mem=0
> datanucleus.autoStartMechanismMode=checked
> hive.metastore.connect.retries=5
> datanucleus.validateColumns=false
> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore
> datanucleus.autoCreateSchema=true
> javax.jdo.option.ConnectionPassword=hive
> datanucleus.validateConstraints=false
> datancucleus.transactionIsolation=read-committed
> datanucleus.validateTables=false
> hive.map.aggr.hash.min.reduction=0.5
> datanucleus.storeManagerType=rdbms
> hive.exec.script.maxerrsize=100000
> hive.merge.size.per.task=256000000
> hive.test.mode.prefix=test_
> hive.groupby.skewindata=false
> hive.default.fileformat=TextFile
> hive.script.auto.progress=false
> hive.groupby.mapaggr.checkinterval=100000
> hive.hwi.listen.port=9999
> datanuclues.cache.level2=true
> hive.hwi.war.file=${HIVE_HOME}/lib/hive-hwi.war
> hive.merge.mapfiles=true
> hive.exec.compress.output=false
> datanuclues.cache.level2.type=SOFT
> javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
> hive.map.aggr=true
> hive.join.emit.interval=1000
> hive.metastore.warehouse.dir=hdfs://master1.hadoop.last.fm:8020/user/hive/warehouse
> javax.jdo.PersistenceManagerFactoryClass=org.datanucleus.jdo.JDOPersistenceManagerFactory
> hive.mapred.mode=nonstrict
> hive.exec.scratchdir=/tmp/hive-${user.name}
> javax.jdo.option.NonTransactionalRead=true
> hive.metastore.local=true
> hive.test.mode.samplefreq=32
> hive.test.mode=false
> javax.jdo.option.ConnectionURL=jdbc:mysql://10.101.1.35/hive?createDatabaseIfNotExist=true
> javax.jdo.option.DetachAllOnCommit=true
> hive.heartbeat.interval=1000
> hive.map.aggr.hash.percentmemory=0.5
> hive.exec.reducers.max=107
> hive.hwi.listen.host=0.0.0.0
> hive.exec.compress.intermediate=false
> hive.optimize.cp=true
> hive.optimize.ppd=true
> hive.session.id=tims_201002231907
> hive.merge.mapredfiles=false
>
> ~Tim.
>
> On 23 February 2010 19:03, Namit Jain <[email protected]> wrote:
>> Can you check your input format?
>>
>> Can you check the value of the parameter
>> hive.input.format?
>>
>> Can you send all the parameters?
>>
>>
>>
>> Thanks,
>> -namit
>>
>>
>>
>> -----Original Message-----
>> From: Tim Sell [mailto:[email protected]]
>> Sent: Tuesday, February 23, 2010 11:00 AM
>> To: [email protected]
>> Subject: Hive jobs only run with 1 map task
>>
>> We just upgraded to Hadoop 0.20 (from Hadoop 0.18). Impressively, our
>> same Hive package kept working against the new Hadoop setup.
>>
>> Since the upgrade, though, every Hive job starts with only 1 map task,
>> even after setting it explicitly, e.g.: set mapred.map.tasks=32;
>> We recompiled our Hive setup against Hadoop 0.20 and still get the same
>> issue.
>>
>> Any suggestions for something obvious we might have missed?
>>
>> ~Tim.
>>
>
