It's fixed. We never figured out what caused it, but we seem to have fixed it by upgrading to the latest Cloudera version of Hive.
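In case anyone else hits the same symptom before they can upgrade, the knobs I'd try first are below. This is only a sketch, since we never confirmed the root cause; the class and parameter names are the standard Hive/Hadoop ones discussed in the thread below, and the table name is made up:

  -- force the plain, non-combining input format for the session
  set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

  -- or cap how much data CombineHiveInputFormat may pack into one split
  -- (134217728 bytes = one 128 MB block, matching our dfs.block.size below)
  set mapred.max.split.size=134217728;

  -- probe query against a made-up table: it should now fan out to many map tasks
  select count(1) from some_big_table;

Both settings only change how input splits are computed. Note that mapred.map.tasks is just a hint to Hadoop (split calculation decides the real map count), which is presumably why setting it directly didn't help us.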
thanks

On 24 February 2010 11:25, Tim Sell <[email protected]> wrote:
> Hi again,
>
> mapred.min.split.size=0
> dfs.block.size=134217728
>
>
> On 23 February 2010 21:54, Namit Jain <[email protected]> wrote:
>> Can you check the parameters: mapred.min.split.size and dfs.block.size ?
>>
>> -----Original Message-----
>> From: Tim Sell [mailto:[email protected]]
>> Sent: Tuesday, February 23, 2010 11:26 AM
>> To: [email protected]
>> Subject: Re: Hive jobs only run with 1 map task
>>
>> It happens on a table that is a single 30 gig tab-separated file.
>> It also happens on tables that are split over hundreds of files.
>>
>>
>> On 23 February 2010 19:20, Namit Jain <[email protected]> wrote:
>>> What is the size of the input data for the query ?
>>>
>>> Since you are using CombineHiveInputFormat, multiple files can be read by a
>>> single mapper.
>>>
>>> -namit
>>>
>>> -----Original Message-----
>>> From: Tim Sell [mailto:[email protected]]
>>> Sent: Tuesday, February 23, 2010 11:14 AM
>>> To: [email protected]
>>> Subject: Re: Hive jobs only run with 1 map task
>>>
>>> If it helps, looking at the job conf in the map reduce logs I noticed
>>> mapred.input.format.class=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
>>>
>>>
>>> On 23 February 2010 19:11, Tim Sell <[email protected]> wrote:
>>>> Is hive.input.format set on the table? I'm not sure how to pull that
>>>> out again. I know they are stored as text though.
>>>> I should mention they do actually parse/process correctly.
>>>>
>>>> Here are all the set parameters:
>>>>
>>>> hive> set;
>>>> silent=off
>>>> javax.jdo.option.ConnectionUserName=hive
>>>> hive.exec.reducers.bytes.per.reducer=100000000
>>>> hive.mapred.local.mem=0
>>>> datanucleus.autoStartMechanismMode=checked
>>>> hive.metastore.connect.retries=5
>>>> datanucleus.validateColumns=false
>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore
>>>> datanucleus.autoCreateSchema=true
>>>> javax.jdo.option.ConnectionPassword=hive
>>>> datanucleus.validateConstraints=false
>>>> datancucleus.transactionIsolation=read-committed
>>>> datanucleus.validateTables=false
>>>> hive.map.aggr.hash.min.reduction=0.5
>>>> datanucleus.storeManagerType=rdbms
>>>> hive.exec.script.maxerrsize=100000
>>>> hive.merge.size.per.task=256000000
>>>> hive.test.mode.prefix=test_
>>>> hive.groupby.skewindata=false
>>>> hive.default.fileformat=TextFile
>>>> hive.script.auto.progress=false
>>>> hive.groupby.mapaggr.checkinterval=100000
>>>> hive.hwi.listen.port=9999
>>>> datanuclues.cache.level2=true
>>>> hive.hwi.war.file=${HIVE_HOME}/lib/hive-hwi.war
>>>> hive.merge.mapfiles=true
>>>> hive.exec.compress.output=false
>>>> datanuclues.cache.level2.type=SOFT
>>>> javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
>>>> hive.map.aggr=true
>>>> hive.join.emit.interval=1000
>>>> hive.metastore.warehouse.dir=hdfs://master1.hadoop.last.fm:8020/user/hive/warehouse
>>>> javax.jdo.PersistenceManagerFactoryClass=org.datanucleus.jdo.JDOPersistenceManagerFactory
>>>> hive.mapred.mode=nonstrict
>>>> hive.exec.scratchdir=/tmp/hive-${user.name}
>>>> javax.jdo.option.NonTransactionalRead=true
>>>> hive.metastore.local=true
>>>> hive.test.mode.samplefreq=32
>>>> hive.test.mode=false
>>>> javax.jdo.option.ConnectionURL=jdbc:mysql://10.101.1.35/hive?createDatabaseIfNotExist=true
>>>> javax.jdo.option.DetachAllOnCommit=true
>>>> hive.heartbeat.interval=1000
>>>> hive.map.aggr.hash.percentmemory=0.5
>>>> hive.exec.reducers.max=107
>>>> hive.hwi.listen.host=0.0.0.0
>>>> hive.exec.compress.intermediate=false
>>>> hive.optimize.cp=true
>>>> hive.optimize.ppd=true
>>>> hive.session.id=tims_201002231907
>>>> hive.merge.mapredfiles=false
>>>>
>>>> ~Tim.
>>>>
>>>>
>>>> On 23 February 2010 19:03, Namit Jain <[email protected]> wrote:
>>>>> Can you check your input format ?
>>>>>
>>>>> Can you check the value of the parameter :
>>>>> hive.input.format ?
>>>>>
>>>>> Can you send all the parameters ?
>>>>>
>>>>> Thanks,
>>>>> -namit
>>>>>
>>>>> -----Original Message-----
>>>>> From: Tim Sell [mailto:[email protected]]
>>>>> Sent: Tuesday, February 23, 2010 11:00 AM
>>>>> To: [email protected]
>>>>> Subject: Hive jobs only run with 1 map task
>>>>>
>>>>> We just upgraded to Hadoop 0.20 (from Hadoop 0.18); impressively, our
>>>>> same Hive package kept working against the new Hadoop setup.
>>>>>
>>>>> Since the upgrade, though, every Hive query starts with only 1 map task,
>>>>> even after setting it explicitly with e.g. set mapred.map.tasks=32;
>>>>> We recompiled our Hive setup against Hadoop 0.20 and still get the same
>>>>> issue.
>>>>>
>>>>> Any suggestions for something obvious we might have missed?
>>>>>
>>>>> ~Tim.
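P.S. For anyone who lands on this thread later: rather than scanning the full set; dump like the one above, the CLI prints a single effective value with set <name>; (no value assignment). The names below are the ones from this thread:

  hive> set hive.input.format;
  hive> set mapred.min.split.size;
  hive> set dfs.block.size;

If hive.input.format resolves to CombineHiveInputFormat, keep Namit's point above in mind: multiple files can be combined into a single mapper, so a low map count is expected for small inputs, though not for a 30 gig table.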
