It's fixed.
We didn't figure out what caused it, but we seem to have fixed it by
upgrading to the latest Cloudera version of Hive.
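
For the archives, since we never pinned down the root cause: one likely
suspect with CombineHiveInputFormat is leaving the combined split size
uncapped, so a single mapper is allowed to swallow the whole input. A
sketch of what we would try before upgrading next time (the 128 MB value
is illustrative, chosen to match our dfs.block.size, and not something
we verified):

    -- cap how much data a single combined split may take (illustrative)
    set mapred.max.split.size=134217728;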

thanks

On 24 February 2010 11:25, Tim Sell <[email protected]> wrote:
> Hi again,
>
> mapred.min.split.size=0
> dfs.block.size=134217728
>
>
>
> On 23 February 2010 21:54, Namit Jain <[email protected]> wrote:
>> Can you check the parameters: mapred.min.split.size and dfs.block.size?
>>
>> -----Original Message-----
>> From: Tim Sell [mailto:[email protected]]
>> Sent: Tuesday, February 23, 2010 11:26 AM
>> To: [email protected]
>> Subject: Re: Hive jobs only run with 1 map task
>>
>> It happens on a table that is a single 30 GB tab-separated file.
>> It also happens on tables that are split over hundreds of files.
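>>
>> For scale (assuming the 128 MB dfs.block.size quoted above and plain
>> block-aligned splits, which I haven't verified for these files):
>>
>>     30 GB / 128 MB ≈ 240 expected map tasks
>>
>> so a single map task points at the split computation rather than the
>> data itself.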
>>
>>
>> On 23 February 2010 19:20, Namit Jain <[email protected]> wrote:
>>> What is the size of the input data for the query?
>>>
>>> Since you are using CombineHiveInputFormat, multiple files can be read by a 
>>> single mapper.
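>>>
>>> To rule the combining out, one thing to try (just a sketch; it switches
>>> the session back to the non-combining input format):
>>>
>>>     set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;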
>>>
>>>
>>>
>>> -namit
>>>
>>> -----Original Message-----
>>> From: Tim Sell [mailto:[email protected]]
>>> Sent: Tuesday, February 23, 2010 11:14 AM
>>> To: [email protected]
>>> Subject: Re: Hive jobs only run with 1 map task
>>>
>>> If it helps: looking at the job conf in the MapReduce logs, I noticed
>>> mapred.input.format.class=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
>>>
>>>
>>> On 23 February 2010 19:11, Tim Sell <[email protected]> wrote:
>>>> Is hive.input.format set on the table? I'm not sure how to pull that
>>>> out again. I do know the tables are stored as text, though.
>>>> I should mention they do actually parse and process correctly.
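>>>>
>>>> (Presumably set with just the key would print that one value, e.g.:
>>>>
>>>>     set hive.input.format;
>>>>
>>>> but I haven't double-checked that on this build.)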
>>>>
>>>> Here are all the set parameters
>>>>
>>>> hive> set;
>>>> silent=off
>>>> javax.jdo.option.ConnectionUserName=hive
>>>> hive.exec.reducers.bytes.per.reducer=100000000
>>>> hive.mapred.local.mem=0
>>>> datanucleus.autoStartMechanismMode=checked
>>>> hive.metastore.connect.retries=5
>>>> datanucleus.validateColumns=false
>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore
>>>> datanucleus.autoCreateSchema=true
>>>> javax.jdo.option.ConnectionPassword=hive
>>>> datanucleus.validateConstraints=false
>>>> datancucleus.transactionIsolation=read-committed
>>>> datanucleus.validateTables=false
>>>> hive.map.aggr.hash.min.reduction=0.5
>>>> datanucleus.storeManagerType=rdbms
>>>> hive.exec.script.maxerrsize=100000
>>>> hive.merge.size.per.task=256000000
>>>> hive.test.mode.prefix=test_
>>>> hive.groupby.skewindata=false
>>>> hive.default.fileformat=TextFile
>>>> hive.script.auto.progress=false
>>>> hive.groupby.mapaggr.checkinterval=100000
>>>> hive.hwi.listen.port=9999
>>>> datanuclues.cache.level2=true
>>>> hive.hwi.war.file=${HIVE_HOME}/lib/hive-hwi.war
>>>> hive.merge.mapfiles=true
>>>> hive.exec.compress.output=false
>>>> datanuclues.cache.level2.type=SOFT
>>>> javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
>>>> hive.map.aggr=true
>>>> hive.join.emit.interval=1000
>>>> hive.metastore.warehouse.dir=hdfs://master1.hadoop.last.fm:8020/user/hive/warehouse
>>>> javax.jdo.PersistenceManagerFactoryClass=org.datanucleus.jdo.JDOPersistenceManagerFactory
>>>> hive.mapred.mode=nonstrict
>>>> hive.exec.scratchdir=/tmp/hive-${user.name}
>>>> javax.jdo.option.NonTransactionalRead=true
>>>> hive.metastore.local=true
>>>> hive.test.mode.samplefreq=32
>>>> hive.test.mode=false
>>>> javax.jdo.option.ConnectionURL=jdbc:mysql://10.101.1.35/hive?createDatabaseIfNotExist=true
>>>> javax.jdo.option.DetachAllOnCommit=true
>>>> hive.heartbeat.interval=1000
>>>> hive.map.aggr.hash.percentmemory=0.5
>>>> hive.exec.reducers.max=107
>>>> hive.hwi.listen.host=0.0.0.0
>>>> hive.exec.compress.intermediate=false
>>>> hive.optimize.cp=true
>>>> hive.optimize.ppd=true
>>>> hive.session.id=tims_201002231907
>>>> hive.merge.mapredfiles=false
>>>>
>>>> ~Tim.
>>>>
>>>> On 23 February 2010 19:03, Namit Jain <[email protected]> wrote:
>>>>> Can you check your input format?
>>>>>
>>>>> Can you check the value of the parameter hive.input.format?
>>>>>
>>>>> Can you send all the parameters?
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> -namit
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Tim Sell [mailto:[email protected]]
>>>>> Sent: Tuesday, February 23, 2010 11:00 AM
>>>>> To: [email protected]
>>>>> Subject: Hive jobs only run with 1 map task
>>>>>
>>>>> We just upgraded to Hadoop 0.20 (from Hadoop 0.18); impressively, our
>>>>> same Hive package kept working against the new Hadoop setup.
>>>>>
>>>>> Since the upgrade, though, every Hive job starts with only 1 map task,
>>>>> even after setting it explicitly, e.g.: set mapred.map.tasks=32;
>>>>> We recompiled our Hive setup against Hadoop 0.20 and still get the
>>>>> same issue.
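>>>>>
>>>>> (As I understand it, mapred.map.tasks is only a hint for most input
>>>>> formats; the actual count comes from InputFormat.getSplits(), which
>>>>> for plain text input is driven by mapred.min.split.size and the HDFS
>>>>> block size rather than by the hint, so e.g.
>>>>>
>>>>>     set mapred.map.tasks=32;
>>>>>
>>>>> cannot force more splits than that computation yields. Happy to be
>>>>> corrected on that.)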
>>>>>
>>>>> Any suggestions for something obvious we might have missed?
>>>>>
>>>>> ~Tim.
>>>>>
>>>>
>>>
>>
>
