It's fixed. We never figured out what caused it, but we seem to have fixed it by upgrading to the latest Cloudera version of Hive.
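In case anyone else hits the same symptom before they can upgrade, the knobs I'd try first are below. This is only a sketch, since we never confirmed the root cause; the class and parameter names are the standard Hive/Hadoop ones discussed in the thread below, and the table name is made up:

  -- force the plain, non-combining input format for the session
  set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

  -- or cap how much data CombineHiveInputFormat may pack into one split
  -- (134217728 bytes = one 128 MB block, matching our dfs.block.size below)
  set mapred.max.split.size=134217728;

  -- probe query against a made-up table: it should now fan out to many map tasks
  select count(1) from some_big_table;

Both settings only change how input splits are computed. Note that mapred.map.tasks is just a hint to Hadoop (split calculation decides the real map count), which is presumably why setting it directly didn't help us.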
thanks

On 24 February 2010 11:25, Tim Sell <[email protected]> wrote:
> Hi again,
>
> mapred.min.split.size=0
> dfs.block.size=134217728
>
>
> On 23 February 2010 21:54, Namit Jain <[email protected]> wrote:
>> Can you check the parameters: mapred.min.split.size and dfs.block.size ?
>>
>> -----Original Message-----
>> From: Tim Sell [mailto:[email protected]]
>> Sent: Tuesday, February 23, 2010 11:26 AM
>> To: [email protected]
>> Subject: Re: Hive jobs only run with 1 map task
>>
>> It happens on a table that is a single 30 gig tab-separated file.
>> It also happens on tables that are split over hundreds of files.
>>
>>
>> On 23 February 2010 19:20, Namit Jain <[email protected]> wrote:
>>> What is the size of the input data for the query ?
>>>
>>> Since you are using CombineHiveInputFormat, multiple files can be read by a
>>> single mapper.
>>>
>>> -namit
>>>
>>> -----Original Message-----
>>> From: Tim Sell [mailto:[email protected]]
>>> Sent: Tuesday, February 23, 2010 11:14 AM
>>> To: [email protected]
>>> Subject: Re: Hive jobs only run with 1 map task
>>>
>>> If it helps, looking at the job conf in the map reduce logs I noticed
>>> mapred.input.format.class=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
>>>
>>>
>>> On 23 February 2010 19:11, Tim Sell <[email protected]> wrote:
>>>> Is hive.input.format set on the table? I'm not sure how to pull that
>>>> out again. I know they are stored as text though.
>>>> I should mention they do actually parse/process correctly.
>>>>
>>>> Here are all the set parameters:
>>>>
>>>> hive> set;
>>>> silent=off
>>>> javax.jdo.option.ConnectionUserName=hive
>>>> hive.exec.reducers.bytes.per.reducer=100000000
>>>> hive.mapred.local.mem=0
>>>> datanucleus.autoStartMechanismMode=checked
>>>> hive.metastore.connect.retries=5
>>>> datanucleus.validateColumns=false
>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore
>>>> datanucleus.autoCreateSchema=true
>>>> javax.jdo.option.ConnectionPassword=hive
>>>> datanucleus.validateConstraints=false
>>>> datancucleus.transactionIsolation=read-committed
>>>> datanucleus.validateTables=false
>>>> hive.map.aggr.hash.min.reduction=0.5
>>>> datanucleus.storeManagerType=rdbms
>>>> hive.exec.script.maxerrsize=100000
>>>> hive.merge.size.per.task=256000000
>>>> hive.test.mode.prefix=test_
>>>> hive.groupby.skewindata=false
>>>> hive.default.fileformat=TextFile
>>>> hive.script.auto.progress=false
>>>> hive.groupby.mapaggr.checkinterval=100000
>>>> hive.hwi.listen.port=9999
>>>> datanuclues.cache.level2=true
>>>> hive.hwi.war.file=${HIVE_HOME}/lib/hive-hwi.war
>>>> hive.merge.mapfiles=true
>>>> hive.exec.compress.output=false
>>>> datanuclues.cache.level2.type=SOFT
>>>> javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
>>>> hive.map.aggr=true
>>>> hive.join.emit.interval=1000
>>>> hive.metastore.warehouse.dir=hdfs://master1.hadoop.last.fm:8020/user/hive/warehouse
>>>> javax.jdo.PersistenceManagerFactoryClass=org.datanucleus.jdo.JDOPersistenceManagerFactory
>>>> hive.mapred.mode=nonstrict
>>>> hive.exec.scratchdir=/tmp/hive-${user.name}
>>>> javax.jdo.option.NonTransactionalRead=true
>>>> hive.metastore.local=true
>>>> hive.test.mode.samplefreq=32
>>>> hive.test.mode=false
>>>> javax.jdo.option.ConnectionURL=jdbc:mysql://10.101.1.35/hive?createDatabaseIfNotExist=true
>>>> javax.jdo.option.DetachAllOnCommit=true
>>>> hive.heartbeat.interval=1000
>>>> hive.map.aggr.hash.percentmemory=0.5
>>>> hive.exec.reducers.max=107
>>>> hive.hwi.listen.host=0.0.0.0
>>>> hive.exec.compress.intermediate=false
>>>> hive.optimize.cp=true
>>>> hive.optimize.ppd=true
>>>> hive.session.id=tims_201002231907
>>>> hive.merge.mapredfiles=false
>>>>
>>>> ~Tim.
>>>>
>>>>
>>>> On 23 February 2010 19:03, Namit Jain <[email protected]> wrote:
>>>>> Can you check your input format ?
>>>>>
>>>>> Can you check the value of the parameter :
>>>>> hive.input.format ?
>>>>>
>>>>> Can you send all the parameters ?
>>>>>
>>>>> Thanks,
>>>>> -namit
>>>>>
>>>>> -----Original Message-----
>>>>> From: Tim Sell [mailto:[email protected]]
>>>>> Sent: Tuesday, February 23, 2010 11:00 AM
>>>>> To: [email protected]
>>>>> Subject: Hive jobs only run with 1 map task
>>>>>
>>>>> We just upgraded to Hadoop 0.20 (from Hadoop 0.18); impressively, our
>>>>> same Hive package kept working against the new Hadoop setup.
>>>>>
>>>>> Since the upgrade, though, every Hive query starts with only 1 map task,
>>>>> even after setting it explicitly with e.g. set mapred.map.tasks=32;
>>>>> We recompiled our Hive setup against Hadoop 0.20 and still get the same
>>>>> issue.
>>>>>
>>>>> Any suggestions for something obvious we might have missed?
>>>>>
>>>>> ~Tim.
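P.S. For anyone who lands on this thread later: rather than scanning the full set; dump like the one above, the CLI prints a single effective value with set <name>; (no value assignment). The names below are the ones from this thread:

  hive> set hive.input.format;
  hive> set mapred.min.split.size;
  hive> set dfs.block.size;

If hive.input.format resolves to CombineHiveInputFormat, keep Namit's point above in mind: multiple files can be combined into a single mapper, so a low map count is expected for small inputs, though not for a 30 gig table.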
