Hi again,

mapred.min.split.size=0
dfs.block.size=134217728
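For what it's worth: with those values and a plain (non-combining) input format, each 128 MB block would normally become its own split, so a single 30 gig file should fan out to roughly 240 map tasks (30 GiB / 128 MiB ≈ 240). So the block and min-split settings look sane, which points back at the combining behaviour. In the Hive CLI, `set <property>;` with no value prints the current setting, so a quick sanity check would be the following (mapred.max.split.size is my guess at the relevant extra knob here, and may simply be unset):

hive> set mapred.min.split.size;
hive> set mapred.max.split.size;
hive> set dfs.block.size;
hive> set hive.input.format;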
On 23 February 2010 21:54, Namit Jain <[email protected]> wrote:
> Can you check the parameters: mapred.min.split.size and dfs.block.size ?
>
> -----Original Message-----
> From: Tim Sell [mailto:[email protected]]
> Sent: Tuesday, February 23, 2010 11:26 AM
> To: [email protected]
> Subject: Re: Hive jobs only run with 1 map task
>
> It happens on a table that is a single 30 gig tab-separated file.
> It also happens on tables that are split over hundreds of files.
>
>
> On 23 February 2010 19:20, Namit Jain <[email protected]> wrote:
>> What is the size of the input data for the query ?
>>
>> Since you are using CombineHiveInputFormat, multiple files can be read by a
>> single mapper.
>>
>> -namit
>>
>> -----Original Message-----
>> From: Tim Sell [mailto:[email protected]]
>> Sent: Tuesday, February 23, 2010 11:14 AM
>> To: [email protected]
>> Subject: Re: Hive jobs only run with 1 map task
>>
>> If it helps, looking at the job conf in the map reduce logs I noticed
>> mapred.input.format.class=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
>>
>>
>> On 23 February 2010 19:11, Tim Sell <[email protected]> wrote:
>>> Is hive.input.format set on the table? I'm not sure how to pull that
>>> out again. I know they are stored as text though.
>>> I should mention they do actually parse/process correctly.
>>>
>>> Here are all the set parameters:
>>>
>>> hive> set;
>>> silent=off
>>> javax.jdo.option.ConnectionUserName=hive
>>> hive.exec.reducers.bytes.per.reducer=100000000
>>> hive.mapred.local.mem=0
>>> datanucleus.autoStartMechanismMode=checked
>>> hive.metastore.connect.retries=5
>>> datanucleus.validateColumns=false
>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore
>>> datanucleus.autoCreateSchema=true
>>> javax.jdo.option.ConnectionPassword=hive
>>> datanucleus.validateConstraints=false
>>> datancucleus.transactionIsolation=read-committed
>>> datanucleus.validateTables=false
>>> hive.map.aggr.hash.min.reduction=0.5
>>> datanucleus.storeManagerType=rdbms
>>> hive.exec.script.maxerrsize=100000
>>> hive.merge.size.per.task=256000000
>>> hive.test.mode.prefix=test_
>>> hive.groupby.skewindata=false
>>> hive.default.fileformat=TextFile
>>> hive.script.auto.progress=false
>>> hive.groupby.mapaggr.checkinterval=100000
>>> hive.hwi.listen.port=9999
>>> datanuclues.cache.level2=true
>>> hive.hwi.war.file=${HIVE_HOME}/lib/hive-hwi.war
>>> hive.merge.mapfiles=true
>>> hive.exec.compress.output=false
>>> datanuclues.cache.level2.type=SOFT
>>> javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
>>> hive.map.aggr=true
>>> hive.join.emit.interval=1000
>>> hive.metastore.warehouse.dir=hdfs://master1.hadoop.last.fm:8020/user/hive/warehouse
>>> javax.jdo.PersistenceManagerFactoryClass=org.datanucleus.jdo.JDOPersistenceManagerFactory
>>> hive.mapred.mode=nonstrict
>>> hive.exec.scratchdir=/tmp/hive-${user.name}
>>> javax.jdo.option.NonTransactionalRead=true
>>> hive.metastore.local=true
>>> hive.test.mode.samplefreq=32
>>> hive.test.mode=false
>>> javax.jdo.option.ConnectionURL=jdbc:mysql://10.101.1.35/hive?createDatabaseIfNotExist=true
>>> javax.jdo.option.DetachAllOnCommit=true
>>> hive.heartbeat.interval=1000
>>> hive.map.aggr.hash.percentmemory=0.5
>>> hive.exec.reducers.max=107
>>> hive.hwi.listen.host=0.0.0.0
>>> hive.exec.compress.intermediate=false
>>> hive.optimize.cp=true
>>> hive.optimize.ppd=true
>>> hive.session.id=tims_201002231907
>>> hive.merge.mapredfiles=false
>>>
>>> ~Tim.
>>>
>>> On 23 February 2010 19:03, Namit Jain <[email protected]> wrote:
>>>> Can you check your input format?
>>>>
>>>> Can you check the value of the parameter
>>>> hive.input.format?
>>>>
>>>> Can you send all the parameters?
>>>>
>>>> Thanks,
>>>> -namit
>>>>
>>>> -----Original Message-----
>>>> From: Tim Sell [mailto:[email protected]]
>>>> Sent: Tuesday, February 23, 2010 11:00 AM
>>>> To: [email protected]
>>>> Subject: Hive jobs only run with 1 map task
>>>>
>>>> We just upgraded to hadoop 0.20 (from hadoop 0.18); impressively, our
>>>> same hive package kept working against the new hadoop setup.
>>>>
>>>> Since the upgrade, though, every hive job starts with only 1 map task,
>>>> even after setting it explicitly, e.g.: set mapred.map.tasks=32;
>>>> We recompiled our hive setup against hadoop 0.20 and still get the
>>>> same issue.
>>>>
>>>> Any suggestions for something obvious we might have missed?
>>>>
>>>> ~Tim.
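In case anyone else hits this after an upgrade: with CombineHiveInputFormat the number of splits is governed by the combine limits (mapred.max.split.size and, if memory serves, mapred.min.split.size.per.node / mapred.min.split.size.per.rack), not by dfs.block.size. If no maximum is set, the combiner is free to pack the entire input into a single split, which would explain the single map task. A possible workaround, sketched below; untested on this setup, and the 256 MB cap is just an example value:

-- Cap each combined split so CombineHiveInputFormat has to emit more
-- than one split (268435456 bytes = 256 MB, an example value only).
set mapred.max.split.size=268435456;

-- Alternatively, sidestep split combining entirely by reverting to the
-- plain input format:
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

Either can be set per-session at the hive> prompt or made permanent in hive-site.xml.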
