I've tried jvm reuse, and it didn't help either. Total time is about 130s; the data is only 10M, all small files, on 2 nodes.
hive/hadoop will run 350+ maps ...

2010/6/10 Edward Capriolo <[email protected]>

> Also consider setting up jvm reuse; this will deal with some of the mapper
> startup penalty.
>
> How long is your query taking? How much data is there? How many nodes?
>
> On Thursday, June 10, 2010, wd <[email protected]> wrote:
> > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> >
> > and
> >
> > set hive.merge.size.per.task=1000000;
> > set hive.merge.mapfiles=true;
> >
> > all seem useless here; the time taken to execute 'select a, count(1) from
> > t1 group by a' is almost the same.
> >
> > Have I missed some other settings?
> >
> > 2010/6/10 wd <[email protected]>
> >
> > Thanks everyone, I'll try CombineHiveInputFormat. :)
> >
> > 2010/6/10 Namit Jain <[email protected]>
> >
> > CombineHiveInputFormat
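
For reference, a rough sketch of how the settings discussed in this thread might be entered together in the Hive CLI before running the query. The input format line and the query are taken from the messages above. The jvm reuse property name (mapred.job.reuse.jvm.num.tasks) and the split-size property and value (mapred.max.split.size) are assumptions, since the thread does not say which ones were used. They are the classic Hadoop names and may differ by version.

```
-- Combine many small files into fewer input splits (from the thread).
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Assumption: upper bound on a combined split, in bytes (256 MB here).
-- The thread does not mention this setting or any value for it.
set mapred.max.split.size=268435456;

-- Assumption: reuse each task JVM for an unlimited number of tasks (-1).
-- The thread only says "jvm reuse" without naming the property.
set mapred.job.reuse.jvm.num.tasks=-1;

-- Query from the thread.
select a, count(1) from t1 group by a;
```

If the job still starts 350+ maps after this, the combined input format is probably not being picked up for this table, and the map count shown in the job output is the first thing to check.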
