I've noticed that if I comment out the write call in the map function (Context.write(row,put)), the job takes just 40 seconds. The difference is about 30 seconds, which seems odd to me. What do you think?
The parameters that have been useful so far:

hbase.hstore.blockingStoreFiles => 20
hbase.hregion.memstore.block.multiplier => 4
hbase.hregion.memstore.flush.size => 1073741824
speculative.execution => false
wal => false

Should I change these two parameters: io.sort.mb & io.sort.factor?

Mohandes

On Tue, Jan 15, 2013 at 5:03 AM, Bing Jiang <[email protected]> wrote:

> Hi, mohandes.zebeleh
> you can adjust parameter as below( Major Compaction, Minor Compaction, Split):
> if you do not set, it will retain default value(1).
>
> <property>
>   <name>hbase.regionserver.thread.compaction.large</name>
>   <value>5</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.compaction.small</name>
>   <value>10</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.split</name>
>   <value>5</value>
> </property>
>
> Regards!
>
> Bing
>
> 2013/1/14 Farrokh Shahriari <[email protected]>
>
>> Bing Jiang, What do you mean by add compaction thread number ? Because, in
>> Hbase-site.xml we have compactionqueuesize or compactionthreshold but not
>> the parameter that you have said.
>>
>> Thank you if you guide me.
>>
>> On Sun, Jan 13, 2013 at 7:00 PM, Ted Yu <[email protected]> wrote:
>>
>> > Both HFileOutputFormat and LoadIncrementalHFiles are in mapreduce package.
>> >
>> > Cheers
>> >
>> > On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang <[email protected]> wrote:
>> >
>> > > hi,anoop.
>> > > Why not hbase mapreduce package contains the tools like this?
>> > >
>> > > Anoop John <[email protected]> wrote:
>> > >
>> > > >Hi
>> > > > Can you think of using HFileOutputFormat ? Here you use
>> > > >TableOutputFormat now. There will be put calls to HTable. Instead in
>> > > >HFileOutput format the MR will write the HFiles directly.[No flushes,
>> > > >compactions] Later using LoadIncrementalHFiles need to load the HFiles to
>> > > >the regions. May help you..
>> > > >
>> > > >-Anoop-
>> > > >
>> > > >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <[email protected]> wrote:
>> > > >
>> > > >> Thank you guys,let me change these configuration & test mapreduce again.
>> > > >>
>> > > >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <[email protected]> wrote:
>> > > >>
>> > > >> > Start by testing HDFS throughput by doing a simple copyFromLocal using
>> > > >> > Hadoop command line shell (bin/hadoop fs -copyFromLocal pathTo8GBFile
>> > > >> > /tmp/dummyFile1). If you have 1000Mbit/sec network between the computers,
>> > > >> > you should get around 75 MB/sec.
>> > > >> >
>> > > >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
>> > > >> >
>> > > >> > > In our experience, it can enhance mapreduce insert by
>> > > >> > > 1.add regionserver flush thread number
>> > > >> > > 2.add memstore/jvm_heap
>> > > >> > > 3.pre split table region before mapreduce
>> > > >> > > 4.add large and small compaction thread number.
>> > > >> > >
>> > > >> > > please correct me if wrong, or any other better ideas.
>> > > >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" <[email protected]> wrote:
>> > > >> > >
>> > > >> > > > What type of disks and how many?
>> > > >> > > > With the default replication factor your 2 (or 6) GB are actually
>> > > >> > > > replicated 3 times.
>> > > >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
>> > > >> > > > reasonable machine should be able to absorb.
>> > > >> > > > The fact that deferred log flush does not help you seems to indicate that
>> > > >> > > > you're over IO bound.
>> > > >> > > >
>> > > >> > > > What's your memstore flush size? Potentially the data is written many
>> > > >> > > > times during compactions.
>> > > >> > > >
>> > > >> > > > In your case you dial down the HDFS replication, since you only have two
>> > > >> > > > physical machines anyway.
>> > > >> > > > (Set it to 2. If you do not specify any failure zones, you might as well
>> > > >> > > > set it to 1... You will lose data if one of your server machines dies
>> > > >> > > > anyway).
>> > > >> > > >
>> > > >> > > > It does not really make that much sense to deploy HBase and HDFS on
>> > > >> > > > virtual nodes like this.
>> > > >> > > > -- Lars
>> > > >> > > >
>> > > >> > > > ________________________________
>> > > >> > > > From: Farrokh Shahriari <[email protected]>
>> > > >> > > > To: [email protected]
>> > > >> > > > Sent: Monday, January 7, 2013 9:38 PM
>> > > >> > > > Subject: Re: Tune MapReduce over HBase to insert data
>> > > >> > > >
>> > > >> > > > Hi again,
>> > > >> > > > I'm using HBase 0.92.1-cdh4.0.0.
>> > > >> > > > I have two server machine with 48Gb RAM,12 physical core & 24 logical core
>> > > >> > > > that contain 12 nodes(6 nodes on each server). Each node has 8Gb RAM & 2
>> > > >> > > > VCPU.
>> > > >> > > > I've set some parameter that get better result like set WAL=off on put,but
>> > > >> > > > some parameters like Heap-size,Deferred log flush don't help me.
>> > > >> > > > Beside that I have another question,why each time I've run mapreduce,I've
>> > > >> > > > got different result time while all the config & hardware are same & not
>> > > >> > > > change ?
>> > > >> > > >
>> > > >> > > > Tnx you guys
>> > > >> > > >
>> > > >> > > > On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <[email protected]> wrote:
>> > > >> > > >
>> > > >> > > > > Have you read through http://hbase.apache.org/book.html#performance?
>> > > >> > > > >
>> > > >> > > > > What version of HBase are you using ?
>> > > >> > > > >
>> > > >> > > > > Cheers
>> > > >> > > > >
>> > > >> > > > > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <[email protected]> wrote:
>> > > >> > > > >
>> > > >> > > > > > Hi there
>> > > >> > > > > > I have a cluster with 12 nodes that each of them has 2 core of CPU. Now,I
>> > > >> > > > > > want insert large data about 2Gb in 80 sec ( or 6Gb in 240sec ). I've used
>> > > >> > > > > > Map-Reduce over hbase,but I can't achieve proper result .
>> > > >> > > > > > I'd be glad if you tell me what I can do to get better result or which
>> > > >> > > > > > parameters should I config or tune to improve Map-Reduce/Hbase performance
>> > > >> > > > > > ?
>> > > >> > > > > >
>> > > >> > > > > > Tnx
>
> --
> Bing Jiang
> Tel:(86)134-2619-1361
> weibo: http://weibo.com/jiangbinglover
> BLOG: http://blog.sina.com.cn/jiangbinglover
> National Research Center for Intelligent Computing Systems
> Institute of Computing technology
> Graduate University of Chinese Academy of Science
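
For reference, the back-of-the-envelope estimate quoted from Lars in this thread (6GB/80s = 75MB/s, twice that if the WAL is not disabled) can be reproduced with a few lines of Java. This is only a rough sketch under stated assumptions: the class and method names are hypothetical, 1 GB is taken as 1000 MB to match the figure in the thread, the WAL is modeled as simply doubling the write volume as Lars suggests, and rewrites caused by compactions are not counted.

```java
// Hypothetical helper, not part of HBase or Hadoop: estimates the aggregate
// write rate the cluster must absorb for a bulk insert.
public class IngestThroughput {

    // payloadGB   : size of the data being inserted, in GB
    // replication : HDFS replication factor (default is 3)
    // walEnabled  : assumption from the thread: the WAL doubles the bytes written
    // seconds     : target duration of the insert
    static double requiredMBPerSec(double payloadGB, int replication,
                                   boolean walEnabled, double seconds) {
        double writtenGB = payloadGB * replication * (walEnabled ? 2 : 1);
        return writtenGB * 1000.0 / seconds; // 1 GB taken as 1000 MB
    }

    public static void main(String[] args) {
        // 2 GB in 80 s with replication 3 and the WAL off
        System.out.println(requiredMBPerSec(2.0, 3, false, 80.0));
        // same load with the WAL on
        System.out.println(requiredMBPerSec(2.0, 3, true, 80.0));
        // same load with replication dialed down to 1, WAL off
        System.out.println(requiredMBPerSec(2.0, 1, false, 80.0));
    }
}
```

Comparing the result against the rate measured with the copyFromLocal test suggested earlier in the thread gives a rough idea of whether the target is within reach of the disks and network at all.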
