Hi mohandes.zebeleh,

You can adjust the parameters below (major compaction, minor compaction, split). If you do not set them, each keeps its default value (1).

<property>
  <name>hbase.regionserver.thread.compaction.large</name>
  <value>5</value>
</property>
<property>
  <name>hbase.regionserver.thread.compaction.small</name>
  <value>10</value>
</property>
<property>
  <name>hbase.regionserver.thread.split</name>
  <value>5</value>
</property>

Regards!
Bing

2013/1/14 Farrokh Shahriari <[email protected]>

> Bing Jiang, what do you mean by adding compaction thread numbers? In
> hbase-site.xml we have compactionqueuesize or compactionthreshold, but not
> the parameter that you mentioned.
>
> Thank you if you can guide me.
>
> On Sun, Jan 13, 2013 at 7:00 PM, Ted Yu <[email protected]> wrote:
>
> > Both HFileOutputFormat and LoadIncrementalHFiles are in the mapreduce
> > package.
> >
> > Cheers
> >
> > On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang <[email protected]> wrote:
> >
> > > Hi, Anoop.
> > > Why doesn't the hbase mapreduce package contain tools like this?
> > >
> > > Anoop John <[email protected]> wrote:
> > >
> > > >Hi
> > > > Can you think of using HFileOutputFormat? Here you use
> > > >TableOutputFormat now. There will be put calls to HTable. Instead, with
> > > >HFileOutputFormat the MR job will write the HFiles directly [no flushes,
> > > >compactions]. Later you need to load the HFiles into the regions using
> > > >LoadIncrementalHFiles. May help you.
> > > >
> > > >-Anoop-
> > > >
> > > >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <
> > > >[email protected]> wrote:
> > > >
> > > >> Thank you guys, let me change these configurations & test mapreduce
> > > >> again.
> > > >>
> > > >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <[email protected]>
> > > >> wrote:
> > > >>
> > > >> > Start by testing HDFS throughput by doing a simple copyFromLocal
> > > >> > using the Hadoop command line shell (bin/hadoop fs -copyFromLocal
> > > >> > pathTo8GBFile /tmp/dummyFile1). If you have a 1000Mbit/sec network
> > > >> > between the computers, you should get around 75 MB/sec.
> > > >> >
> > > >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
> > > >> >
> > > >> > > In our experience, mapreduce insert can be improved by:
> > > >> > > 1. increasing the regionserver flush thread number
> > > >> > > 2. increasing memstore/jvm_heap
> > > >> > > 3. pre-splitting the table regions before the mapreduce job
> > > >> > > 4. increasing the large and small compaction thread numbers.
> > > >> > >
> > > >> > > Please correct me if I am wrong, or if there are better ideas.
> > > >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" <[email protected]>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > What type of disks and how many?
> > > >> > > > With the default replication factor your 2 (or 6) GB are actually
> > > >> > > > replicated 3 times.
> > > >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> > > >> > > > reasonable machine should be able to absorb.
> > > >> > > > The fact that deferred log flush does not help you seems to indicate
> > > >> > > > that you're IO bound.
> > > >> > > >
> > > >> > > >
> > > >> > > > What's your memstore flush size? Potentially the data is written many
> > > >> > > > times during compactions.
> > > >> > > >
> > > >> > > >
> > > >> > > > In your case you could dial down the HDFS replication, since you only
> > > >> > > > have two physical machines anyway.
> > > >> > > > (Set it to 2. If you do not specify any failure zones, you might as
> > > >> > > > well set it to 1... You will lose data if one of your server machines
> > > >> > > > dies anyway).
> > > >> > > >
> > > >> > > > It does not really make that much sense to deploy HBase and HDFS on
> > > >> > > > virtual nodes like this.
> > > >> > > > -- Lars
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > ________________________________
> > > >> > > > From: Farrokh Shahriari <[email protected]>
> > > >> > > > To: [email protected]
> > > >> > > > Sent: Monday, January 7, 2013 9:38 PM
> > > >> > > > Subject: Re: Tune MapReduce over HBase to insert data
> > > >> > > >
> > > >> > > > Hi again,
> > > >> > > > I'm using HBase 0.92.1-cdh4.0.0.
> > > >> > > > I have two server machine with 48Gb RAM, 12 physical core & 24 logical
> > > >> > > > core that contain 12 nodes (6 nodes on each server). Each node has 8Gb
> > > >> > > > RAM & 2 VCPU.
> > > >> > > > I've set some parameter that get better result like set WAL=off on
> > > >> > > > put, but some parameters like Heap-size, Deferred log flush don't help
> > > >> > > > me.
> > > >> > > > Beside that I have another question, why each time I've run mapreduce,
> > > >> > > > I've got different result time while all the config & hardware are
> > > >> > > > same & not change?
> > > >> > > >
> > > >> > > > Tnx you guys
> > > >> > > >
> > > >> > > > On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <[email protected]> wrote:
> > > >> > > >
> > > >> > > > > Have you read through
> > > >> > > > > http://hbase.apache.org/book.html#performance ?
> > > >> > > > >
> > > >> > > > > What version of HBase are you using?
> > > >> > > > >
> > > >> > > > > Cheers
> > > >> > > > >
> > > >> > > > > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <
> > > >> > > > > [email protected]> wrote:
> > > >> > > > >
> > > >> > > > > > Hi there
> > > >> > > > > > I have a cluster with 12 nodes that each of them has 2 core of CPU.
> > > >> > > > > > Now, I want insert large data about 2Gb in 80 sec (or 6Gb in 240sec).
> > > >> > > > > > I've used Map-Reduce over hbase, but I can't achieve proper result.
> > > >> > > > > > I'd be glad if you tell me what I can do to get better result or
> > > >> > > > > > which parameters should I config or tune to improve
> > > >> > > > > > Map-Reduce/Hbase performance?
> > > >> > > > > >
> > > >> > > > > > Tnx
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >

--
Bing Jiang
Tel:(86)134-2619-1361
weibo: http://weibo.com/jiangbinglover
BLOG: http://blog.sina.com.cn/jiangbinglover
National Research Center for Intelligent Computing Systems
Institute of Computing technology
Graduate University of Chinese Academy of Science
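
For reference, the throughput estimate quoted earlier in this thread (6GB/80s = 75MB/s, twice that with the WAL enabled) can be reproduced with a small Python sketch. This is only back-of-envelope arithmetic, not a measurement. The function name and the decimal units (1 GB = 1000 MB) are assumptions made for illustration, and the "WAL doubles the bytes written" factor is taken from the estimate in the thread rather than from any HBase API.

```python
# Back-of-envelope write volume for the load discussed in this thread.
# Assumptions (illustrative, not measured): decimal units (1 GB = 1000 MB),
# every byte is written once per HDFS replica, and an enabled WAL doubles
# the bytes written, as estimated in the thread.

def required_write_mb_per_s(data_gb, seconds, replication=3, wal_enabled=False):
    """Return the aggregate MB/s the cluster's disks must absorb."""
    total_mb = data_gb * 1000 * replication
    if wal_enabled:
        total_mb *= 2  # each edit also goes to the write-ahead log
    return total_mb / seconds


if __name__ == "__main__":
    # 2 GB of input in 80 s, for each replication factor mentioned in the thread.
    for replication in (3, 2, 1):
        for wal in (False, True):
            rate = required_write_mb_per_s(2, 80, replication, wal)
            print(f"replication={replication} WAL={'on' if wal else 'off'}: "
                  f"{rate:.0f} MB/s")
```

Lowering the replication factor or disabling the WAL reduces the computed rate proportionally, which is the reasoning behind the suggestion to dial down HDFS replication on a two-machine setup.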
