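In our experience, MapReduce insert performance can be improved by:
1. increasing the number of regionserver flush threads
2. increasing the memstore share of the JVM heap (memstore/jvm_heap)
3. pre-splitting the table into regions before running the MapReduce job
4. increasing the number of large and small compaction threads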
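
To make point 3 more concrete, here is a minimal sketch of how split keys could be computed. It is an illustration, not what we actually run: it assumes the row keys start with a two-character hex prefix (for example, the first byte of a hash), and the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class PreSplitSketch {

    // Hypothetical helper: returns numRegions - 1 split keys that divide the
    // two-hex-character prefix space "00".."ff" into roughly equal ranges.
    // Assumption: row keys begin with such a prefix; adapt this to your key design.
    static byte[][] hexSplitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be between 2 and 256");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary between region i-1 and region i, as a value in 1..255.
            int boundary = i * 256 / numRegions;
            splits[i - 1] = String.format("%02x", boundary)
                    .getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        // 12 regions is only an example value (one per node in this thread's cluster).
        for (byte[] key : hexSplitKeys(12)) {
            System.out.println(new String(key, StandardCharsets.US_ASCII));
        }
    }
}
```

The resulting array could then be passed to HBaseAdmin.createTable(tableDescriptor, splitKeys) before the job starts. This only helps if the row keys written by the job are actually spread across the prefix range; with sequential keys all writes still land on one region.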

Please correct me if I'm wrong, or suggest any better ideas.

On Jan 8, 2013 4:02 PM, "lars hofhansl" <[email protected]> wrote:

> What type of disks and how many?
> With the default replication factor your 2 (or 6) GB are actually
> replicated 3 times.
> 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> reasonable machine should be able to absorb.
> The fact that deferred log flush does not help you seems to indicate that
> you're over IO bound.
>
> What's your memstore flush size? Potentially the data is written many
> times during compactions.
>
> In your case you dial down the HDFS replication, since you only have two
> physical machines anyway.
> (Set it to 2. If you do not specify any failure zones, you might as well
> set it to 1... You will lose data if one of your server machines dies
> anyway).
>
> It does not really make that much sense to deploy HBase and HDFS on
> virtual nodes like this.
> -- Lars
>
> ________________________________
> From: Farrokh Shahriari <[email protected]>
> To: [email protected]
> Sent: Monday, January 7, 2013 9:38 PM
> Subject: Re: Tune MapReduce over HBase to insert data
>
> Hi again,
> I'm using HBase 0.92.1-cdh4.0.0.
> I have two server machine with 48Gb RAM,12 physical core & 24 logical core
> that contain 12 nodes(6 nodes on each server). Each node has 8Gb RAM & 2
> VCPU.
> I've set some parameter that get better result like set WAL=off on put,but
> some parameters like Heap-size,Deferred log flush don't help me.
> Beside that I have another question,why each time I've run mapreduce,I've
> got different result time while all the config & hardware are same & not
> change ?
>
> Tnx you guys
>
> On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <[email protected]> wrote:
>
> > Have you read through http://hbase.apache.org/book.html#performance ?
> >
> > What version of HBase are you using ?
> >
> > Cheers
> >
> > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <
> > [email protected]> wrote:
> >
> > > Hi there
> > > I have a cluster with 12 nodes that each of them has 2 core of CPU. Now,I
> > > want insert large data about 2Gb in 80 sec ( or 6Gb in 240sec ). I've used
> > > Map-Reduce over hbase,but I can't achieve proper result .
> > > I'd be glad if you tell me what I can do to get better result or which
> > > parameters should I config or tune to improve Map-Reduce/Hbase performance
> > > ?
> > >
> > > Tnx
