Thank you, guys. Let me change these configurations & test MapReduce again.
On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <[email protected]> wrote:

> Start by testing HDFS throughput by doing a simple copyFromLocal using the
> Hadoop command line shell (bin/hadoop fs -copyFromLocal pathTo8GBFile
> /tmp/dummyFile1). If you have a 1000 Mbit/sec network between the
> computers, you should get around 75 MB/sec.
>
> On Tuesday, January 8, 2013, Bing Jiang wrote:
>
> > In our experience, you can speed up MapReduce inserts by:
> > 1. increasing the regionserver flush thread count
> > 2. increasing the memstore size / JVM heap
> > 3. pre-splitting the table's regions before the MapReduce job
> > 4. increasing the large and small compaction thread counts.
> >
> > Please correct me if this is wrong, or suggest any better ideas.
> >
> > On Jan 8, 2013 4:02 PM, "lars hofhansl" <[email protected]> wrote:
> >
> > > What type of disks, and how many?
> > > With the default replication factor your 2 (or 6) GB are actually
> > > replicated 3 times.
> > > 6 GB / 80 s = 75 MB/s, twice that if you do not disable the WAL, which
> > > a reasonable machine should be able to absorb.
> > > The fact that deferred log flush does not help you seems to indicate
> > > that you are IO bound.
> > >
> > > What's your memstore flush size? Potentially the data is written many
> > > times during compactions.
> > >
> > > In your case you could dial down the HDFS replication, since you only
> > > have two physical machines anyway.
> > > (Set it to 2. If you do not specify any failure zones, you might as
> > > well set it to 1... You will lose data if one of your server machines
> > > dies anyway.)
> > >
> > > It does not really make that much sense to deploy HBase and HDFS on
> > > virtual nodes like this.
> > >
> > > -- Lars
> > >
> > > ________________________________
> > > From: Farrokh Shahriari <[email protected]>
> > > To: [email protected]
> > > Sent: Monday, January 7, 2013 9:38 PM
> > > Subject: Re: Tune MapReduce over HBase to insert data
> > >
> > > Hi again,
> > > I'm using HBase 0.92.1-cdh4.0.0.
> > > I have two server machines with 48 GB RAM, 12 physical cores & 24
> > > logical cores, hosting 12 nodes (6 nodes on each server). Each node
> > > has 8 GB RAM & 2 VCPUs.
> > > I've set some parameters that give better results, like WAL=off on the
> > > puts, but some parameters like heap size and deferred log flush don't
> > > help me.
> > > Besides that, I have another question: why do I get a different run
> > > time each time I run the MapReduce job, even though the config &
> > > hardware are the same and nothing has changed?
> > >
> > > Thank you guys
> > >
> > > On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <[email protected]> wrote:
> > >
> > > > Have you read through http://hbase.apache.org/book.html#performance?
> > > >
> > > > What version of HBase are you using?
> > > >
> > > > Cheers
> > > >
> > > > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi there,
> > > > > I have a cluster with 12 nodes, each of which has 2 CPU cores. Now
> > > > > I want to insert a large amount of data: about 2 GB in 80 sec (or
> > > > > 6 GB in 240 sec). I've used MapReduce over HBase, but I can't
> > > > > achieve that rate.
> > > > > I'd be glad if you could tell me what I can do to get better
> > > > > results, or which parameters I should configure or tune to improve
> > > > > MapReduce/HBase performance.
> > > > >
> > > > > Thanks
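
To make the pre-split / memstore / deferred-log-flush suggestions above concrete, here is a minimal sketch using the HBase 0.92 client API. The table name "mytable", column family "cf", and the split keys are made-up placeholders (not from the thread); the split keys only help if your real row keys are spread evenly across those prefixes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // "mytable" and "cf" are placeholders for illustration only.
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("cf"));

    // Bigger memstore flush size => fewer, larger flushes (needs enough region server heap).
    desc.setMemStoreFlushSize(256L * 1024 * 1024);
    // Deferred log flush batches WAL syncs; small data-loss window if a server crashes.
    desc.setDeferredLogFlush(true);

    // Pre-split into 10 regions so every region server takes writes from the start.
    // These keys assume row keys begin with a two-digit prefix "00".."99".
    byte[][] splits = new byte[9][];
    for (int i = 0; i < splits.length; i++) {
      splits[i] = Bytes.toBytes(String.format("%02d", (i + 1) * 10));
    }

    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(desc, splits);
  }
}

The point of the pre-split is that a freshly created table has a single region, so every mapper writes to one region server until splits happen; creating the regions up front spreads the load immediately.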

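Since "WAL=off on put" came up: in a MapReduce job that writes through TableOutputFormat, that is typically done per Put inside the mapper. A rough sketch against the 0.92 API; the input format (one row key per text line), the column family "cf", and the qualifier "q" are assumptions, not details from the thread.

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: turns each input line into one Put with the WAL disabled.
public class InsertMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
  private static final byte[] CF = Bytes.toBytes("cf");   // assumed column family
  private static final byte[] QUAL = Bytes.toBytes("q");  // assumed qualifier

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    byte[] row = Bytes.toBytes(value.toString());          // assumes the line is the row key
    Put put = new Put(row);
    put.add(CF, QUAL, Bytes.toBytes(value.toString()));
    // Skipping the WAL roughly halves the bytes written per edit, at the cost
    // of losing unflushed edits if a region server crashes.
    put.setWriteToWAL(false);
    context.write(new ImmutableBytesWritable(row), put);
  }
}

As Lars notes above, this is a durability trade-off, so it only makes sense for loads that can be re-run from the source data.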