Re: Tune MapReduce over HBase to insert data

Farrokh Shahriari Wed, 16 Jan 2013 03:21:02 -0800

I've noticed that if I comment out the write call in the map function
(Context.write(row,put)), the job takes only 40 sec. The difference is about
30 seconds, which seems weird to me. What do you think?


The parameters that have been useful so far:
hbase.hstore.blockingStoreFiles => 20
hbase.hregion.memstore.block.multiplier => 4
hbase.hregion.memstore.flush.size => 1073741824
speculative.execution => false
wal => false
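
For reference, the three HBase settings in the list above would look roughly like this in hbase-site.xml on the region servers, using the same property-block form quoted further down the thread. This is only a sketch with the values listed above. `speculative.execution` and `wal` are left out because they are not hbase-site.xml properties: speculative execution is a MapReduce job setting, and in HBase 0.92 the WAL is normally skipped per Put (for example with put.setWriteToWAL(false)).

```xml
<!-- hbase-site.xml (region servers): sketch using the values listed above -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>20</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
</property>
<property>
  <!-- 1073741824 bytes = 1 GB -->
  <name>hbase.hregion.memstore.flush.size</name>
  <value>1073741824</value>
</property>
```

The region servers would need a restart to pick up changes made this way.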

Should I change these two parameters: io.sort.mb & io.sort.factor?
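
For context, io.sort.mb and io.sort.factor are MapReduce properties rather than HBase ones. They govern the sorting and merging of intermediate map output, so they should only matter if the job has a reduce phase. A sketch of where they would go, shown with the stock Apache Hadoop 1.x defaults rather than recommended values (assumption: the defaults in a given CDH install may differ):

```xml
<!-- mapred-site.xml or per-job configuration: default values, for illustration only -->
<property>
  <!-- size in MB of the in-memory buffer used while sorting map output -->
  <name>io.sort.mb</name>
  <value>100</value>
</property>
<property>
  <!-- number of spill streams merged at once -->
  <name>io.sort.factor</name>
  <value>10</value>
</property>
```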

Mohandes

On Tue, Jan 15, 2013 at 5:03 AM, Bing Jiang <[email protected]> wrote:

> Hi, mohandes.zebeleh
> You can adjust the parameters below (major compaction, minor compaction,
> split).
> If you do not set them, each keeps its default value (1).
>
> <property>
>   <name>hbase.regionserver.thread.compaction.large</name>
>   <value>5</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.compaction.small</name>
>   <value>10</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.split</name>
>   <value>5</value>
> </property>
>
> Regards!
>
> Bing
>
> 2013/1/14 Farrokh Shahriari <[email protected]>
>
>> Bing Jiang, what do you mean by adding to the compaction thread number?
>> In hbase-site.xml we have compactionqueuesize and compactionthreshold, but
>> not the parameter that you mentioned.
>>
>> I'd be grateful if you could guide me.
>>
>> On Sun, Jan 13, 2013 at 7:00 PM, Ted Yu <[email protected]> wrote:
>>
>> > Both HFileOutputFormat and LoadIncrementalHFiles are in the mapreduce
>> > package.
>> >
>> > Cheers
>> >
>> > On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang <[email protected]> wrote:
>> >
>> > > Hi, Anoop.
>> > > Why doesn't the HBase mapreduce package contain tools like this?
>> > >
>> > > Anoop John <[email protected]> wrote:
>> > >
>> > > >Hi
>> > > >Have you considered using HFileOutputFormat? You are using
>> > > >TableOutputFormat now, which issues put calls to HTable. With
>> > > >HFileOutputFormat, the MR job writes the HFiles directly [no flushes,
>> > > >no compactions]. Afterwards you need to use LoadIncrementalHFiles to
>> > > >load the HFiles into the regions. This may help you.
>> > > >
>> > > >-Anoop-
>> > > >
>> > > >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <
>> > > >[email protected]> wrote:
>> > > >
>> > > >> Thank you, guys. Let me change these configurations & test the
>> > > >> MapReduce job again.
>> > > >>
>> > > >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <[email protected]> wrote:
>> > > >>
>> > > >> > Start by testing HDFS throughput by doing a simple copyFromLocal using
>> > > >> > the Hadoop command line shell (bin/hadoop fs -copyFromLocal
>> > > >> > pathTo8GBFile /tmp/dummyFile1). If you have a 1000Mbit/sec network
>> > > >> > between the computers, you should get around 75 MB/sec.
>> > > >> >
>> > > >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
>> > > >> >
>> > > >> > > In our experience, MapReduce insert performance can be improved by:
>> > > >> > > 1. increasing the number of regionserver flush threads
>> > > >> > > 2. increasing the memstore/jvm_heap ratio
>> > > >> > > 3. pre-splitting the table into regions before the MapReduce job
>> > > >> > > 4. increasing the number of large and small compaction threads.
>> > > >> > >
>> > > >> > > Please correct me if I'm wrong, or suggest any better ideas.
>> > > >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" <[email protected]> wrote:
>> > > >> > >
>> > > >> > > > What type of disks and how many?
>> > > >> > > > With the default replication factor your 2 (or 6) GB are actually
>> > > >> > > > replicated 3 times.
>> > > >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
>> > > >> > > > reasonable machine should be able to absorb.
>> > > >> > > > The fact that deferred log flush does not help you seems to indicate
>> > > >> > > > that you're heavily IO bound.
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > What's your memstore flush size? Potentially the data is written many
>> > > >> > > > times during compactions.
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > In your case you could dial down the HDFS replication, since you only
>> > > >> > > > have two physical machines anyway.
>> > > >> > > > (Set it to 2. If you do not specify any failure zones, you might as
>> > > >> > > > well set it to 1... You will lose data if one of your server machines
>> > > >> > > > dies anyway).
>> > > >> > > >
>> > > >> > > > It does not really make that much sense to deploy HBase and HDFS on
>> > > >> > > > virtual nodes like this.
>> > > >> > > > -- Lars
>> > > >> > > >
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > ________________________________
>> > > >> > > >  From: Farrokh Shahriari <[email protected]>
>> > > >> > > > To: [email protected]
>> > > >> > > > Sent: Monday, January 7, 2013 9:38 PM
>> > > >> > > > Subject: Re: Tune MapReduce over HBase to insert data
>> > > >> > > >
>> > > >> > > > Hi again,
>> > > >> > > > I'm using HBase 0.92.1-cdh4.0.0.
>> > > >> > > > I have two server machines, each with 48 GB RAM, 12 physical cores &
>> > > >> > > > 24 logical cores, hosting 12 nodes in total (6 nodes on each server).
>> > > >> > > > Each node has 8 GB RAM & 2 vCPUs.
>> > > >> > > > Some settings gave better results, such as setting WAL=off on put,
>> > > >> > > > but others, such as heap size and deferred log flush, don't help me.
>> > > >> > > > Besides that, I have another question: why do I get a different run
>> > > >> > > > time each time I run the MapReduce job, even though the config &
>> > > >> > > > hardware are the same and have not changed?
>> > > >> > > >
>> > > >> > > > Thanks, guys
>> > > >> > > >
>> > > >> > > > On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <[email protected]> wrote:
>> > > >> > > >
>> > > >> > > > > Have you read through http://hbase.apache.org/book.html#performance?
>> > > >> > > > >
>> > > >> > > > > What version of HBase are you using ?
>> > > >> > > > >
>> > > >> > > > > Cheers
>> > > >> > > > >
>> > > >> > > > > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <[email protected]> wrote:
>> > > >> > > > >
>> > > >> > > > > > Hi there
>> > > >> > > > > > I have a cluster with 12 nodes, each of which has 2 CPU cores. Now I
>> > > >> > > > > > want to insert a large amount of data, about 2 GB in 80 sec (or 6 GB
>> > > >> > > > > > in 240 sec). I've used MapReduce over HBase, but I can't achieve the
>> > > >> > > > > > result I need.
>> > > >> > > > > > I'd be glad if you could tell me what I can do to get a better
>> > > >> > > > > > result, or which parameters I should configure or tune to improve
>> > > >> > > > > > MapReduce/HBase performance.
>> > > >> > > > > >
>> > > >> > > > > > Thanks
>> > > >> > > > > >
>> > > >> > > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > >
>> >
>>
>
>
>
> --
> Bing Jiang
> Tel：(86)134-2619-1361
> weibo: http://weibo.com/jiangbinglover
> BLOG: http://blog.sina.com.cn/jiangbinglover
> National Research Center for Intelligent Computing Systems
> Institute of Computing technology
> Graduate University of Chinese Academy of Science
>
