In that case it would definitely hurt your performance: the startup overhead
of one Map/Reduce job per row would far exceed the cost of the write itself.

One question.  Have you done more detailed timings to determine where time
is spent?  With the overhead of your webapp, and with it streaming
insertions one row at a time, is it possible that a significant amount of
time is being spent before or after the HBase commit? (Significant in this
case could be 1-2 ms/row.)
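
For example, a rough instrumentation sketch (table and column names are
made up; the byte[] stands in for your real per-row work) that buckets
per-row time into "build" and "commit":

  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.io.BatchUpdate;

  public class CommitTiming {
    public static void main(String[] args) throws IOException {
      HTable table = new HTable(new HBaseConfiguration(), "mytable");
      long buildMs = 0, commitMs = 0;
      for (int i = 0; i < 100000; i++) {
        long t0 = System.currentTimeMillis();
        byte[] payload = new byte[1400];      // stand-in for per-row work
        BatchUpdate update = new BatchUpdate("row" + i);
        update.put("data:payload", payload);  // hypothetical column family
        long t1 = System.currentTimeMillis();
        table.commit(update);                 // the actual HBase write
        long t2 = System.currentTimeMillis();
        buildMs += t1 - t0;
        commitMs += t2 - t1;
      }
      System.out.println("build=" + buildMs + "ms commit=" + commitMs + "ms");
    }
  }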

JG

-----Original Message-----
From: Slava Gorelik [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 02, 2008 1:12 PM
To: [email protected]
Subject: Re: Hbase / Hadoop Tuning

Thank You.
Regarding doing the writes in MR jobs: the problem is that rows come to the
webapp one by one, so I can't accumulate them into one big batch update.
That means I would need to run an MR job for each single row - in that
case, will MR jobs help?

Best Regards.

On Thu, Oct 2, 2008 at 10:58 PM, Jim Kellerman (POWERSET) <
[EMAIL PROTECTED]> wrote:

> Responses inline below.
> > -----Original Message-----
> > From: Slava Gorelik [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, October 02, 2008 12:39 PM
> > To: [email protected]
> > Subject: Re: Hbase / Hadoop Tuning
> >
> > Thank You Jim for a quick answer.
> > 1) If I understand correctly, using 2 clients should allow me to improve
> > the performance twofold (more or less)?
>
> I don't know if you will get 2x performance, but it will be greater than
> 1x.
>
> > 2) Currently, our webapp is an HBase client using HTable - is that what
> > you meant when you said "(HBase, not web) clients"?
>
> If multiple requests come into your webapp, and your webapp is
> multithreaded, you will not see a performance increase.
>
> If your webapp runs a different process for each request, you will see
> a performance increase because the RPC connection will not be shared
> and consequently will not block on the giant lock. That is why I
> recommended splitting up your job using Map/Reduce.
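>
> As a rough sketch of the multiple-process approach (table and column
> names here are made up), each client process could own one slice of the
> key space, so every JVM gets its own RPC connection:
>
>   import java.io.IOException;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.client.HTable;
>   import org.apache.hadoop.hbase.io.BatchUpdate;
>
>   // Run several copies as separate JVMs, e.g.
>   //   java WriteSlice 0 50000    and    java WriteSlice 50000 100000
>   // Each process has its own HTable and RPC connection, so they do
>   // not serialize on the client-side lock.
>   public class WriteSlice {
>     public static void main(String[] args) throws IOException {
>       int start = Integer.parseInt(args[0]);
>       int end = Integer.parseInt(args[1]);
>       HTable table = new HTable(new HBaseConfiguration(), "mytable");
>       for (int i = start; i < end; i++) {
>         BatchUpdate update = new BatchUpdate("row" + i);
>         update.put("data:payload", new byte[1400]);  // 1400-byte rows, as in this thread
>         table.commit(update);
>       }
>     }
>   }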
>
> > 3) Is 64MB the minimum size for a single region, or could it be less?
>
> It could be less, but that is the default block size for the Hadoop DFS.
> If you make it smaller, you might want to change the default block size
> for Hadoop as well.
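>
> For example (values below are just illustrative), in hbase-site.xml and,
> only if you go below 64MB, in hadoop-site.xml:
>
>   <!-- hbase-site.xml: split regions at 64MB instead of the 256MB default -->
>   <property>
>     <name>hbase.hregion.max.filesize</name>
>     <value>67108864</value>
>   </property>
>
>   <!-- hadoop-site.xml: only if regions shrink below the 64MB DFS default -->
>   <property>
>     <name>dfs.block.size</name>
>     <value>33554432</value>
>   </property>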
>
> > 4) When is the fix for the RPC lock on concurrent operations
> > in a single client planned?
>
> This change is targeted for somewhere in the next 6 months according
> to the roadmap.
>
>
> > Thank You Again and Best Regards.
> >
> >
> > On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
> > [EMAIL PROTECTED]> wrote:
> >
> > > What you are storing is 140,000,000 bytes, so having multiple
> > > region servers will not help you as a single region is only
> > > served by a single region server. By default, regions split
> > > when they reach 256MB. So until the region splits, all traffic
> > > will go to a single region server. You might try reducing the
> > > maximum file size to encourage region splitting by changing the
> > > value of hbase.hregion.max.filesize to 64MB.
> > >
> > > Using a single client will also limit write performance.
> > > Even if the client is multi-threaded, there is a big giant lock
> > > in the RPC mechanism which prevents concurrent requests (This
> > > is something we plan to fix in the future).
> > >
> > > Multiple clients do not block against one another the way multi-
> > > threaded clients do currently. So another way to increase
> > > write performance would be to run multiple (HBase, not web) clients,
> > > by either running multiple processes directly, or by utilizing
> > > a Map/Reduce job to do the writes.
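> > >
> > > A rough sketch of the Map/Reduce route, assuming the
> > > org.apache.hadoop.hbase.mapred classes in 0.18 (the table name and the
> > > MyWriteJob driver are made up): point TableOutputFormat at the table
> > > and have your reduce emit BatchUpdates.
> > >
> > >   import org.apache.hadoop.hbase.io.BatchUpdate;
> > >   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> > >   import org.apache.hadoop.hbase.mapred.TableOutputFormat;
> > >   import org.apache.hadoop.mapred.JobClient;
> > >   import org.apache.hadoop.mapred.JobConf;
> > >
> > >   public class MyWriteJob {
> > >     public static void main(String[] args) throws Exception {
> > >       JobConf job = new JobConf(MyWriteJob.class);
> > >       // Write to the table; each task gets its own HBase client.
> > >       job.set(TableOutputFormat.OUTPUT_TABLE, "mytable");
> > >       job.setOutputFormat(TableOutputFormat.class);
> > >       job.setOutputKeyClass(ImmutableBytesWritable.class);
> > >       job.setOutputValueClass(BatchUpdate.class);
> > >       // ... set input format plus your map/reduce classes that
> > >       // emit <ImmutableBytesWritable row, BatchUpdate> pairs ...
> > >       JobClient.runJob(job);
> > >     }
> > >   }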
> > >
> > > ---
> > > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> > >
> > >
> > > > -----Original Message-----
> > > > From: Slava Gorelik [mailto:[EMAIL PROTECTED]
> > > > Sent: Thursday, October 02, 2008 12:07 PM
> > > > To: [email protected]
> > > > Subject: Re: Hbase / Hadoop Tuning
> > > >
> > > > Hi. Thank you for the quick response.
> > > > We are using 7 machines (6 Red Hat 5 and 1 SuSE Enterprise 10).
> > > > Each machine is: 4 CPUs with 4GB RAM and a 200GB HD, connected with a
> > > > 1Gb network interface.
> > > > All machines are in the same rack. On one machine (the master) we are
> > > > running Tomcat with one webapp that is adding 100000 rows. Nothing
> > > > else is running. When no webapp is running, the CPU load is less
> > > > than 1%.
> > > >
> > > > We are using HBase 0.18.0 and Hadoop 0.18.0.
> > > > The HBase cluster is one master and 6 region servers.
> > > >
> > > > Row addition is done by BatchUpdate and commit into a single column
> > > > family.
> > > > The data is a simple byte array (1400 bytes each row).
> > > >
> > > >
> > > > Thank You and Best Regards.
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Oct 2, 2008 at 9:39 PM, stack <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Tell us more Slava.  HBase versions and how many regions you have
> > > > > in your cluster?
> > > > >
> > > > > If small rows, your best boost will likely come when we support
> > > > > batching of updates: HBASE-748.
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > > Slava Gorelik wrote:
> > > > >
> > > > >> Hi All.
> > > > >> Our environment - 8 datanodes (1 is also the namenode);
> > > > >> 7 of them are also region servers and 1 is the master; default
> > > > >> replication - 3.
> > > > >> We have an application that does heavy writes with relatively
> > > > >> small rows - about 10Kb each;
> > > > >> current performance is 100000 rows in 580000 millisec - 5.8
> > > > >> millisec/row.
> > > > >> Is there any way to improve this performance by some tuning /
> > > > >> tweaking of HBase or Hadoop?
> > > > >>
> > > > >> Thank You and Best Regards.
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > >
>
