The HBase master page displays the aggregate cluster operation rate. One thing about splits: they are data based, so if your hot data has a lot of locality it can drive load unevenly. On a massive data set this is usually mitigated by the sheer size of the data, but tests rarely run against data sets that large.
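To make the locality point concrete, here is a minimal, hypothetical sketch of one common mitigation, salting the row key so that otherwise-adjacent hot keys are spread across regions. The class name, bucket count, and key layout below are invented for illustration and are not from this thread:

import java.io.UnsupportedEncodingException;

// Illustrative only: prefix each row key with a small hash bucket so that keys
// which would otherwise sort next to each other land in different regions.
public class SaltedKey {
    private static final int BUCKETS = 16;   // assumed bucket count

    static byte[] salt(String userId) throws UnsupportedEncodingException {
        int bucket = (userId.hashCode() & 0x7fffffff) % BUCKETS;
        // e.g. "07|user1234" -- the prefix breaks up key locality, at the cost
        // of having to fan scans out across all buckets.
        return String.format("%02d|%s", bucket, userId).getBytes("UTF-8");
    }
}

In this particular thread the keys are already said to be fairly random user IDs, so salting may not buy much here; it mainly matters when hot keys cluster together.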
On Feb 16, 2010 2:26 PM, "James Baldassari" <ja...@dataxu.com> wrote:

On Tue, 2010-02-16 at 14:05 -0600, Stack wrote:
> On Tue, Feb 16, 2010 at 10:50 AM, James Baldassari...

Yes, over time it would balance out, but we were trying to solve the immediate problem of uneven load across the region servers, so we wanted to make sure that all the data was evenly distributed to eliminate that as a potential cause. We did do the export and import, and after that the data was evenly distributed, so that's not the issue. Whether the keys themselves are evenly distributed is another matter. Our keys are user IDs, and they should be fairly random.

If we do a status 'detailed' in the hbase shell we see the following distribution for the value of "requests" (not entirely sure what this value means):

hdfs01: 7078
hdfs02: 5898
hdfs03: 5870
hdfs04: 3807

There are no order-of-magnitude differences here, and the request count doesn't seem to map to the load on the server. Right now hdfs02 has a load of 16 while the other 3 have loads between 1 and 2. Applying HBASE-2180 did not make any measurable difference.

There are no errors in the region server logs. However, looking at the Hadoop datanode logs, I'm seeing lots of these:

2010-02-16 17:07:54,064 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.24.183.165:50010, storageID=DS-1519453437-10.24.183.165-50010-1265907617548, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:298)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
        at java.lang.Thread.run(Thread.java:619)

One thing I should mention is that I'm not sure exactly how many requests/second HBase is servicing now that I've switched to using HBase client pools. HBase may be servicing gets at a relatively high rate, but it's just not keeping up with the incoming requests from our clients. I think there may be some additional client-side optimizations we can make, so I might take some time to look at our client code to see if there are any problems there. However, I do think it's strange that the load is so unbalanced on the region servers.

We're also going to try throwing some more hardware at the problem. We'll set up a new cluster with 16-core, 16G nodes to see if they are better able to handle the large number of client requests. We might also decrease the block size to 32k or lower.

> > > We also increased the max heap size on the region server from 4G to 5G
> > > and decreased the...

This is probably a topic for a separate thread, but I've never seen a legal definition for the word "distribution." How does this apply to the SaaS model?

> > > > > We do have Ganglia with monitoring and alerts. We're not swapping right
> > > > > now, althou...

I wish I knew the answer to this. As I mentioned above, our keys are user IDs. They come in at random times, not in chunks of similar IDs.

-James

> Thanks James,
> St.Ack
>
> > -James
> >
> > On Tue, 2010-02-16 at 12:17 -0600, Stack ...
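On the client-pool point: HTable instances are not safe for concurrent use, so one common pattern is to give each client thread its own instance rather than sharing one. The sketch below is only an illustration of that pattern, not James's actual client code; the table name "users" and the row-key scheme are assumptions.

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: one HTable per thread, created lazily on first use.
public class PerThreadTable {
    private static final HBaseConfiguration CONF = new HBaseConfiguration();

    private static final ThreadLocal<HTable> TABLE = new ThreadLocal<HTable>() {
        @Override
        protected HTable initialValue() {
            try {
                return new HTable(CONF, "users");   // assumed table name
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    };

    // Fetch the whole row for a user ID using this thread's HTable.
    static Result fetchUser(String userId) throws IOException {
        return TABLE.get().get(new Get(Bytes.toBytes(userId)));
    }
}

Whether this helps depends on where the bottleneck actually is: it only removes client-side contention and does nothing to change the load balance across region servers.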