On Wed, 2010-02-17 at 13:31 -0600, Stack wrote:
> On Wed, Feb 17, 2010 at 11:04 AM, James Baldassari <ja...@dataxu.com> wrote:
> > OK, I'll do my best to capture our changes here. Ideally we would have
> > changed one variable at a time, but since these performance problems
> > were happening in our production environment, we finally had to just
> > throw the kitchen sink at it.
>
> Been there.
>
> > So I'm not sure which combination of the following fixed the problems,
> > but hopefully this will be useful nonetheless:
> >
> > - Upgraded Hadoop from 0.20 to 0.20.1 (Cloudera version
> >   hadoop-0.20-0.20.1+169.56-1). This version apparently has some fixes
> >   for HDFS stability issues under load.
>
> I took a look in here
> http://archive.cloudera.com/cdh/2/hadoop-0.20.1+169.56.CHANGES.txt and
> nothing obvious jumps out. I'm asking the Cloudera lads if they think
> there is anything that could have made a difference.
>
> > - Applied the HBASE-2180 patch.
>
> I'd think this would have made a big difference (though you reported in
> an earlier mail that it had no effect).
>
> > - Decreased the hfile block size from 64k to 4k in the configuration
> >   file, but we didn't alter our existing tables, so I'm not sure if
> >   this change had any effect.
>
> Did you restart after making this change? If so, the cluster picked it
> up. You can verify by taking a look at a random hfile. Do something
> like:
>
> ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m PATH_TO_FILE_IN_HDFS
>
> This'll dump out meta stats on the chosen file, part of which will be
> the block size used when writing the file.
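A minimal sketch of the verification Stack describes above, assuming a hypothetical store file path taken from a recursive listing (actual region and file names will differ on any real cluster):

  # Recursively list the table's directory and pick any store file
  ./bin/hadoop fs -lsr /hbase/users

  # Dump that file's metadata; per Stack's note, the printed file info
  # includes the block size in effect when the file was written
  ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m /hbase/users/1028785192/data/6821954727398735124

If the files still report 65536 after the restart, that points at the per-table setting discussed below rather than the site configuration.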
Yes, we did restart after setting the block size to 4k in the config.
The reason we were not sure whether the change took effect is this
output from the hbase shell:

hbase(main):005:0> describe 'users'
DESCRIPTION                                                          ENABLED
 {NAME => 'users', FAMILIES => [{NAME => 'data', VERSIONS => '1',    true
 COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}

This indicates that the block size for the existing table is still 64k.

> >
> > - Modified the Hadoop configuration to specify only one data directory
> >   for HDFS and one Hadoop temp directory, since our servers only have
> >   one disk each.
>
> I'd think this'd make for no difference in perf.
>
> > - Threw lots of hardware at it. We upgraded the region servers (which
> >   are also our HDFS data nodes) from 8-core/8G boxes to 16-core/16G.
> >   Our final system configuration is as follows:
> >   4x 16-core/16GB RAM/250GB SATAII HDFS data nodes / HBase region servers
> >   1x 4-core/8GB RAM/250GB (RAID 1) SATAII namenode / HBase master
> >   1x 4-core/8GB RAM/250GB (RAID 1) SATAII secondary namenode
>
> This probably did it (smile).
>
> > - Increased region server max heap from 5G to 12G.
>
> So, all is in cache? Is that what fixed the issue?

I don't think everything is in cache. We have several gigabytes of
data. However, I'm sure that having more of it in cache doesn't hurt :)

> Want to send us regionserver logs?

The only messages in the region server logs are the periodic
notifications about performing compactions and rolling hlogs. Is there
something specific you are interested in seeing? I don't mind sending
out the logs; I just don't think there is anything interesting in
there.

> Whats the request rate you are servicing? Whats it say on the master
> home page?

A quick sample shows around 8,000 requests/second. Our peak is probably
higher than that. I do remember that when we were on the old hardware
and the clients were backed up with requests (and therefore sending
large batches as quickly as possible), the master showed around
22,000/sec. I think we have some room to grow. Now that we have this
situation under control in production, we'll probably do some
additional load testing in our staging environment to get a better
sense of the limits. When we get that done I'll send out the results.
Maybe some of this info should go on the HBase wiki as well.

> > I think that's everything. If I had to guess, I would say that
> > upgrading Hadoop and moving to bigger hardware with more heap space
> > for HBase was what did it. If anyone wants more details, like some
> > specific config setting, let me know and I'll try to get that for you.
> >
> > HBase is having no problems keeping up with all the gets/puts now.
> > The load on the region servers is evenly distributed and is very low
> > (< 1).
>
> Whats the utilization on these boxes? Any cpu or i/o load?

The load averages are typically less than 1. Top reports around 50% CPU
utilization and a 3-5G resident set size for the region server process.
The Hadoop data node process isn't using much CPU or memory at all.

> > Thanks again to everyone who helped me work through these issues. I
> > really appreciate it.
>
> Thanks for sticking with it. Sorry for the loss of sleep.
> St.Ack
>
> > On Wed, 2010-02-17 at 02:18 -0600, Daniel Washusen wrote:
> >> Glad you sorted it out! Please do tell...
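Since the describe output above still shows BLOCKSIZE => '65536' for the existing 'data' family, the 4k value from the config would presumably only apply to newly created families. A rough sketch of how the existing table could be updated from the hbase shell, assuming the 0.20-era alter syntax; existing store files only pick up the new size as they are rewritten by compactions:

  disable 'users'
  alter 'users', {NAME => 'data', BLOCKSIZE => '4096'}
  enable 'users'
  # optionally, force existing store files to be rewritten sooner
  major_compact 'users'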
> >>
> >> On 17/02/2010, at 4:59 PM, James Baldassari <ja...@dataxu.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I think we managed to solve our performance and load issues.
> >> > Everything has been stable for about an hour now, but I'm not going
> >> > to raise the victory flag until the morning because we've had short
> >> > periods of stability in the past.
> >> >
> >> > I've been working on this problem non-stop for almost a week now,
> >> > so I really need to get some sleep, but if everything looks good
> >> > tomorrow I'll write up a summary of all the changes we made and
> >> > share it with the group. Hopefully this exercise in tuning for a
> >> > high-throughput real-time environment will be useful to others.
> >> >
> >> > Thanks,
> >> > James
> >> >
> >> > On Tue, 2010-02-16 at 23:18 -0600, Stack wrote:
> >> >> When you look at top on the loaded server is it the regionserver
> >> >> or the datanode that is using up the cpu?
> >> >>
> >> >> I look at your hdfs listing. Some of the regions have 3 and 4
> >> >> files but most look fine. A good few are on the compaction verge
> >> >> so I'd imagine a lot of compaction going on but this is background
> >> >> though it does suck cpu and i/o... it shouldn't be too bad.
> >> >>
> >> >> I took a look at the regionserver log. The server is struggling
> >> >> during which time period? There is one log run at the start and
> >> >> there it seems like nothing untoward. Please enable DEBUG going
> >> >> forward. It'll shed more light on whats going on: See
> >> >> http://wiki.apache.org/hadoop/Hbase/FAQ#A5 for how. Otherwise, the
> >> >> log doesn't have anything running long enough for it to have been
> >> >> under serious load.
> >> >>
> >> >> This is a four node cluster now? You don't seem to have too many
> >> >> regions per server yet you have a pretty high read/write rate
> >> >> going by earlier requests postings. Maybe you need to add more
> >> >> servers. Are you going to add in those 16G machines?
> >> >>
> >> >> When you look at the master ui, you can see that the request rate
> >> >> over time is about the same for all regionservers? (refresh the
> >> >> master ui every so often to take a new sampling).
> >> >>
> >> >> St.Ack
> >> >>
> >> >> On Tue, Feb 16, 2010 at 3:59 PM, James Baldassari
> >> >> <ja...@dataxu.com> wrote:
> >> >>> Nope. We don't do any map reduce. We're only using Hadoop for
> >> >>> HBase at the moment.
> >> >>>
> >> >>> That one node, hdfs02, still has a load of 16 with around 40% I/O
> >> >>> and 120% CPU. The other nodes are all around 66% CPU with 0-1%
> >> >>> I/O and load of 1 to 3.
> >> >>>
> >> >>> I don't think all the requests are going to hdfs02 based on the
> >> >>> status 'detailed' output. It seems like that node is just having
> >> >>> a much harder time getting the data or something. Maybe we have
> >> >>> some incorrect HDFS setting. All the configs are identical,
> >> >>> though.
> >> >>>
> >> >>> -James
> >> >>>
> >> >>> On Tue, 2010-02-16 at 17:45 -0600, Dan Washusen wrote:
> >> >>>> You mentioned in a previous email that you have a Task Tracker
> >> >>>> process running on each of the nodes. Is there any chance there
> >> >>>> is a map reduce job running?
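Regarding the earlier suggestion to enable DEBUG logging: the usual approach (and what the FAQ entry linked above walks through) is a one-line change to conf/log4j.properties on each node, followed by a restart of the affected daemons, for example:

  log4j.logger.org.apache.hadoop.hbase=DEBUG

This is a sketch of the standard log4j setting rather than a quote from the FAQ; the linked page remains the authoritative reference.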
> >> >>>>
> >> >>>> On 17 February 2010 10:31, James Baldassari <ja...@dataxu.com> wrote:
> >> >>>>
> >> >>>>> On Tue, 2010-02-16 at 16:45 -0600, Stack wrote:
> >> >>>>>> On Tue, Feb 16, 2010 at 2:25 PM, James Baldassari
> >> >>>>>> <ja...@dataxu.com> wrote:
> >> >>>>>>> On Tue, 2010-02-16 at 14:05 -0600, Stack wrote:
> >> >>>>>>>> On Tue, Feb 16, 2010 at 10:50 AM, James Baldassari
> >> >>>>>>>> <ja...@dataxu.com> wrote:
> >> >>>>>>>
> >> >>>>>>> Whether the keys themselves are evenly distributed is another
> >> >>>>>>> matter. Our keys are user IDs, and they should be fairly
> >> >>>>>>> random. If we do a status 'detailed' in the hbase shell we
> >> >>>>>>> see the following distribution for the value of "requests"
> >> >>>>>>> (not entirely sure what this value means):
> >> >>>>>>> hdfs01: 7078
> >> >>>>>>> hdfs02: 5898
> >> >>>>>>> hdfs03: 5870
> >> >>>>>>> hdfs04: 3807
> >> >>>>>>
> >> >>>>>> That looks like they are evenly distributed. Requests are how
> >> >>>>>> many hits a second. See the UI on master port 60010. The
> >> >>>>>> numbers should match.
> >> >>>>>
> >> >>>>> So the total across all 4 region servers would be
> >> >>>>> 22,653/second? Hmm, that doesn't seem too bad. I guess we just
> >> >>>>> need a little more throughput...
> >> >>>>>
> >> >>>>>>> There are no order of magnitude differences here, and the
> >> >>>>>>> request count doesn't seem to map to the load on the server.
> >> >>>>>>> Right now hdfs02 has a load of 16 while the 3 others have
> >> >>>>>>> loads between 1 and 2.
> >> >>>>>>
> >> >>>>>> This is interesting. I went back over your dumps of cache
> >> >>>>>> stats above and the 'loaded' servers didn't have any attribute
> >> >>>>>> there that differentiated it from others. For example, the
> >> >>>>>> number of storefiles seemed about same.
> >> >>>>>>
> >> >>>>>> I wonder what is making for the high load? Can you figure it?
> >> >>>>>> Is it high CPU use (unlikely). Is it then high i/o? Can you
> >> >>>>>> try and figure whats different about the layout under the
> >> >>>>>> loaded server and that of an unloaded server? Maybe do a
> >> >>>>>> ./bin/hadoop fs -lsr /hbase and see if anything jumps out at
> >> >>>>>> you.
> >> >>>>>
> >> >>>>> It's I/O wait that is killing the highly loaded server. The CPU
> >> >>>>> usage reported by top is just about the same across all servers
> >> >>>>> (around 100% on an 8-core node), but one server at any given
> >> >>>>> time has a much higher load due to I/O.
> >> >>>>>
> >> >>>>>> If you want to post the above or a loaded servers log to
> >> >>>>>> pastbin we'll take a looksee.
> >> >>>>>
> >> >>>>> I'm not really sure what to look for, but maybe someone else
> >> >>>>> will notice something, so here's the output of hadoop fs -lsr
> >> >>>>> /hbase: http://pastebin.com/m98096de
> >> >>>>>
> >> >>>>> And here is today's region server log from hdfs02, which seems
> >> >>>>> to get hit particularly hard: http://pastebin.com/m1d8a1e5f
> >> >>>>>
> >> >>>>> Please note that we restarted it several times today, so some
> >> >>>>> of those errors are probably just due to restarting the region
> >> >>>>> server.
> >> >>>>>
> >> >>>>>>> Applying HBASE-2180 did not make any measurable difference.
> >> >>>>>>> There are no errors in the region server logs. However,
> >> >>>>>>> looking at the Hadoop datanode logs, I'm seeing lots of
> >> >>>>>>> these:
> >> >>>>>>>
> >> >>>>>>> 2010-02-16 17:07:54,064 ERROR
> >> >>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> >>>>>>> DatanodeRegistration(10.24.183.165:50010,
> >> >>>>>>> storageID=DS-1519453437-10.24.183.165-50010-1265907617548,
> >> >>>>>>> infoPort=50075, ipcPort=50020):DataXceiver
> >> >>>>>>> java.io.EOFException
> >> >>>>>>>     at java.io.DataInputStream.readShort(DataInputStream.java:298)
> >> >>>>>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
> >> >>>>>>>     at java.lang.Thread.run(Thread.java:619)
> >> >>>>>>
> >> >>>>>> You upped xceivers on your hdfs cluster? If you look at the
> >> >>>>>> other end of the above EOFE, can you see why it died?
> >> >>>>>
> >> >>>>> Max xceivers = 3072; datanode handler count = 20; region server
> >> >>>>> handler count = 100.
> >> >>>>>
> >> >>>>> I can't find the other end of the EOFException. I looked in the
> >> >>>>> Hadoop and HBase logs on the server that is the name node and
> >> >>>>> HBase master, as well as on the HBase client.
> >> >>>>>
> >> >>>>> Thanks for all the help!
> >> >>>>>
> >> >>>>> -James
> >> >>>>>
> >> >>>>>>> However, I do think it's strange that the load is so
> >> >>>>>>> unbalanced on the region servers.
> >> >>>>>>
> >> >>>>>> I agree.
> >> >>>>>>
> >> >>>>>>> We're also going to try throwing some more hardware at the
> >> >>>>>>> problem. We'll set up a new cluster with 16-core, 16G nodes
> >> >>>>>>> to see if they are better able to handle the large number of
> >> >>>>>>> client requests. We might also decrease the block size to 32k
> >> >>>>>>> or lower.
> >> >>>>>>
> >> >>>>>> Ok.
> >> >>>>>>
> >> >>>>>>>> Should only be a matter if you intend distributing the above.
> >> >>>>>>>
> >> >>>>>>> This is probably a topic for a separate thread, but I've
> >> >>>>>>> never seen a legal definition for the word "distribution."
> >> >>>>>>> How does this apply to the SaaS model?
> >> >>>>>>
> >> >>>>>> Fair enough.
> >> >>>>>>
> >> >>>>>> Something is up. Especially if hbase-2180 made no difference.
> >> >>>>>>
> >> >>>>>> St.Ack
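For readers trying to map the settings mentioned above ("max xceivers = 3072; datanode handler count = 20; region server handler count = 100") to concrete configuration, the properties presumably look something like the following; this is a sketch, with the values copied from the thread rather than verified against this cluster's files:

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.datanode.max.xcievers</name> <!-- note: the property name really is spelled "xcievers" -->
    <value>3072</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>20</value>
  </property>

  <!-- hbase-site.xml -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>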