OK, I'll do my best to capture our changes here. Ideally we would have changed one variable at a time, but since these performance problems were happening in our production environment, we finally had to just throw the kitchen sink at it. So I'm not sure which combination of the following fixed the problems, but hopefully this will be useful nonetheless:
- Upgraded Hadoop from 0.20 to 0.20.1 (Cloudera version hadoop-0.20-0.20.1+169.56-1). This version apparently has some fixes for HDFS stability issues under load.
- Applied the HBASE-2180 patch.
- Decreased the hfile block size from 64k to 4k in the configuration file. We didn't alter our existing tables, though, so I'm not sure whether this change had any effect.
- Modified the Hadoop configuration to specify only one data directory for HDFS and one Hadoop temp directory, since our servers only have one disk each.
- Threw lots of hardware at it. We upgraded the region servers (which are also our HDFS data nodes) from 8-core/8G boxes to 16-core/16G. Our final system configuration is as follows:
  4x 16-core/16GB RAM/250GB SATAII - HDFS data nodes / HBase region servers
  1x 4-core/8GB RAM/250GB (RAID 1) SATAII - namenode / HBase master
  1x 4-core/8GB RAM/250GB (RAID 1) SATAII - secondary namenode
- Increased the region server max heap from 5G to 12G.

I think that's everything. If I had to guess, I would say that upgrading Hadoop and moving to bigger hardware with more heap space for HBase was what did it. If anyone wants more details, like some specific config setting, let me know and I'll try to get that for you.
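In the meantime, here's a rough sketch of where those settings live in a stock 0.20-era install, for anyone who wants a concrete starting point. The property names are the standard ones; the values are the ones from this thread (including the xceiver and handler counts that come up further down), but the directory paths are made up for illustration, so substitute your own. Note that the hfile block size is really a per-column-family attribute, which is probably why changing the default in the config file didn't do anything for our existing tables.

  <!-- hdfs-site.xml on each data node -->
  <property>
    <name>dfs.data.dir</name>
    <!-- one disk per box, so just one data directory (example path) -->
    <value>/data/1/dfs/data</value>
  </property>
  <property>
    <!-- note Hadoop's historical misspelling of "xceivers" in the property name -->
    <name>dfs.datanode.max.xcievers</name>
    <value>3072</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>20</value>
  </property>

  <!-- core-site.xml -->
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- example path; keep it on the one local disk -->
    <value>/data/1/hadoop-tmp</value>
  </property>

  <!-- hbase-site.xml on each region server -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>

The region server heap is set in hbase-env.sh rather than in the XML (the value is in MB):

  export HBASE_HEAPSIZE=12288

The datanodes and region servers have to be restarted for these to take effect.
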
HBase is having no problems keeping up with all the gets/puts now. The load on the region servers is evenly distributed and is very low (< 1).

Thanks again to everyone who helped me work through these issues. I really appreciate it.

-James


On Wed, 2010-02-17 at 02:18 -0600, Daniel Washusen wrote:
> Glad you sorted it out! Please do tell...
>
> On 17/02/2010, at 4:59 PM, James Baldassari <ja...@dataxu.com> wrote:
>
> > Hi,
> >
> > I think we managed to solve our performance and load issues. Everything has been stable for about an hour now, but I'm not going to raise the victory flag until the morning because we've had short periods of stability in the past.
> >
> > I've been working on this problem non-stop for almost a week now, so I really need to get some sleep, but if everything looks good tomorrow I'll write up a summary of all the changes we made and share it with the group. Hopefully this exercise in tuning for a high-throughput real-time environment will be useful to others.
> >
> > Thanks,
> > James
> >
> >
> > On Tue, 2010-02-16 at 23:18 -0600, Stack wrote:
> >> When you look at top on the loaded server, is it the regionserver or the datanode that is using up the cpu?
> >>
> >> I looked at your hdfs listing. Some of the regions have 3 and 4 files, but most look fine. A good few are on the compaction verge, so I'd imagine a lot of compaction going on, but this is background though it does suck cpu and i/o... it shouldn't be too bad.
> >>
> >> I took a look at the regionserver log. The server is struggling during which time period? There is one log run at the start and there it seems like nothing untoward. Please enable DEBUG going forward. It'll shed more light on what's going on: see http://wiki.apache.org/hadoop/Hbase/FAQ#A5 for how. Otherwise, the log doesn't have anything running long enough for it to have been under serious load.
> >>
> >> This is a four node cluster now? You don't seem to have too many regions per server, yet you have a pretty high read/write rate going by earlier requests postings. Maybe you need to add more servers. Are you going to add in those 16G machines?
> >>
> >> When you look at the master ui, you can see that the request rate over time is about the same for all regionservers? (Refresh the master ui every so often to take a new sampling.)
> >>
> >> St.Ack
> >>
> >>
> >> On Tue, Feb 16, 2010 at 3:59 PM, James Baldassari <ja...@dataxu.com> wrote:
> >>> Nope. We don't do any map reduce. We're only using Hadoop for HBase at the moment.
> >>>
> >>> That one node, hdfs02, still has a load of 16 with around 40% I/O and 120% CPU. The other nodes are all around 66% CPU with 0-1% I/O and load of 1 to 3.
> >>>
> >>> I don't think all the requests are going to hdfs02 based on the status 'detailed' output. It seems like that node is just having a much harder time getting the data or something. Maybe we have some incorrect HDFS setting. All the configs are identical, though.
> >>>
> >>> -James
> >>>
> >>>
> >>> On Tue, 2010-02-16 at 17:45 -0600, Dan Washusen wrote:
> >>>> You mentioned in a previous email that you have a Task Tracker process running on each of the nodes. Is there any chance there is a map reduce job running?
> >>>>
> >>>> On 17 February 2010 10:31, James Baldassari <ja...@dataxu.com> wrote:
> >>>>
> >>>>> On Tue, 2010-02-16 at 16:45 -0600, Stack wrote:
> >>>>>> On Tue, Feb 16, 2010 at 2:25 PM, James Baldassari <ja...@dataxu.com> wrote:
> >>>>>>> On Tue, 2010-02-16 at 14:05 -0600, Stack wrote:
> >>>>>>>> On Tue, Feb 16, 2010 at 10:50 AM, James Baldassari <ja...@dataxu.com> wrote:
> >>>>>>>
> >>>>>>> Whether the keys themselves are evenly distributed is another matter. Our keys are user IDs, and they should be fairly random. If we do a status 'detailed' in the hbase shell, we see the following distribution for the value of "requests" (not entirely sure what this value means):
> >>>>>>> hdfs01: 7078
> >>>>>>> hdfs02: 5898
> >>>>>>> hdfs03: 5870
> >>>>>>> hdfs04: 3807
> >>>>>>>
> >>>>>> That looks like they are evenly distributed. Requests are how many hits a second. See the UI on master port 60010. The numbers should match.
> >>>>>
> >>>>> So the total across all 4 region servers would be 22,653/second? Hmm, that doesn't seem too bad. I guess we just need a little more throughput...
> >>>>>
> >>>>>>> There are no order of magnitude differences here, and the request count doesn't seem to map to the load on the server. Right now hdfs02 has a load of 16 while the 3 others have loads between 1 and 2.
> >>>>>>
> >>>>>> This is interesting. I went back over your dumps of cache stats above and the 'loaded' servers didn't have any attribute there that differentiated it from others. For example, the number of storefiles seemed about the same.
> >>>>>>
> >>>>>> I wonder what is making for the high load? Can you figure it? Is it high CPU use (unlikely)? Is it then high i/o? Can you try and figure what's different about the layout under the loaded server and that of an unloaded server? Maybe do a ./bin/hadoop fs -lsr /hbase and see if anything jumps out at you.
> >>>>>
> >>>>> It's I/O wait that is killing the highly loaded server.
The CPU > >>>>> usage > >>>>> reported by top is just about the same across all servers > >>>>> (around 100% > >>>>> on an 8-core node), but one server at any given time has a much > >>>>> higher > >>>>> load due to I/O. > >>>>> > >>>>>> > >>>>>> If you want to post the above or a loaded servers log to > >>>>>> pastbin we'll > >>>>>> take a looksee. > >>>>> > >>>>> I'm not really sure what to look for, but maybe someone else > >>>>> will notice > >>>>> something, so here's the output of hadoop fs -lsr /hbase: > >>>>> http://pastebin.com/m98096de > >>>>> > >>>>> And here is today's region server log from hdfs02, which seems > >>>>> to get > >>>>> hit particularly hard: http://pastebin.com/m1d8a1e5f > >>>>> > >>>>> Please note that we restarted it several times today, so some of > >>>>> those > >>>>> errors are probably just due to restarting the region server. > >>>>> > >>>>>> > >>>>>> > >>>>>> Applying > >>>>>>> HBASE-2180 did not make any measurable difference. There are > >>>>>>> no errors > >>>>>>> in the region server logs. However, looking at the Hadoop > >>>>>>> datanode > >>>>>>> logs, I'm seeing lots of these: > >>>>>>> > >>>>>>> 2010-02-16 17:07:54,064 ERROR > >>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: > >>>>> DatanodeRegistration( > >>>>> 10.24.183.165:50010, > >>>>> storageID=DS-1519453437-10.24.183.165-50010-1265907617548, > >>>>> infoPort=50075, > >>>>> ipcPort=50020):DataXceiver > >>>>>>> java.io.EOFException > >>>>>>> at java.io.DataInputStream.readShort > >>>>>>> (DataInputStream.java:298) > >>>>>>> at > >>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run > >>>>> (DataXceiver.java:79) > >>>>>>> at java.lang.Thread.run(Thread.java:619) > >>>>>> > >>>>>> You upped xceivers on your hdfs cluster? If you look at > >>>>>> otherend of > >>>>>> the above EOFE, can you see why it died? > >>>>> > >>>>> Max xceivers = 3072; datanode handler count = 20; region server > >>>>> handler > >>>>> count = 100 > >>>>> > >>>>> I can't find the other end of the EOFException. I looked in the > >>>>> Hadoop > >>>>> and HBase logs on the server that is the name node and HBase > >>>>> master, as > >>>>> well as the on HBase client. > >>>>> > >>>>> Thanks for all the help! > >>>>> > >>>>> -James > >>>>> > >>>>>> > >>>>>> > >>>>>>> > >>>>>>> However, I do think it's strange that > >>>>>>> the load is so unbalanced on the region servers. > >>>>>>> > >>>>>> > >>>>>> I agree. > >>>>>> > >>>>>> > >>>>>>> We're also going to try throwing some more hardware at the > >>>>>>> problem. > >>>>>>> We'll set up a new cluster with 16-core, 16G nodes to see if > >>>>>>> they are > >>>>>>> better able to handle the large number of client requests. We > >>>>>>> might > >>>>>>> also decrease the block size to 32k or lower. > >>>>>>> > >>>>>> Ok. > >>>>>> > >>>>>>>> Should only be a matter if you intend distributing the above. > >>>>>>> > >>>>>>> This is probably a topic for a separate thread, but I've never > >>>>>>> seen a > >>>>>>> legal definition for the word "distribution." How does this > >>>>>>> apply to > >>>>>>> the SaaS model? > >>>>>>> > >>>>>> Fair enough. > >>>>>> > >>>>>> Something is up. Especially if hbase-2180 made no difference. > >>>>>> > >>>>>> St.Ack > >>>>> > >>>>> > >>> > >>> > >
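
P.S. For anyone else who wants to follow Stack's suggestion and turn DEBUG logging on before a problem like this hits: the FAQ entry he links to boils down to one line in HBase's conf/log4j.properties on each node (that's my paraphrase of the FAQ, so double-check it against the wiki), plus a restart of the affected daemons:

  # conf/log4j.properties -- DEBUG for all HBase classes
  log4j.logger.org.apache.hadoop.hbase=DEBUG

The logs get a lot noisier, but flushes and compactions are much easier to follow when a region server starts struggling.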