On Wed, 2010-02-17 at 15:26 -0600, Dan Washusen wrote:
> With a 12gb heap you now have 2.4GB for the block cache versus your original
> 800mb.  Out of interest, what are your block cache stats now?  Are they
> above the 72% you were seeing previously?  Now that you have the resources,
> it might also be worthwhile increasing the block cache from 0.2 to
> something larger (like 0.4).
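For reference, the knob Dan is describing here is the block cache fraction of
the regionserver heap.  A minimal sketch, assuming the stock
hfile.block.cache.size property in hbase-site.xml and the 12 GB heap mentioned
above (0.4 of 12 GB is roughly 4.8 GB; the default of 0.2 gives the 2.4 GB
figure quoted):

  <!-- hbase-site.xml: fraction of the regionserver heap used for the
       HFile block cache (default 0.2) -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>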
Here are the block cache hit ratio stats:

Region server 1: 99
Region server 2: 99
Region server 3: 99
Region server 4: 99

> You also have more memory available for the OS level file system cache so
> that probably helps.  Before adding more RAM you had pretty much all
> available memory allocated to processes and nothing in the file system cache
> (see the buffers section under Mem in top)...  Would you be able to send us
> through the top output for the machines?

Sure.  Here's 'top | head' for all region servers:

Region server 1: http://pastebin.com/m45e0bc2
Region server 2: http://pastebin.com/m31868bb6
Region server 3: http://pastebin.com/m5b1ce71f
Region server 4: http://pastebin.com/mae7326f

> Another random thought; if you are not running map reduce tasks then you
> could shut down the Job Trackers and put that memory to better use...

Thank you!  We were wondering if we could do that.  We thought HBase might
kick off some M/R jobs to do maintenance tasks or something, but it's good to
know that we can safely shut the M/R stuff down.

> Anyway, thanks for sharing!
>
> Cheers,
> Dan
>
> On 18 February 2010 06:31, Stack <st...@duboce.net> wrote:
>
> > On Wed, Feb 17, 2010 at 11:04 AM, James Baldassari <ja...@dataxu.com>
> > wrote:
> > > OK, I'll do my best to capture our changes here.  Ideally we would have
> > > changed one variable at a time, but since these performance problems
> > > were happening in our production environment, we finally had to just
> > > throw the kitchen sink at it.
> >
> > Been there.
> >
> > > So I'm not sure which combination of the
> > > following fixed the problems, but hopefully this will be useful
> > > nonetheless:
> > >
> > > - Upgraded Hadoop from 0.20 to 0.20.1 (Cloudera version
> > > hadoop-0.20-0.20.1+169.56-1).  This version apparently has some fixes
> > > for HDFS stability issues under load
> >
> > I took a look in here
> > http://archive.cloudera.com/cdh/2/hadoop-0.20.1+169.56.CHANGES.txt and
> > nothing obvious jumps out.  I'm asking the Cloudera lads if they think
> > there is anything that could have made a difference.
> >
> > > - Applied the HBASE-2180 patch
> >
> > I'd think this would have made a big difference (though you reported
> > in earlier mail it had no effect).
> >
> > > - Decreased the hfile block size from 64k to 4k in the configuration
> > > file, but we didn't alter our existing tables, so I'm not sure if this
> > > change had any effect.
> >
> > Did you restart after making this change?  If so, the cluster picked
> > it up.  You can verify by taking a look at a random hfile.  Do
> > something like:
> >
> > ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m PATH_TO_FILE_IN_HDFS
> >
> > This'll dump out meta stats on the chosen file, part of which will be the
> > block size used writing the file.
> >
> > > - Modified the Hadoop configuration to specify only one data directory
> > > for HDFS and one Hadoop temp directory since our servers only have one
> > > disk each.
> >
> > I'd think this'd make for no difference in perf.
> >
> > > - Threw lots of hardware at it.  We upgraded the region servers (which
> > > are also our HDFS data nodes) from 8-core/8G boxes to 16-core/16G.  Our
> > > final system configuration is as follows:
> > > 4x 16-core/16GB RAM/250GB SATAII HDFS data nodes / HBase region servers
> > > 1x 4-core/8GB RAM/250GB (RAID 1) SATAII namenode / HBase master
> > > 1x 4-core/8GB RAM/250GB (RAID 1) SATAII secondary namenode
> >
> > This probably did it (smile).
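A note on the hfile block size point above: the config-file change only
affects newly written files, so existing tables would need their column
families altered (and their store files rewritten) before the smaller block
size applies.  A sketch from the hbase shell, using a hypothetical table
'users' with family 'data' (names are placeholders, not from this thread):

  disable 'users'
  alter 'users', {NAME => 'data', BLOCKSIZE => '4096'}
  enable 'users'
  # existing store files keep the old block size until they are rewritten,
  # e.g. by a major compaction:
  major_compact 'users'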
> > > - Increased region server max heap from 5G to 12G
> >
> > So, all is in cache?  Is that what fixed the issue?
> >
> > Want to send us regionserver logs?
> >
> > What's the request rate you are servicing?  What's it say on the master
> > home page?
> >
> > > I think that's everything.  If I had to guess, I would say that
> > > upgrading Hadoop and moving to bigger hardware with more heap space for
> > > HBase was what did it.  If anyone wants more details, like some specific
> > > config setting, let me know and I'll try to get that for you.
> > >
> > > HBase is having no problems keeping up with all the gets/puts now.  The
> > > load on the region servers is evenly distributed and is very low (< 1).
> >
> > What's the utilization on these boxes?  Any CPU or I/O load?
> >
> > > Thanks again to everyone who helped me work through these issues.  I
> > > really appreciate it.
> >
> > Thanks for sticking with it.  Sorry for the loss of sleep.
> > St.Ack
> >
> > > On Wed, 2010-02-17 at 02:18 -0600, Daniel Washusen wrote:
> > >> Glad you sorted it out!  Please do tell...
> > >>
> > >> On 17/02/2010, at 4:59 PM, James Baldassari <ja...@dataxu.com> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I think we managed to solve our performance and load issues.
> > >> > Everything has been stable for about an hour now, but I'm not going
> > >> > to raise the victory flag until the morning because we've had short
> > >> > periods of stability in the past.
> > >> >
> > >> > I've been working on this problem non-stop for almost a week now, so I
> > >> > really need to get some sleep, but if everything looks good tomorrow
> > >> > I'll write up a summary of all the changes we made and share it with
> > >> > the group.  Hopefully this exercise in tuning for a high-throughput
> > >> > real-time environment will be useful to others.
> > >> >
> > >> > Thanks,
> > >> > James
> > >> >
> > >> > On Tue, 2010-02-16 at 23:18 -0600, Stack wrote:
> > >> >> When you look at top on the loaded server, is it the regionserver or
> > >> >> the datanode that is using up the CPU?
> > >> >>
> > >> >> I looked at your hdfs listing.  Some of the regions have 3 and 4
> > >> >> files but most look fine.  A good few are on the compaction verge,
> > >> >> so I'd imagine a lot of compaction going on, but this is background;
> > >> >> though it does suck CPU and I/O, it shouldn't be too bad.
> > >> >>
> > >> >> I took a look at the regionserver log.  The server is struggling
> > >> >> during which time period?  There is one log run at the start and
> > >> >> there it seems like nothing untoward.  Please enable DEBUG going
> > >> >> forward.  It'll shed more light on what's going on: see
> > >> >> http://wiki.apache.org/hadoop/Hbase/FAQ#A5 for how.  Otherwise, the
> > >> >> log doesn't have anything running long enough for it to have been
> > >> >> under serious load.
> > >> >>
> > >> >> This is a four node cluster now?  You don't seem to have too many
> > >> >> regions per server, yet you have a pretty high read/write rate going
> > >> >> by earlier requests postings.  Maybe you need to add more servers.
> > >> >> Are you going to add in those 16G machines?
> > >> >>
> > >> >> When you look at the master UI, can you see that the request rate
> > >> >> over time is about the same for all regionservers?  (Refresh the
> > >> >> master UI every so often to take a new sampling.)
> > >> >>
> > >> >> St.Ack
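For anyone following along, the two knobs referenced in this exchange (the
bigger regionserver heap and DEBUG-level logging) normally live in HBase's
conf directory.  A sketch, assuming the stock hbase-env.sh and
log4j.properties layout:

  # conf/hbase-env.sh -- maximum heap for the HBase daemons, in MB
  # (12000 corresponds to the 12G heap mentioned above)
  export HBASE_HEAPSIZE=12000

  # conf/log4j.properties -- enable DEBUG logging for HBase,
  # per the FAQ entry Stack links to
  log4j.logger.org.apache.hadoop.hbase=DEBUG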
> > >> >> On Tue, Feb 16, 2010 at 3:59 PM, James Baldassari <ja...@dataxu.com> wrote:
> > >> >>> Nope.  We don't do any map reduce.  We're only using Hadoop for
> > >> >>> HBase at the moment.
> > >> >>>
> > >> >>> That one node, hdfs02, still has a load of 16 with around 40% I/O
> > >> >>> and 120% CPU.  The other nodes are all around 66% CPU with 0-1% I/O
> > >> >>> and load of 1 to 3.
> > >> >>>
> > >> >>> I don't think all the requests are going to hdfs02 based on the
> > >> >>> status 'detailed' output.  It seems like that node is just having a
> > >> >>> much harder time getting the data or something.  Maybe we have some
> > >> >>> incorrect HDFS setting.  All the configs are identical, though.
> > >> >>>
> > >> >>> -James
> > >> >>>
> > >> >>> On Tue, 2010-02-16 at 17:45 -0600, Dan Washusen wrote:
> > >> >>>> You mentioned in a previous email that you have a Task Tracker
> > >> >>>> process running on each of the nodes.  Is there any chance there
> > >> >>>> is a map reduce job running?
> > >> >>>>
> > >> >>>> On 17 February 2010 10:31, James Baldassari <ja...@dataxu.com> wrote:
> > >> >>>>
> > >> >>>>> On Tue, 2010-02-16 at 16:45 -0600, Stack wrote:
> > >> >>>>>> On Tue, Feb 16, 2010 at 2:25 PM, James Baldassari <ja...@dataxu.com> wrote:
> > >> >>>>>>> On Tue, 2010-02-16 at 14:05 -0600, Stack wrote:
> > >> >>>>>>>> On Tue, Feb 16, 2010 at 10:50 AM, James Baldassari <ja...@dataxu.com> wrote:
> > >> >>>>>>>
> > >> >>>>>>> Whether the keys themselves are evenly distributed is another
> > >> >>>>>>> matter.  Our keys are user IDs, and they should be fairly
> > >> >>>>>>> random.  If we do a status 'detailed' in the hbase shell we see
> > >> >>>>>>> the following distribution for the value of "requests" (not
> > >> >>>>>>> entirely sure what this value means):
> > >> >>>>>>> hdfs01: 7078
> > >> >>>>>>> hdfs02: 5898
> > >> >>>>>>> hdfs03: 5870
> > >> >>>>>>> hdfs04: 3807
> > >> >>>>>>>
> > >> >>>>>> That looks like they are evenly distributed.  Requests are how
> > >> >>>>>> many hits a second.  See the UI on master port 60010.  The
> > >> >>>>>> numbers should match.
> > >> >>>>>
> > >> >>>>> So the total across all 4 region servers would be 22,653/second?
> > >> >>>>> Hmm, that doesn't seem too bad.  I guess we just need a little
> > >> >>>>> more throughput...
> > >> >>>>>
> > >> >>>>>>> There are no order of magnitude differences here, and the
> > >> >>>>>>> request count doesn't seem to map to the load on the server.
> > >> >>>>>>> Right now hdfs02 has a load of 16 while the 3 others have loads
> > >> >>>>>>> between 1 and 2.
> > >> >>>>>>
> > >> >>>>>> This is interesting.  I went back over your dumps of cache stats
> > >> >>>>>> above and the 'loaded' server didn't have any attribute there
> > >> >>>>>> that differentiated it from the others.  For example, the number
> > >> >>>>>> of storefiles seemed about the same.
> > >> >>>>>>
> > >> >>>>>> I wonder what is making for the high load?  Can you figure it?
> > >> >>>>>> Is it high CPU use (unlikely)?  Is it then high I/O?
> > >> >>>>>> Can you try and figure
> > >> >>>>>> what's different about the layout under the loaded server and
> > >> >>>>>> that of an unloaded server?  Maybe do a ./bin/hadoop fs -lsr /hbase
> > >> >>>>>> and see if anything jumps out at you.
> > >> >>>>>
> > >> >>>>> It's I/O wait that is killing the highly loaded server.  The CPU
> > >> >>>>> usage reported by top is just about the same across all servers
> > >> >>>>> (around 100% on an 8-core node), but one server at any given time
> > >> >>>>> has a much higher load due to I/O.
> > >> >>>>>
> > >> >>>>>> If you want to post the above or a loaded server's log to
> > >> >>>>>> pastebin we'll take a looksee.
> > >> >>>>>
> > >> >>>>> I'm not really sure what to look for, but maybe someone else will
> > >> >>>>> notice something, so here's the output of hadoop fs -lsr /hbase:
> > >> >>>>> http://pastebin.com/m98096de
> > >> >>>>>
> > >> >>>>> And here is today's region server log from hdfs02, which seems to
> > >> >>>>> get hit particularly hard: http://pastebin.com/m1d8a1e5f
> > >> >>>>>
> > >> >>>>> Please note that we restarted it several times today, so some of
> > >> >>>>> those errors are probably just due to restarting the region server.
> > >> >>>>>
> > >> >>>>>>> Applying HBASE-2180 did not make any measurable difference.
> > >> >>>>>>> There are no errors in the region server logs.  However, looking
> > >> >>>>>>> at the Hadoop datanode logs, I'm seeing lots of these:
> > >> >>>>>>>
> > >> >>>>>>> 2010-02-16 17:07:54,064 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.24.183.165:50010, storageID=DS-1519453437-10.24.183.165-50010-1265907617548, infoPort=50075, ipcPort=50020):DataXceiver
> > >> >>>>>>> java.io.EOFException
> > >> >>>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:298)
> > >> >>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
> > >> >>>>>>>         at java.lang.Thread.run(Thread.java:619)
> > >> >>>>>>
> > >> >>>>>> You upped xceivers on your hdfs cluster?  If you look at the
> > >> >>>>>> other end of the above EOFE, can you see why it died?
> > >> >>>>>
> > >> >>>>> Max xceivers = 3072; datanode handler count = 20; region server
> > >> >>>>> handler count = 100
> > >> >>>>>
> > >> >>>>> I can't find the other end of the EOFException.  I looked in the
> > >> >>>>> Hadoop and HBase logs on the server that is the name node and
> > >> >>>>> HBase master, as well as on the HBase client.
> > >> >>>>>
> > >> >>>>> Thanks for all the help!
> > >> >>>>>
> > >> >>>>> -James
> > >> >>>>>
> > >> >>>>>>> However, I do think it's strange that
> > >> >>>>>>> the load is so unbalanced on the region servers.
> > >> >>>>>>
> > >> >>>>>> I agree.
> > >> >>>>>>
> > >> >>>>>>> We're also going to try throwing some more hardware at the
> > >> >>>>>>> problem.  We'll set up a new cluster with 16-core, 16G nodes to
> > >> >>>>>>> see if they are better able to handle the large number of client
> > >> >>>>>>> requests.  We might also decrease the block size to 32k or lower.
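For reference, the xceiver and handler counts James quotes above are usually
set along these lines; a sketch, assuming a CDH-style split config (some
setups carry the dfs.* properties in hadoop-site.xml instead of
hdfs-site.xml):

  <!-- hdfs-site.xml; note "xcievers" is the property's historical spelling -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>3072</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>20</value>
  </property>

  <!-- hbase-site.xml -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>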
> > >> >>>>>>
> > >> >>>>>> Ok.
> > >> >>>>>>
> > >> >>>>>>>> Should only be a matter if you intend distributing the above.
> > >> >>>>>>>
> > >> >>>>>>> This is probably a topic for a separate thread, but I've never
> > >> >>>>>>> seen a legal definition for the word "distribution."  How does
> > >> >>>>>>> this apply to the SaaS model?
> > >> >>>>>>
> > >> >>>>>> Fair enough.
> > >> >>>>>>
> > >> >>>>>> Something is up.  Especially if hbase-2180 made no difference.
> > >> >>>>>>
> > >> >>>>>> St.Ack