Glad you sorted it out! Please do tell...

On 17/02/2010, at 4:59 PM, James Baldassari <ja...@dataxu.com> wrote:
> Hi,
>
> I think we managed to solve our performance and load issues. Everything has been stable for about an hour now, but I'm not going to raise the victory flag until the morning because we've had short periods of stability in the past.
>
> I've been working on this problem non-stop for almost a week now, so I really need to get some sleep, but if everything looks good tomorrow I'll write up a summary of all the changes we made and share it with the group. Hopefully this exercise in tuning for a high-throughput real-time environment will be useful to others.
>
> Thanks,
> James
>
>
> On Tue, 2010-02-16 at 23:18 -0600, Stack wrote:
>> When you look at top on the loaded server, is it the regionserver or the datanode that is using up the CPU?
>>
>> I looked at your HDFS listing. Some of the regions have 3 and 4 files, but most look fine. A good few are on the compaction verge, so I'd imagine a lot of compaction going on. That's background work, though; it does suck CPU and I/O, but it shouldn't be too bad.
>>
>> I took a look at the regionserver log. During which time period is the server struggling? There is one log run at the start, and there it seems like nothing is untoward. Please enable DEBUG going forward; it'll shed more light on what's going on. See http://wiki.apache.org/hadoop/Hbase/FAQ#A5 for how. Otherwise, the log doesn't have anything running long enough for it to have been under serious load.
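>>
>> (For reference, the FAQ entry boils down to roughly the following line in conf/log4j.properties on each HBase node, followed by a restart; this is just a sketch, so see the FAQ above for the exact, version-specific steps.)
>>
>>   # conf/log4j.properties: enable DEBUG for all HBase classes
>>   log4j.logger.org.apache.hadoop.hbase=DEBUG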
>>
>> This is a four-node cluster now? You don't seem to have too many regions per server, yet you have a pretty high read/write rate going by your earlier requests postings. Maybe you need to add more servers. Are you going to add in those 16G machines?
>>
>> When you look at the master UI, is the request rate over time about the same for all regionservers? (Refresh the master UI every so often to take a new sampling.)
>>
>> St.Ack
>>
>>
>> On Tue, Feb 16, 2010 at 3:59 PM, James Baldassari <ja...@dataxu.com> wrote:
>>> Nope. We don't do any map reduce. We're only using Hadoop for HBase at the moment.
>>>
>>> That one node, hdfs02, still has a load of 16 with around 40% I/O and 120% CPU. The other nodes are all around 66% CPU with 0-1% I/O and load of 1 to 3.
>>>
>>> I don't think all the requests are going to hdfs02 based on the status 'detailed' output. It seems like that node is just having a much harder time getting the data or something. Maybe we have some incorrect HDFS setting. All the configs are identical, though.
>>>
>>> -James
>>>
>>>
>>> On Tue, 2010-02-16 at 17:45 -0600, Dan Washusen wrote:
>>>> You mentioned in a previous email that you have a Task Tracker process running on each of the nodes. Is there any chance there is a map reduce job running?
>>>>
>>>> On 17 February 2010 10:31, James Baldassari <ja...@dataxu.com> wrote:
>>>>
>>>>> On Tue, 2010-02-16 at 16:45 -0600, Stack wrote:
>>>>>> On Tue, Feb 16, 2010 at 2:25 PM, James Baldassari <ja...@dataxu.com> wrote:
>>>>>>> On Tue, 2010-02-16 at 14:05 -0600, Stack wrote:
>>>>>>>> On Tue, Feb 16, 2010 at 10:50 AM, James Baldassari <ja...@dataxu.com> wrote:
>>>>>>>
>>>>>>> Whether the keys themselves are evenly distributed is another matter. Our keys are user IDs, and they should be fairly random. If we do a status 'detailed' in the hbase shell, we see the following distribution for the value of "requests" (not entirely sure what this value means):
>>>>>>>
>>>>>>> hdfs01: 7078
>>>>>>> hdfs02: 5898
>>>>>>> hdfs03: 5870
>>>>>>> hdfs04: 3807
>>>>>>>
>>>>>> That looks like they are evenly distributed. Requests are how many hits a second. See the UI on master port 60010. The numbers should match.
>>>>>
>>>>> So the total across all 4 region servers would be 22,653/second? Hmm, that doesn't seem too bad. I guess we just need a little more throughput...
>>>>>
>>>>>>> There are no order-of-magnitude differences here, and the request count doesn't seem to map to the load on the server. Right now hdfs02 has a load of 16 while the 3 others have loads between 1 and 2.
>>>>>>
>>>>>> This is interesting. I went back over your dumps of cache stats above, and the 'loaded' servers didn't have any attribute there that differentiated them from the others. For example, the number of storefiles seemed about the same.
>>>>>>
>>>>>> I wonder what is making for the high load? Can you figure it out? Is it high CPU use (unlikely)? Is it then high I/O? Can you try to figure out what's different about the layout under the loaded server versus that of an unloaded server? Maybe do a ./bin/hadoop fs -lsr /hbase and see if anything jumps out at you.
>>>>>
>>>>> It's I/O wait that is killing the highly loaded server. The CPU usage reported by top is just about the same across all servers (around 100% on an 8-core node), but one server at any given time has a much higher load due to I/O.
>>>>>
>>>>>> If you want to post the above or a loaded server's log to pastebin, we'll take a look-see.
>>>>>
>>>>> I'm not really sure what to look for, but maybe someone else will notice something, so here's the output of hadoop fs -lsr /hbase: http://pastebin.com/m98096de
>>>>>
>>>>> And here is today's region server log from hdfs02, which seems to get hit particularly hard: http://pastebin.com/m1d8a1e5f
>>>>>
>>>>> Please note that we restarted it several times today, so some of those errors are probably just due to restarting the region server.
>>>>>
>>>>>>> Applying HBASE-2180 did not make any measurable difference. There are no errors in the region server logs. However, looking at the Hadoop datanode logs, I'm seeing lots of these:
>>>>>>>
>>>>>>> 2010-02-16 17:07:54,064 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.24.183.165:50010, storageID=DS-1519453437-10.24.183.165-50010-1265907617548, infoPort=50075, ipcPort=50020):DataXceiver
>>>>>>> java.io.EOFException
>>>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:298)
>>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
>>>>>>>         at java.lang.Thread.run(Thread.java:619)
>>>>>>
>>>>>> You upped xceivers on your HDFS cluster? If you look at the other end of the above EOFE, can you see why it died?
>>>>>
>>>>> Max xceivers = 3072; datanode handler count = 20; region server handler count = 100.
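>>>>>
>>>>> Concretely, those three settings correspond to roughly the following. The property names here assume a Hadoop/HBase 0.20-era cluster, so double-check them against your version; all three need a datanode/regionserver restart to take effect:
>>>>>
>>>>>   <!-- hdfs-site.xml on every datanode -->
>>>>>   <property>
>>>>>     <name>dfs.datanode.max.xcievers</name>  <!-- sic: historically misspelled property name -->
>>>>>     <value>3072</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>dfs.datanode.handler.count</name>
>>>>>     <value>20</value>
>>>>>   </property>
>>>>>
>>>>>   <!-- hbase-site.xml on every regionserver -->
>>>>>   <property>
>>>>>     <name>hbase.regionserver.handler.count</name>
>>>>>     <value>100</value>
>>>>>   </property>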
>>>>>
>>>>> I can't find the other end of the EOFException. I looked in the Hadoop and HBase logs on the server that is the name node and HBase master, as well as on the HBase client.
>>>>>
>>>>> Thanks for all the help!
>>>>>
>>>>> -James
>>>>>
>>>>>>> However, I do think it's strange that the load is so unbalanced on the region servers.
>>>>>>
>>>>>> I agree.
>>>>>>
>>>>>>> We're also going to try throwing some more hardware at the problem. We'll set up a new cluster with 16-core, 16G nodes to see if they are better able to handle the large number of client requests. We might also decrease the block size to 32k or lower.
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>>>> Should only be a matter if you intend distributing the above.
>>>>>>>
>>>>>>> This is probably a topic for a separate thread, but I've never seen a legal definition for the word "distribution." How does this apply to the SaaS model?
>>>>>>
>>>>>> Fair enough.
>>>>>>
>>>>>> Something is up. Especially if HBASE-2180 made no difference.
>>>>>>
>>>>>> St.Ack
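
(A note on the "block size" mentioned above: in HBase this normally means the per-column-family HFile block size, which defaults to 64 KB. Lowering it to 32 KB on an existing table would look roughly like this in the hbase shell, where 'mytable' and 'mycf' are placeholder names; on 0.20-era releases the table has to be disabled first:

  disable 'mytable'
  alter 'mytable', {NAME => 'mycf', BLOCKSIZE => '32768'}
  enable 'mytable'

Only newly written HFiles pick up the smaller block size; existing files keep the old one until compactions rewrite them.)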