I just created a jira for the "region server-wide pause when global memstore size is too high" problem; I've seen it two times before.
https://issues.apache.org/jira/browse/HBASE-2149

J-D

On Wed, Jan 20, 2010 at 11:33 AM, Seraph Imalia <ser...@eisp.co.za> wrote:
>
>> From: stack <st...@duboce.net>
>> Reply-To: <hbase-user@hadoop.apache.org>
>> Date: Wed, 20 Jan 2010 10:55:52 -0800
>> To: <hbase-user@hadoop.apache.org>
>> Subject: Re: Hbase pausing problems
>>
>> On Wed, Jan 20, 2010 at 9:37 AM, Seraph Imalia <ser...@eisp.co.za> wrote:
>>
>>> Does this mean that when 1 regionserver does a memstore flush, the
>>> other two regionservers are also unavailable for writes? I have
>>> watched the logs carefully to make sure that not all the regionservers
>>> are flushing at the same time. Most of the time, only 1 server flushes
>>> at a time and in rare cases, I have seen two at a time.
>>
>> No. Flush is a background process. Reads and writes go ahead while
>> flushing is happening.
>
> This is very nice to know: it is what I expected, but it also means that
> this problem is solvable :)
>
>>>> It also looks like you have little RAM space given over to hbase,
>>>> just 1G? If your traffic is bursty, giving hbase more RAM might help
>>>> it get over these write humps.
>>>
>>> I have it at 1G on purpose. When we first had the problem, I
>>> immediately thought the problem was resource related, so I increased
>>> the hBase RAM to 3G (each server has 8G - I was careful to watch for
>>> swapping). This made the problem worse because each memstore flush
>>> took longer, which stopped writing for longer, and people started
>>> noticing that our system was down during those periods.
>>
>> See above, flushing doesn't block reads/writes. Maybe this was
>> something else? A GC pause that ran longer because the heap is bigger?
>> You said you had gc logging enabled. Did you see any long pauses? (Our
>> ZooKeeper brothers suggest https://gchisto.dev.java.net/ as a help
>> reading GC logs.)
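[The GC logging discussed above is switched on through HBASE_OPTS in conf/hbase-env.sh. A rough sketch follows; the flags are the usual Sun-JDK-6-era set, and the log path is an assumption, not from the thread:]

```shell
# conf/hbase-env.sh: extend HBASE_OPTS with GC logging so collector
# pauses on the regionserver can be lined up against client-side stalls.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc-hbase.log"
```

The resulting gc-hbase.log is what a tool like gchisto reads.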
>
> Thanks, this tool will be useful - I'll run through the GC logs and see
> if anything jumps out at me.
>
>> Let me look at your logs to see if I see anything else up there.
>>
>>>> Clients will be blocked writing regions carried by the affected
>>>> regionserver only. Your HW is not appropriate to the load as
>>>> currently setup. You might also consider adding more machines to your
>>>> cluster.
>>>
>>> Hmm... How does hBase decide which region to write to? Is it possible
>>> that hBase is deciding to write all our current records to one
>>> specific region that happens to be on the server that is busy doing a
>>> memstore flush?
>>
>> Check out the region list in the master UI. See how the regions are
>> defined by their start and end key. Clients write rows to the region
>> hosting the pertinent row-span.
>
> I have attached the region list of the AdDelivery table. Please let me
> know if this is something that you need me to upload to a server
> somewhere?
>
>> It's quite possible all writes are going to a single region on a single
>> server -- which is often an issue -- if your key scheme has something
>> like current time for a prefix.
>
> We are using UUID.randomUUID() as the row key - it has a pretty random
> prefix.
>
>>> We are currently inserting about 6 million rows per day.
>>
>> 6M rows is low, even for a cluster as small as yours (though, maybe
>> your inserts are fat? Big cells, many at a time?).
>
> Inserts contain a maximum of 30 cells - most of the cells are type
> string containing integers. About 3 are strings containing no more than
> 10 characters, and about 4 are type string containing a decimal value.
> Not all 30 cells will exist; most often, some are left out because the
> data was not necessary for that specific row. Most rows will contain 25
> cells.
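[The row-key claim is easy to check outside hBase: UUID.randomUUID() returns a type-4 (random) UUID, so the leading hex characters of the key are effectively random and consecutive inserts scatter across the key space rather than piling onto the region owning a current-time prefix. A JDK-only sketch; the class name is mine:]

```java
import java.util.UUID;

public class RowKeyCheck {
    public static void main(String[] args) {
        // A type-4 UUID is random apart from its version/variant bits,
        // so the 8-hex-char prefix of the string form is uniformly
        // distributed: two consecutive keys almost never share a prefix,
        // and hence almost never target the same "hot" region.
        String k1 = UUID.randomUUID().toString();
        String k2 = UUID.randomUUID().toString();
        System.out.println(k1 + "  (region chosen by prefix " + k1.substring(0, 8) + ")");
        System.out.println(k2 + "  (region chosen by prefix " + k2.substring(0, 8) + ")");
    }
}
```

So a single hot region from the key scheme itself is unlikely here, consistent with stack's later conclusion.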
>
>>> SQL Server (which I am so happy to no longer be using for this) was
>>> able to write (and replicate to a slave) 9 million records (using the
>>> same spec'ed server). I would like to see hBase cope with the 3 we
>>> have given it, at least when inserting 6 million. Do you think this is
>>> possible, or is our only answer to throw on more servers?
>>
>> 3 servers should be well able. Tell me more about your schema --
>> though, nevermind, I can find it in your master log.
>> St.Ack
>
> Cool :)
> Seraph
>
>>>> St.Ack
>>>>
>>>>> Thank you for your assistance thus far; please let me know if you
>>>>> need or discover anything else?
>>>>>
>>>>> Regards,
>>>>> Seraph
>>>>>
>>>>>> From: Jean-Daniel Cryans <jdcry...@apache.org>
>>>>>> Reply-To: <hbase-user@hadoop.apache.org>
>>>>>> Date: Mon, 18 Jan 2010 09:49:16 -0800
>>>>>> To: <hbase-user@hadoop.apache.org>
>>>>>> Subject: Re: Hbase pausing problems
>>>>>>
>>>>>> The next step would be to take a look at your region server's log
>>>>>> around the time of the insert and clients who don't resume after
>>>>>> the loss of a region server. If you are able to gzip them and put
>>>>>> them on a public server, it would be awesome.
>>>>>>
>>>>>> Thx,
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Mon, Jan 18, 2010 at 1:03 AM, Seraph Imalia <ser...@eisp.co.za>
>>>>>> wrote:
>>>>>>> Answers below...
>>>>>>>
>>>>>>> Regards,
>>>>>>> Seraph
>>>>>>>
>>>>>>>> From: stack <st...@duboce.net>
>>>>>>>> Reply-To: <hbase-user@hadoop.apache.org>
>>>>>>>> Date: Fri, 15 Jan 2010 10:10:39 -0800
>>>>>>>> To: <hbase-user@hadoop.apache.org>
>>>>>>>> Subject: Re: Hbase pausing problems
>>>>>>>>
>>>>>>>> How many CPUs?
>>>>>>>
>>>>>>> 1x Quad Xeon in each server
>>>>>>>
>>>>>>>> You are using default JVM settings (see HBASE_OPTS in
>>>>>>>> hbase-env.sh). You might want to enable GC logging. See the line
>>>>>>>> after HBASE_OPTS in hbase-env.sh. Enable it.
>>>>>>>> GC logging might tell you about the pauses you are seeing.
>>>>>>>
>>>>>>> I will enable GC logging tonight during our slow time because
>>>>>>> restarting the regionservers causes the clients to pause
>>>>>>> indefinitely.
>>>>>>>
>>>>>>>> Can you get a fourth server for your cluster and run the master,
>>>>>>>> zk, and namenode on it and leave the other three servers for
>>>>>>>> regionserver and datanode (with perhaps replication == 2 as per
>>>>>>>> J-D to lighten load on a small cluster)?
>>>>>>>
>>>>>>> We plan to double the number of servers in the next few weeks and
>>>>>>> I will take your advice to put the master, zk and namenode on it
>>>>>>> (we will need to have a second one on standby should this one
>>>>>>> crash). The servers will be ordered shortly and will be here in a
>>>>>>> week or two.
>>>>>>>
>>>>>>> That said, I have been monitoring CPU usage and none of them seem
>>>>>>> particularly busy. The regionserver on each one hovers around 30%
>>>>>>> all the time and the datanode sits at about 10% most of the time.
>>>>>>> If we do have a resource issue, it definitely does not seem to be
>>>>>>> CPU.
>>>>>>>
>>>>>>> Increasing RAM did not seem to work either - it just made hBase
>>>>>>> use a bigger memstore and then it took longer to do a flush.
>>>>>>>
>>>>>>>> More notes inline below.
>>>>>>>>
>>>>>>>> On Fri, Jan 15, 2010 at 1:33 AM, Seraph Imalia
>>>>>>>> <ser...@eisp.co.za> wrote:
>>>>>>>>
>>>>>>>>> Approximately every 10 minutes, our entire coldfusion system
>>>>>>>>> pauses at the point of inserting into hBase for between 30 and
>>>>>>>>> 60 seconds and then continues.
>>>>>>>>
>>>>>>>> Yeah, enable GC logging. See if you can make a correlation
>>>>>>>> between the pause the client is seeing and a GC pause.
>>>>>>>>
>>>>>>>>> Investigation...
>>>>>>>>>
>>>>>>>>> Watching the logs of the regionserver, the pausing of the
>>>>>>>>> coldfusion system happens as soon as one of the regionservers
>>>>>>>>> starts flushing the memstore and recovers again as soon as it is
>>>>>>>>> finished flushing (recovers as soon as it starts compacting).
>>>>>>>>
>>>>>>>> ...though, this would seem to point to an issue with your
>>>>>>>> hardware. How many disks? Are they misconfigured such that they
>>>>>>>> hold up the system when they are being heavily written to?
>>>>>>>>
>>>>>>>> A regionserver log at DEBUG from around this time so we could
>>>>>>>> look at it would be helpful.
>>>>>>>>
>>>>>>>>> I can recreate the error just by stopping 1 of the
>>>>>>>>> regionservers; but then starting the regionserver again does not
>>>>>>>>> make coldfusion recover until I restart the coldfusion servers.
>>>>>>>>> It is important to note that if I keep the built-in hBase shell
>>>>>>>>> running, it is happily able to put and get data to and from
>>>>>>>>> hBase whilst coldfusion is busy pausing/failing.
>>>>>>>>
>>>>>>>> This seems odd. Enable DEBUG for the client side. Do you see the
>>>>>>>> shell recalibrating - finding new locations for regions - after
>>>>>>>> you shut down the single regionserver, something that your
>>>>>>>> coldfusion is not doing? Or, maybe, the shell is putting to a
>>>>>>>> regionserver that has not been disturbed by your start/stop?
>>>>>>>>
>>>>>>>>> I have tried increasing the regionserver's RAM to 3 Gigs and
>>>>>>>>> this just made the problem worse because it took longer for the
>>>>>>>>> regionservers to flush the memory store.
>>>>>>>>
>>>>>>>> Again, if flushing is holding up the machine - if you can't write
>>>>>>>> a file in the background without it freezing your machine - then
>>>>>>>> your machines are anemic or misconfigured?
>>>>>>>>
>>>>>>>>> One of the links I found on your site mentioned increasing the
>>>>>>>>> default value for hbase.regionserver.handler.count to 100; this
>>>>>>>>> did not seem to make any difference.
>>>>>>>>
>>>>>>>> Leave this configuration in place, I'd say.
>>>>>>>>
>>>>>>>> Are you seeing 'blocking' messages in the regionserver logs? A
>>>>>>>> regionserver will stop taking on writes if it thinks it's being
>>>>>>>> overrun, to prevent itself OOME'ing. Grep for the 'multiplier'
>>>>>>>> configuration in hbase-default.xml.
>>>>>>>>
>>>>>>>>> I have double checked that the memory flush very rarely happens
>>>>>>>>> on more than 1 regionserver at a time - in fact, in my many
>>>>>>>>> hours of staring at tails of logs, it only happened once where
>>>>>>>>> two regionservers flushed at the same time.
>>>>>>>>
>>>>>>>> You've enabled DEBUG?
>>>>>>>>
>>>>>>>>> My investigations point strongly towards a coding problem on our
>>>>>>>>> side rather than a problem with the server setup or hBase
>>>>>>>>> itself.
>>>>>>>>
>>>>>>>> If things were slow from the client perspective, that might be a
>>>>>>>> client-side coding problem, but these pauses - unless you have a
>>>>>>>> fly-by deadlock in your client code - are probably an hbase
>>>>>>>> issue.
>>>>>>>>
>>>>>>>>> I say this because whilst I understand why a regionserver would
>>>>>>>>> go offline during a memory flush, I would expect the other two
>>>>>>>>> regionservers to pick up the load, especially since the built-in
>>>>>>>>> hbase shell has no problem accessing hBase whilst a regionserver
>>>>>>>>> is busy doing a memstore flush.
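[The two settings discussed above are overridden in hbase-site.xml. A sketch, with the values mentioned in the thread; the property names are the 0.20-era ones and should be checked against your own hbase-default.xml:]

```xml
<!-- hbase-site.xml (0.20-era property names; verify against your
     version's hbase-default.xml) -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- RPC handler threads; the thread raised this to 100 with no
       visible effect -->
  <value>100</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <!-- a region blocks updates once its memstore reaches
       multiplier x flush size; this is the source of the 'blocking'
       log messages stack asks about -->
  <value>2</value>
</property>
```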
>>>>>>>>
>>>>>>>> HBase does not go offline during memory flush. It continues to be
>>>>>>>> available for reads and writes during this time. And see J-D's
>>>>>>>> response for the incorrect understanding of how loading of
>>>>>>>> regions is done in an hbase cluster.
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>> I think either I am leaving out code that is required to
>>>>>>>>> determine which RegionServers are available OR I am keeping too
>>>>>>>>> many hBase objects in RAM instead of calling their constructors
>>>>>>>>> each time (my purpose obviously was to improve performance).
>>>>>>>>
>>>>>>>> For sure keep a single instance of HBaseConfiguration at least,
>>>>>>>> and use it constructing all HTable and HBaseAdmin instances.
>>>>>>>>
>>>>>>>>> Currently the live system is inserting over 7 million records
>>>>>>>>> per day (mostly between 8AM and 10PM), which is not a
>>>>>>>>> ridiculously high load.
>>>>>>>>
>>>>>>>> What size are the records? What is your table schema? How many
>>>>>>>> regions do you currently have in your table?
>>>>>>>>
>>>>>>>> St.Ack
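[Stack's HBaseConfiguration advice, sketched as 0.20-era client code; the class and method names are mine, not from the thread. As I understand the 0.20 client, the connection and region-location cache are keyed per HBaseConfiguration instance, which is why a single shared instance matters - and why a client that mismanages its configuration objects can keep hitting stale region locations after a regionserver restart while a freshly started shell works fine:]

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Sketch: one process-wide HBaseConfiguration for all client objects.
// Constructing a fresh HBaseConfiguration per request multiplies
// connections and discards the cached region locations each time.
public class HBaseHolder {
    private static final HBaseConfiguration CONF = new HBaseConfiguration();

    private HBaseHolder() {}

    // HTable (0.20) is not thread-safe: create one per worker thread,
    // but always from the single shared CONF.
    public static HTable openTable(String name) throws IOException {
        return new HTable(CONF, name);
    }
}
```

If the coldfusion layer currently builds its own configuration per request, switching to a shared one like this would be the first thing to try. (This sketch requires the HBase client jars on the classpath and a running cluster; it is illustrative only.)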