I just created a jira for the "region server-wide pause when global memstore size is too high" problem; I've seen it two times before.
https://issues.apache.org/jira/browse/HBASE-2149

J-D

On Wed, Jan 20, 2010 at 11:33 AM, Seraph Imalia <ser...@eisp.co.za> wrote:
>
>> From: stack <st...@duboce.net>
>> Reply-To: <hbase-user@hadoop.apache.org>
>> Date: Wed, 20 Jan 2010 10:55:52 -0800
>> To: <hbase-user@hadoop.apache.org>
>> Subject: Re: Hbase pausing problems
>>
>> On Wed, Jan 20, 2010 at 9:37 AM, Seraph Imalia <ser...@eisp.co.za> wrote:
>>
>>> Does this mean that when 1 regionserver does a memstore flush, the
>>> other two regionservers are also unavailable for writes? I have
>>> watched the logs carefully to make sure that not all the regionservers
>>> are flushing at the same time. Most of the time, only 1 server flushes
>>> at a time and in rare cases, I have seen two at a time.
>>
>> No. Flush is a background process. Reads and writes go ahead while
>> flushing is happening.
>
> This is very nice to know: it is what I expected, but it also means that
> this problem is solvable :)
>
>>>> It also looks like you have little RAM space given over to hbase,
>>>> just 1G? If your traffic is bursty, giving hbase more RAM might help
>>>> it get over these write humps.
>>>
>>> I have it at 1G on purpose. When we first had the problem, I
>>> immediately thought the problem was resource related, so I increased
>>> the hBase RAM to 3G (each server has 8G - I was careful to watch for
>>> swapping). This made the problem worse because each memstore flush
>>> took longer, which stopped writing for longer, and people started
>>> noticing that our system was down during those periods.
>>
>> See above, flushing doesn't block reads/writes. Maybe this was
>> something else? A GC pause that ran longer because the heap is bigger?
>> You said you had gc logging enabled. Did you see any long pauses? (Our
>> ZooKeeper brothers suggest https://gchisto.dev.java.net/ as a help
>> reading GC logs.)
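[The GC logging discussed above is switched on through HBASE_OPTS in conf/hbase-env.sh. A rough sketch follows; the flags are the usual Sun-JDK-6-era set, and the log path is an assumption, not from the thread:]

```shell
# conf/hbase-env.sh: extend HBASE_OPTS with GC logging so collector
# pauses on the regionserver can be lined up against client-side stalls.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc-hbase.log"
```

The resulting gc-hbase.log is what a tool like gchisto reads.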
>
> Thanks, this tool will be useful - I'll run through the GC logs and see
> if anything jumps out at me.
>
>> Let me look at your logs to see if I see anything else up there.
>>
>>>> Clients will be blocked writing regions carried by the affected
>>>> regionserver only. Your HW is not appropriate to the load as
>>>> currently setup. You might also consider adding more machines to your
>>>> cluster.
>>>
>>> Hmm... How does hBase decide which region to write to? Is it possible
>>> that hBase is deciding to write all our current records to one
>>> specific region that happens to be on the server that is busy doing a
>>> memstore flush?
>>
>> Check out the region list in the master UI. See how the regions are
>> defined by their start and end key. Clients write rows to the region
>> hosting the pertinent row-span.
>
> I have attached the region list of the AdDelivery table. Please let me
> know if this is something that you need me to upload to a server
> somewhere?
>
>> It's quite possible all writes are going to a single region on a single
>> server -- which is often an issue -- if your key scheme has something
>> like current time for a prefix.
>
> We are using UUID.randomUUID() as the row key - it has a pretty random
> prefix.
>
>>> We are currently inserting about 6 million rows per day.
>>
>> 6M rows is low, even for a cluster as small as yours (though, maybe
>> your inserts are fat? Big cells, many at a time?).
>
> Inserts contain a maximum of 30 cells - most of the cells are type
> string containing integers. About 3 are strings containing no more than
> 10 characters, and about 4 are type string containing a decimal value.
> Not all 30 cells will exist; most often, some are left out because the
> data was not necessary for that specific row. Most rows will contain 25
> cells.
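[The row-key claim is easy to check outside hBase: UUID.randomUUID() returns a type-4 (random) UUID, so the leading hex characters of the key are effectively random and consecutive inserts scatter across the key space rather than piling onto the region owning a current-time prefix. A JDK-only sketch; the class name is mine:]

```java
import java.util.UUID;

public class RowKeyCheck {
    public static void main(String[] args) {
        // A type-4 UUID is random apart from its version/variant bits,
        // so the 8-hex-char prefix of the string form is uniformly
        // distributed: two consecutive keys almost never share a prefix,
        // and hence almost never target the same "hot" region.
        String k1 = UUID.randomUUID().toString();
        String k2 = UUID.randomUUID().toString();
        System.out.println(k1 + "  (region chosen by prefix " + k1.substring(0, 8) + ")");
        System.out.println(k2 + "  (region chosen by prefix " + k2.substring(0, 8) + ")");
    }
}
```

So a single hot region from the key scheme itself is unlikely here, consistent with stack's later conclusion.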
>
>>> SQL Server (which I am so happy to no longer be using for this) was
>>> able to write (and replicate to a slave) 9 million records (using the
>>> same spec'ed server). I would like to see hBase cope with the 3 we
>>> have given it, at least when inserting 6 million. Do you think this is
>>> possible, or is our only answer to throw on more servers?
>>
>> 3 servers should be well able. Tell me more about your schema --
>> though, nevermind, I can find it in your master log.
>> St.Ack
>
> Cool :)
> Seraph
>
>>>> St.Ack
>>>>
>>>>> Thank you for your assistance thus far; please let me know if you
>>>>> need or discover anything else?
>>>>>
>>>>> Regards,
>>>>> Seraph
>>>>>
>>>>>> From: Jean-Daniel Cryans <jdcry...@apache.org>
>>>>>> Reply-To: <hbase-user@hadoop.apache.org>
>>>>>> Date: Mon, 18 Jan 2010 09:49:16 -0800
>>>>>> To: <hbase-user@hadoop.apache.org>
>>>>>> Subject: Re: Hbase pausing problems
>>>>>>
>>>>>> The next step would be to take a look at your region server's log
>>>>>> around the time of the insert and clients who don't resume after
>>>>>> the loss of a region server. If you are able to gzip them and put
>>>>>> them on a public server, it would be awesome.
>>>>>>
>>>>>> Thx,
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Mon, Jan 18, 2010 at 1:03 AM, Seraph Imalia <ser...@eisp.co.za>
>>>>>> wrote:
>>>>>>> Answers below...
>>>>>>>
>>>>>>> Regards,
>>>>>>> Seraph
>>>>>>>
>>>>>>>> From: stack <st...@duboce.net>
>>>>>>>> Reply-To: <hbase-user@hadoop.apache.org>
>>>>>>>> Date: Fri, 15 Jan 2010 10:10:39 -0800
>>>>>>>> To: <hbase-user@hadoop.apache.org>
>>>>>>>> Subject: Re: Hbase pausing problems
>>>>>>>>
>>>>>>>> How many CPUs?
>>>>>>>
>>>>>>> 1x Quad Xeon in each server
>>>>>>>
>>>>>>>> You are using default JVM settings (see HBASE_OPTS in
>>>>>>>> hbase-env.sh). You might want to enable GC logging. See the line
>>>>>>>> after HBASE_OPTS in hbase-env.sh. Enable it.
>>>>>>>> GC logging might tell you about the pauses you are seeing.
>>>>>>>
>>>>>>> I will enable GC logging tonight during our slow time because
>>>>>>> restarting the regionservers causes the clients to pause
>>>>>>> indefinitely.
>>>>>>>
>>>>>>>> Can you get a fourth server for your cluster and run the master,
>>>>>>>> zk, and namenode on it and leave the other three servers for
>>>>>>>> regionserver and datanode (with perhaps replication == 2 as per
>>>>>>>> J-D to lighten load on a small cluster)?
>>>>>>>
>>>>>>> We plan to double the number of servers in the next few weeks and
>>>>>>> I will take your advice to put the master, zk and namenode on it
>>>>>>> (we will need to have a second one on standby should this one
>>>>>>> crash). The servers will be ordered shortly and will be here in a
>>>>>>> week or two.
>>>>>>>
>>>>>>> That said, I have been monitoring CPU usage and none of them seem
>>>>>>> particularly busy. The regionserver on each one hovers around 30%
>>>>>>> all the time and the datanode sits at about 10% most of the time.
>>>>>>> If we do have a resource issue, it definitely does not seem to be
>>>>>>> CPU.
>>>>>>>
>>>>>>> Increasing RAM did not seem to work either - it just made hBase
>>>>>>> use a bigger memstore and then it took longer to do a flush.
>>>>>>>
>>>>>>>> More notes inline below.
>>>>>>>>
>>>>>>>> On Fri, Jan 15, 2010 at 1:33 AM, Seraph Imalia
>>>>>>>> <ser...@eisp.co.za> wrote:
>>>>>>>>
>>>>>>>>> Approximately every 10 minutes, our entire coldfusion system
>>>>>>>>> pauses at the point of inserting into hBase for between 30 and
>>>>>>>>> 60 seconds and then continues.
>>>>>>>>
>>>>>>>> Yeah, enable GC logging. See if you can make a correlation
>>>>>>>> between the pause the client is seeing and a GC pause.
>>>>>>>>
>>>>>>>>> Investigation...
>>>>>>>>>
>>>>>>>>> Watching the logs of the regionserver, the pausing of the
>>>>>>>>> coldfusion system happens as soon as one of the regionservers
>>>>>>>>> starts flushing the memstore and recovers again as soon as it is
>>>>>>>>> finished flushing (recovers as soon as it starts compacting).
>>>>>>>>
>>>>>>>> ...though, this would seem to point to an issue with your
>>>>>>>> hardware. How many disks? Are they misconfigured such that they
>>>>>>>> hold up the system when they are being heavily written to?
>>>>>>>>
>>>>>>>> A regionserver log at DEBUG from around this time so we could
>>>>>>>> look at it would be helpful.
>>>>>>>>
>>>>>>>>> I can recreate the error just by stopping 1 of the
>>>>>>>>> regionservers; but then starting the regionserver again does not
>>>>>>>>> make coldfusion recover until I restart the coldfusion servers.
>>>>>>>>> It is important to note that if I keep the built-in hBase shell
>>>>>>>>> running, it is happily able to put and get data to and from
>>>>>>>>> hBase whilst coldfusion is busy pausing/failing.
>>>>>>>>
>>>>>>>> This seems odd. Enable DEBUG for the client side. Do you see the
>>>>>>>> shell recalibrating - finding new locations for regions - after
>>>>>>>> you shut down the single regionserver, something that your
>>>>>>>> coldfusion is not doing? Or, maybe, the shell is putting to a
>>>>>>>> regionserver that has not been disturbed by your start/stop?
>>>>>>>>
>>>>>>>>> I have tried increasing the regionserver's RAM to 3 Gigs and
>>>>>>>>> this just made the problem worse because it took longer for the
>>>>>>>>> regionservers to flush the memory store.
>>>>>>>>
>>>>>>>> Again, if flushing is holding up the machine - if you can't write
>>>>>>>> a file in the background without it freezing your machine - then
>>>>>>>> your machines are anemic or misconfigured?
>>>>>>>>
>>>>>>>>> One of the links I found on your site mentioned increasing the
>>>>>>>>> default value for hbase.regionserver.handler.count to 100; this
>>>>>>>>> did not seem to make any difference.
>>>>>>>>
>>>>>>>> Leave this configuration in place, I'd say.
>>>>>>>>
>>>>>>>> Are you seeing 'blocking' messages in the regionserver logs? A
>>>>>>>> regionserver will stop taking on writes if it thinks it's being
>>>>>>>> overrun, to prevent itself OOME'ing. Grep for the 'multiplier'
>>>>>>>> configuration in hbase-default.xml.
>>>>>>>>
>>>>>>>>> I have double checked that the memory flush very rarely happens
>>>>>>>>> on more than 1 regionserver at a time - in fact, in my many
>>>>>>>>> hours of staring at tails of logs, it only happened once where
>>>>>>>>> two regionservers flushed at the same time.
>>>>>>>>
>>>>>>>> You've enabled DEBUG?
>>>>>>>>
>>>>>>>>> My investigations point strongly towards a coding problem on our
>>>>>>>>> side rather than a problem with the server setup or hBase
>>>>>>>>> itself.
>>>>>>>>
>>>>>>>> If things were slow from the client perspective, that might be a
>>>>>>>> client-side coding problem, but these pauses - unless you have a
>>>>>>>> fly-by deadlock in your client code - are probably an hbase
>>>>>>>> issue.
>>>>>>>>
>>>>>>>>> I say this because whilst I understand why a regionserver would
>>>>>>>>> go offline during a memory flush, I would expect the other two
>>>>>>>>> regionservers to pick up the load, especially since the built-in
>>>>>>>>> hbase shell has no problem accessing hBase whilst a regionserver
>>>>>>>>> is busy doing a memstore flush.
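[The two settings discussed above are overridden in hbase-site.xml. A sketch, with the values mentioned in the thread; the property names are the 0.20-era ones and should be checked against your own hbase-default.xml:]

```xml
<!-- hbase-site.xml (0.20-era property names; verify against your
     version's hbase-default.xml) -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- RPC handler threads; the thread raised this to 100 with no
       visible effect -->
  <value>100</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <!-- a region blocks updates once its memstore reaches
       multiplier x flush size; this is the source of the 'blocking'
       log messages stack asks about -->
  <value>2</value>
</property>
```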
>>>>>>>>
>>>>>>>> HBase does not go offline during memory flush. It continues to be
>>>>>>>> available for reads and writes during this time. And see J-D's
>>>>>>>> response for the incorrect understanding of how loading of
>>>>>>>> regions is done in an hbase cluster.
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>> I think either I am leaving out code that is required to
>>>>>>>>> determine which RegionServers are available OR I am keeping too
>>>>>>>>> many hBase objects in RAM instead of calling their constructors
>>>>>>>>> each time (my purpose obviously was to improve performance).
>>>>>>>>
>>>>>>>> For sure keep a single instance of HBaseConfiguration at least,
>>>>>>>> and use it constructing all HTable and HBaseAdmin instances.
>>>>>>>>
>>>>>>>>> Currently the live system is inserting over 7 million records
>>>>>>>>> per day (mostly between 8AM and 10PM), which is not a
>>>>>>>>> ridiculously high load.
>>>>>>>>
>>>>>>>> What size are the records? What is your table schema? How many
>>>>>>>> regions do you currently have in your table?
>>>>>>>>
>>>>>>>> St.Ack
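[Stack's HBaseConfiguration advice, sketched as 0.20-era client code; the class and method names are mine, not from the thread. As I understand the 0.20 client, the connection and region-location cache are keyed per HBaseConfiguration instance, which is why a single shared instance matters - and why a client that mismanages its configuration objects can keep hitting stale region locations after a regionserver restart while a freshly started shell works fine:]

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Sketch: one process-wide HBaseConfiguration for all client objects.
// Constructing a fresh HBaseConfiguration per request multiplies
// connections and discards the cached region locations each time.
public class HBaseHolder {
    private static final HBaseConfiguration CONF = new HBaseConfiguration();

    private HBaseHolder() {}

    // HTable (0.20) is not thread-safe: create one per worker thread,
    // but always from the single shared CONF.
    public static HTable openTable(String name) throws IOException {
        return new HTable(CONF, name);
    }
}
```

If the coldfusion layer currently builds its own configuration per request, switching to a shared one like this would be the first thing to try. (This sketch requires the HBase client jars on the classpath and a running cluster; it is illustrative only.)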