Hi Guys,

Some feedback on the changes we have made.
So far, we have done the following (a rough config snippet follows the list for
reference)...

1) Updated hBase from 0.20.2 to 0.20.3
2) Enabled LZO Compression
3) Upped the Java Max Heap from 1GB to 3GB
4) Changed hbase.hstore.blockingStoreFiles to 15
5) Changed hbase.regions.percheckin to 100
6) Changed hbase.regionserver.global.memstore.upperLimit to 0.5
7) Changed hbase.regionserver.global.memstore.lowerLimit to 0.48
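
For reference, the relevant entries in our hbase-site.xml and hbase-env.sh look
roughly like this (a sketch from memory - the values are as listed above, but the
exact layout in our files may differ slightly):

  # hbase-env.sh - heap is given in MB (item 3)
  export HBASE_HEAPSIZE=3000

  <!-- hbase-site.xml (items 4 to 7) -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>15</value>
  </property>
  <property>
    <name>hbase.regions.percheckin</name>
    <value>100</value>
  </property>
  <!-- 6) and 7) - we have since commented these out again, see below -->
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.5</value>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.48</value>
  </property>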

We undid 6) and 7) above, because with those two settings enabled, hBase would
insert records at a rate of about 1 record every 5 seconds - inserting might as
well have been non-existent.

After disabling 6) and 7), we have noticed an improvement in performance,
but the pausing issue is still there.  I have noticed now, however, that
only some memstore flushes cause inserting to pause - before, every memstore
flush would pause inserting.  Starting hBase is also a lot faster because
it assigns the regions more quickly.  The htm file attached to this email is
the front page of the hBase Web Console.
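
In case it helps anyone reproduce the observation: I spotted which flushes
coincide with the pauses simply by watching the regionserver logs for flush
activity while the client apps were inserting - a rough sketch (log path from
memory, it may differ on other setups):

  # run on each regionserver; compare timestamps with the client-side pauses
  tail -f $HBASE_HOME/logs/hbase-*-regionserver-*.log | grep -i flush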

Our next step is to add more servers (we are just waiting for some space to
be made in our server cabinet), but I figured I would give some feedback in
the meantime in case something I have said above rings alarm bells and helps
figure out the problem.

Regards,
Seraph


> From: Jean-Daniel Cryans <jdcry...@apache.org>
> Reply-To: <hbase-user@hadoop.apache.org>
> Date: Tue, 9 Feb 2010 09:45:42 -0800
> To: <hbase-user@hadoop.apache.org>
> Subject: Re: Hbase pausing problems
> 
> You want namenode+jobtracker+hbase-master on one node
> Then you want all slaves to be datanode+tasktracker+regionserver
> 
> Since HDFS writes the first replica on the local node if possible, that
> improves data locality when the DN and the RS are together.
> 
> Don't spend time on getting 2 namenode nodes, that will involve an
> incredible amount of work and will possibly fail. Just make sure your master
> node is reliable (mirrored disks, 2 PSU, etc) and that should almost never
> be a problem. In 2 years of using Hadoop I never had a NN failure. Also with
> such a small cluster you lose too much processing power, unless that node is
> also configured as a slave.
> 
> J-D
> 
> On Tue, Feb 9, 2010 at 7:30 AM, Seraph Imalia <ser...@eisp.co.za> wrote:
> 
>> Hi Jean-Daniel,
>> 
>> Thank you for your input - I'll make these changes and try it tonight.  I
>> think it is probably also a good idea for me to enable compression now
>> whilst the load is off the servers.
>> 
>> We have a physical space issue in our server cabinet which will get
>> resolved
>> sometime in march and we are planning to add an additional 3 servers to the
>> setup + maybe an additional one for the namenode and master hBase server.
>>  I
>> read somewhere that it is wise to place a datanode and regionserver
>> together
>> per server.  Is this wise?  Or is there a better way to configure this?
>> 
>> Regards,
>> Seraph
>> 
>> 
>>> From: Jean-Daniel Cryans <jdcry...@apache.org>
>>> Reply-To: <hbase-user@hadoop.apache.org>
>>> Date: Mon, 8 Feb 2010 10:11:36 -0800
>>> To: <hbase-user@hadoop.apache.org>
>>> Subject: Re: Hbase pausing problems
>>> 
>>> The "too many store files" is due to this
>>> 
>>>   <property>
>>>     <name>hbase.hstore.blockingStoreFiles</name>
>>>     <value>7</value>
>>>     <description>
>>>     If more than this number of StoreFiles in any one Store
>>>     (one StoreFile is written per flush of MemStore) then updates are
>>>     blocked for this HRegion until a compaction is completed, or
>>>     until hbase.hstore.blockingWaitTime has been exceeded.
>>>     </description>
>>>   </property>
>>> 
>>> This block is there in order to not overrun the system with uncompacted
>>> files. In the past I saw an import driving the number of store files to
>> more
>>> than 100 and it was just impossible to compact. The default setting is
>>> especially low since the default heap size is 1GB, with 3GB you could set
>> it
>>> to 13-15.
>>> 
>>> Since you have a high number of regions, consider tweaking this:
>>> 
>>>   <property>
>>>     <name>hbase.regions.percheckin</name>
>>>     <value>10</value>
>>>     <description>Maximum number of regions that can be assigned in a
>> single
>>> go
>>>     to a region server.
>>>     </description>
>>>   </property>
>>> 
>>> Since you have such a low number of nodes, a value of 100 would make a
>> lot
>>> of sense.
>>> 
>>> On a general note, it seems that your machines are unable to keep up with
>>> the size of data that's coming in and lots of compaction (and flushes)
>> need
>>> to happen. The fact that only 3 machines are doing the work exacerbates
>> the
>>> problem. Using the configurations I just told you about will lessen the
>>> problem but you should really consider using LZO or even GZ since all you
>>> care about is storing a lot of data and only read a few rows per day.
>>> Enabling GZ won't require any new software on these nodes and there's no
>>> chance of losing data.
>>> 
>>> J-D
>>> 
>>> On Mon, Feb 8, 2010 at 5:28 AM, Seraph Imalia <ser...@eisp.co.za> wrote:
>>> 
>>>> Hi Guys,
>>>> 
>>>> I am having another problem with hBase that is probably related to the
>>>> problems I was emailing you about earlier this year.
>>>> 
>>>> I have finally had a chance to at least try one of the suggestions you
>> had
>>>> to help resolve our problems.  I increased the heap size per server to
>> 3Gig
>>>> and added the following to the hbase-site.xml files on each server last
>>>> night (I have not enabled compression yet for fear of losing data - I need
>>>> to wait until I have a long period of time where hBase can be offline, and
>>>> in case there are problems I need to resolve) ...
>>>> 
>>>> <property>
>>>>    <name>hbase.regionserver.global.memstore.upperLimit</name>
>>>>    <value>0.5</value>
>>>>    <description>Maximum size of all memstores in a region server before
>> new
>>>>      updates are blocked and flushes are forced. Defaults to 40% of heap
>>>>    </description>
>>>> </property>
>>>> <property>
>>>>    <name>hbase.regionserver.global.memstore.lowerLimit</name>
>>>>    <value>0.48</value>
>>>>    <description>When memstores are being forced to flush to make room in
>>>>      memory, keep flushing until we hit this mark. Defaults to 30% of
>> heap.
>>>>      This value equal to hbase.regionserver.global.memstore.upperLimit
>>>> causes
>>>>      the minimum possible flushing to occur when updates are blocked due
>> to
>>>>      memstore limiting.
>>>>    </description>
>>>> </property>
>>>> 
>>>> ...and then restarted hbase
>>>> bin/stop-hbase.sh
>>>> bin/start-hbase.sh
>>>> 
>>>> Hbase spent about 30 minutes assigning regions to each of the region
>>>> servers (we now have 2595 regions).  When it had finished (which is
>> usually
>>>> when our clients apps are able to start adding rows), client apps were
>> only
>>>> able to add rows at an incredibly slow rate (about 1 every second) which
>> was
>>>> not even able to cope with the miniscule load we have at 3AM in the
>> morning.
>>>> 
>>>> I left hBase for about 30 minutes after region assignment had completed
>> and
>>>> the situation did not improve.  I then tried changing the lowerLimit to
>> 0.38
>>>> and restart again which also did not improve the situation.  I then
>> removed
>>>> the above lines by commenting them out (<!-- -->) and restarted hBase
>> again.
>>>>  Again, 30 minutes later after it had finished assigning regions, it was
>> no
>>>> different.
>>>> 
>>>> I therefore assumed that the problem was not caused by the addition of
>> the
>>>> properties but rather just by the fact that it had been restarted.  I
>>>> checked the log files very closely and I noticed that when I disable the
>>>> client apps, the regionservers are frantically requesting major
>> compactions
>>>> and complaining about too many store files for a region.
>>>> 
>>>> I then assumed that the system is under strain performing housekeeping
>> and
>>>> there is nothing I can do with my limited knowledge to improve it
>> without
>>>> contacting you guys about it first.  It was 4AM this morning and I had
>> no
>>>> choice but to do whatever I could to get our client apps up and running
>>>> before morning, so I wrote some quick coldfusion and java code to get
>> the
>>>> data inserted into local mysql servers so that hBase could have time to
>> do
>>>> whatever it was doing.
>>>> 
>>>> It is still compacting and it is now 9 hours after the last restart.
>> With 0
>>>> load from client apps.
>>>> 
>>>> Please can you assist by shedding some light on what is actually
>> happening?
>>>> - Is my thinking correct? - Is it related to the "hBase pausing
>> problems" we
>>>> are still having? - What do I do to fix it or make it hurry up?
>>>> 
>>>> Regards,
>>>> Seraph
>>>> 
>>>> 
>> 
>> 
>> 
>> 
