[ https://issues.apache.org/jira/browse/HBASE-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Whiting updated HBASE-2646:
--------------------------------

    Attachment: prioritycompactionqueue-0.20.4.patch

Here is my first go at a patch to prioritize compaction requests.  There are 
currently three priority levels a request can take: LOW, NORMAL, and 
HIGH_BLOCKING.  Right now the only request given HIGH_BLOCKING priority is one 
issued when the memstore cannot flush because the region has too many hstore 
files.  All other requests are NORMAL; LOW is currently unused.
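
For illustration, here is a minimal sketch of how those three levels could be 
represented (the enum name and comments are mine, not necessarily what the 
attached patch uses):

{code:java}
/**
 * Sketch of the three priority levels described above.  The enum name and
 * layout are illustrative; the attached patch may represent priorities
 * differently (for example as int constants).
 */
public enum CompactionPriority {
  HIGH_BLOCKING,  // a flush is blocked because the region has too many store files
  NORMAL,         // default for ordinary compaction requests
  LOW             // currently unused
}
{code}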

One thing I really like about the patch is that it abstracts everything about 
the queue.  CompactSplitThread no longer has to maintain both the queue and the 
hashset, as all of that is now handled by the PriorityCompactionQueue.  It only 
has to put regions on and take them off, and that is it.
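
Roughly, the caller's view of the queue shrinks to something like the following 
(an illustrative sketch; the method names and signatures are my assumptions, 
not necessarily those in PriorityCompactionQueue.java):

{code:java}
import org.apache.hadoop.hbase.regionserver.HRegion;

/**
 * Illustrative sketch of the abstraction described above: the caller only
 * adds and removes regions, while de-duplication and priority ordering live
 * inside the queue.  Method names here are assumptions.
 */
interface CompactionQueue {
  /** Enqueue a region at the given priority; duplicate requests are tracked internally. */
  void add(HRegion region, CompactionPriority priority);

  /** Block until a request is available and return the highest-priority region. */
  HRegion take() throws InterruptedException;
}
{code}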

PriorityCompactionQueue basically has two modes of operation.  The default mode 
always gives higher priority compaction requests precedence.  The only downside 
is that it could lead to starvation of lower priority requests (although if 
there is more important work to be done, shouldn't we just be doing that?).  
The second mode prevents starvation by allowing a request to raise its priority 
after it has been in the queue for a specified amount of time.  For example, 
with a 10 second priority elevation time, a LOW priority request would be 
elevated to a NORMAL priority request after 10 seconds.  This can be tuned with 
the hbase.regionserver.thread.priorityElevationTime configuration parameter (a 
value of -1 means the first mode is used).
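
As a rough sketch of the elevation mode (this is not the patch's code; it 
assumes priorities are small ints where a lower value means more urgent):

{code:java}
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of the priority-elevation idea, not the patch itself.  Assumes
 * priorities are small ints where a lower value is more urgent
 * (HIGH_BLOCKING < NORMAL < LOW).
 */
class ElevationSketch {
  static final int HIGH_BLOCKING = 0, NORMAL = 1, LOW = 2;

  private final long elevationTime;  // millis; <= 0 (e.g. -1) means strict-priority mode

  ElevationSketch(Configuration conf) {
    this.elevationTime =
        conf.getLong("hbase.regionserver.thread.priorityElevationTime", -1);
  }

  /** Priority used for ordering, given how long the request has been queued. */
  int effectivePriority(int requestedPriority, long enqueueTimeMillis) {
    if (elevationTime <= 0) {
      return requestedPriority;  // default mode: always honor the requested priority
    }
    long waited = System.currentTimeMillis() - enqueueTimeMillis;
    int boost = (int) (waited / elevationTime);  // one level per elevation interval waited
    return Math.max(HIGH_BLOCKING, requestedPriority - boost);
  }
}
{code}

With a 10 second elevation time, a LOW (2) request that has waited 10 seconds 
gets a boost of one level and sorts as NORMAL (1), matching the example above.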

The patch is against the 0.20.4 tag and includes two new files, 
PriorityCompactionQueue.java and TestPriorityCompactionQueue.java; these are 
the new compaction queue and a unit test for it.  In addition I made all the 
necessary changes to CompactSplitThread and MemStoreFlusher.

We've tested this patch in our environment with great results.  We were able to 
lower our parameters to the following with no client pauses:

hbase.hregion.memstore.block.multiplier => 2
hbase.hstore.blockingStoreFiles => 2
hbase.hstore.compactionThreshold => 4

> Compaction requests should be prioritized to prevent blocking
> -------------------------------------------------------------
>
>                 Key: HBASE-2646
>                 URL: https://issues.apache.org/jira/browse/HBASE-2646
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.20.4
>         Environment: ubuntu server 10; hbase 0.20.4; 4 machine cluster (each 
> machine is an 8 core xeon with 16 GB of ram and 6TB of storage); ~250 Million 
> rows;
>            Reporter: Jeff Whiting
>         Attachments: prioritycompactionqueue-0.20.4.patch
>
>
> While testing the write capacity of a 4 machine hbase cluster we were getting 
> long and frequent client pauses as we attempted to load the data.  Looking 
> into the problem, we'd see a relatively large compaction queue, and when a 
> region hit the "hbase.hstore.blockingStoreFiles" limit it would block the 
> client while its compaction request was put on the back of the queue behind 
> many other less important compactions.  The client is basically stuck at that 
> point until a compaction is done.  Prioritizing the compaction requests and 
> allowing the request that is blocking other actions to go first would help 
> solve the problem.
> You can see the problem by looking at our log files:
> You'll first see an event such as a "too many hlogs" message, which will put a 
> lot of requests on the compaction queue.
> {noformat}
> 2010-05-25 10:53:26,570 INFO org.apache.hadoop.hbase.regionserver.HLog: Too 
> many hlogs: logs=33, maxlogs=32; forcing flush of 22 regions(s): 
> responseCounts,RS_6eZzLtdwhGiTwHy,1274232223324, 
> responses,RS_0qhkL5rUmPCbx3K-1274213057242,1274513189592, 
> responses,RS_1ANYnTegjzVIsHW-1274217741921,1274511001873, 
> responses,RS_1HQ4UG5BdOlAyuE-1274216757425,1274726323747, 
> responses,RS_1Y7SbqSTsZrYe7a-1274328697838,1274478031930, 
> responses,RS_1ZH5TB5OdW4BVLm-1274216239894,1274538267659, 
> responses,RS_3BHc4KyoM3q72Yc-1274290546987,1274502062319, 
> responses,RS_3ra9BaBMAXFAvbK-1274214579958,1274381552543, 
> responses,RS_6SDrGNuyyLd3oR6-1274219941155,1274385453586, 
> responses,RS_8AGCEMWbI6mZuoQ-1274306857429,1274319602718, 
> responses,RS_8C8T9DN47uwTG1S-1274215381765,1274289112817, 
> responses,RS_8J5wmdmKmJXzK6g-1274299593861,1274494738952, 
> responses,RS_8e5Sz0HeFPAdb6c-1274288641459,1274495868557, 
> responses,RS_8rjcnmBXPKzI896-1274306981684,1274403047940, 
> responses,RS_9FS3VedcyrF0KX2-1274245971331,1274754745013, 
> responses,RS_9oZgPtxO31npv3C-1274214027769,1274396489756, 
> responses,RS_a3FdO2jhqWuy37C-1274209228660,1274399508186, 
> responses,RS_a3LJVxwTj29MHVa-12742
> {noformat}
> Then you see the region with too many store files:
> {noformat}
> 2010-05-25 10:53:31,364 DEBUG 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested 
> for region 
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862/783020138 
> because: regionserver/192.168.0.81:60020.cacheFlusher
> 2010-05-25 10:53:32,364 WARN 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region 
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862 has too many 
> store files, putting it back at the end of the flush queue.
> {noformat}
> Which leads to this: 
> {noformat}
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Blocking updates for 'IPC Server handler 60 on 60020' on region 
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore 
> size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Blocking updates for 'IPC Server handler 84 on 60020' on region 
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore 
> size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,065 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Blocking updates for 'IPC Server handler 1 on 60020' on region 
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore 
> size 128.0m is >= than blocking 128.0m size
> {noformat}
> Once the compaction / split is done, a flush is able to happen, which unblocks 
> the IPC and allows writes to continue.  Unfortunately this process can take 
> upwards of 15 minutes (the specific case shown here from our logs took about 
> 4 minutes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
