[
https://issues.apache.org/jira/browse/HBASE-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Whiting updated HBASE-2646:
--------------------------------
Attachment: 2464-fix-race-condition-r1004349.txt
As per my discussion with Ted Yu, here is a patch that addresses the potential
race condition with queue.remove(queuedRequest). It does a sanity check to
make sure that what I'm removing from regionsInQueue is what I expect;
otherwise I leave it in place. The patch treats this.queue as the authoritative
source on which requests to run and will always return whatever was removed
from the queue.
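Roughly, the idea looks like this (a minimal sketch with made-up class and
field names, not the attached patch itself): the queue is authoritative, and
the companion regionsInQueue map is only cleaned up when it still points at the
exact request that was dequeued.
{noformat}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.PriorityBlockingQueue;

class CompactionQueueSketch {
  // Hypothetical request type: one queued compaction for one region.
  static class CompactionRequest implements Comparable<CompactionRequest> {
    final String regionName;
    final int priority;  // lower value = more urgent
    CompactionRequest(String regionName, int priority) {
      this.regionName = regionName;
      this.priority = priority;
    }
    @Override public int compareTo(CompactionRequest other) {
      return Integer.compare(this.priority, other.priority);
    }
  }

  private final PriorityBlockingQueue<CompactionRequest> queue =
      new PriorityBlockingQueue<CompactionRequest>();
  private final Map<String, CompactionRequest> regionsInQueue =
      new HashMap<String, CompactionRequest>();

  /** Queue a request; if the region is already queued at a lower priority,
   *  swap in the more urgent request. queue.remove(existing) can race with
   *  take(), which is why take() sanity-checks the map before cleaning it up. */
  synchronized void add(CompactionRequest request) {
    CompactionRequest existing = regionsInQueue.get(request.regionName);
    if (existing == null) {
      regionsInQueue.put(request.regionName, request);
      queue.add(request);
    } else if (request.priority < existing.priority) {
      queue.remove(existing);  // may already have been taken by the worker
      regionsInQueue.put(request.regionName, request);
      queue.add(request);
    }
    // Otherwise the region is already queued at equal or higher urgency; drop it.
  }

  /** The queue is authoritative: always return whatever came off of it. */
  CompactionRequest take() throws InterruptedException {
    CompactionRequest removed = queue.take();
    synchronized (this) {
      // Sanity check: only clear the regionsInQueue entry if it is the exact
      // request we just dequeued; otherwise leave it for its own dequeue.
      if (regionsInQueue.get(removed.regionName) == removed) {
        regionsInQueue.remove(removed.regionName);
      }
    }
    return removed;
  }
}
{noformat}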
> Compaction requests should be prioritized to prevent blocking
> -------------------------------------------------------------
>
> Key: HBASE-2646
> URL: https://issues.apache.org/jira/browse/HBASE-2646
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.20.4
> Environment: ubuntu server 10; hbase 0.20.4; 4 machine cluster (each
> machine is an 8 core xeon with 16 GB of ram and 6TB of storage); ~250 Million
> rows;
> Reporter: Jeff Whiting
> Priority: Critical
> Fix For: 0.20.7
>
> Attachments: 2464-fix-race-condition-r1004349.txt, 2646-v2.txt,
> 2646-v3.txt, prioritycompactionqueue-0.20.4.patch, PriorityQueue-r996664.patch
>
>
> While testing the write capacity of a 4 machine hbase cluster we were getting
> long and frequent client pauses as we attempted to load the data. Looking
> into the problem we'd get a relatively large compaction queue, and when a
> region hit the "hbase.hstore.blockingStoreFiles" limit it would block the
> client and the compaction request would get put at the back of the queue,
> waiting behind many other less important compactions. The client is basically
> stuck at that point until a compaction is done. Prioritizing the compaction
> requests and allowing the request that is blocking other actions to go first
> would help solve the problem.
> You can see the problem by looking at our log files:
> You'll first see an event such as a "too many hlogs" message, which will put
> a lot of requests on the compaction queue.
> {noformat}
> 2010-05-25 10:53:26,570 INFO org.apache.hadoop.hbase.regionserver.HLog: Too
> many hlogs: logs=33, maxlogs=32; forcing flush of 22 regions(s):
> responseCounts,RS_6eZzLtdwhGiTwHy,1274232223324,
> responses,RS_0qhkL5rUmPCbx3K-1274213057242,1274513189592,
> responses,RS_1ANYnTegjzVIsHW-1274217741921,1274511001873,
> responses,RS_1HQ4UG5BdOlAyuE-1274216757425,1274726323747,
> responses,RS_1Y7SbqSTsZrYe7a-1274328697838,1274478031930,
> responses,RS_1ZH5TB5OdW4BVLm-1274216239894,1274538267659,
> responses,RS_3BHc4KyoM3q72Yc-1274290546987,1274502062319,
> responses,RS_3ra9BaBMAXFAvbK-1274214579958,1274381552543,
> responses,RS_6SDrGNuyyLd3oR6-1274219941155,1274385453586,
> responses,RS_8AGCEMWbI6mZuoQ-1274306857429,1274319602718,
> responses,RS_8C8T9DN47uwTG1S-1274215381765,1274289112817,
> responses,RS_8J5wmdmKmJXzK6g-1274299593861,1274494738952,
> responses,RS_8e5Sz0HeFPAdb6c-1274288641459,1274495868557,
> responses,RS_8rjcnmBXPKzI896-1274306981684,1274403047940,
> responses,RS_9FS3VedcyrF0KX2-1274245971331,1274754745013,
> responses,RS_9oZgPtxO31npv3C-1274214027769,1274396489756,
> responses,RS_a3FdO2jhqWuy37C-1274209228660,1274399508186,
> responses,RS_a3LJVxwTj29MHVa-12742
> {noformat}
> Then you see the result: a compaction is requested and the flush is put back
> at the end of the flush queue because the region has too many store files:
> {noformat}
> 2010-05-25 10:53:31,364 DEBUG
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested
> for region
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862/783020138
> because: regionserver/192.168.0.81:60020.cacheFlusher
> 2010-05-25 10:53:32,364 WARN
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862 has too many
> store files, putting it back at the end of the flush queue.
> {noformat}
> Which leads to this:
> {noformat}
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 60 on 60020' on region
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore
> size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 84 on 60020' on region
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore
> size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,065 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 1 on 60020' on region
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore
> size 128.0m is >= than blocking 128.0m size
> {noformat}
> Once the compaction / split is done, a flush is able to happen, which unblocks
> the IPC and allows writes to continue. Unfortunately this process can take
> upwards of 15 minutes (the specific case shown here from our logs took about
> 4 minutes).
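The prioritization the report asks for can be sketched as ranking each request
by how close its store is to the "hbase.hstore.blockingStoreFiles" limit, so a
request for a region that is already blocking writes sorts first. This is only
an illustrative sketch with assumed names and an example config value, not the
attached patch:
{noformat}
/**
 * Sketch only: rank compaction requests so that a region that has reached
 * "hbase.hstore.blockingStoreFiles" (and is therefore blocking client writes)
 * sorts ahead of routine compactions.
 */
final class CompactionPriorityExample {
  /** Smaller value = more urgent; a value <= 0 means updates are already blocked. */
  static int priorityFor(int storeFileCount, int blockingStoreFiles) {
    return blockingStoreFiles - storeFileCount;
  }

  public static void main(String[] args) {
    int blockingStoreFiles = 7;  // example value for hbase.hstore.blockingStoreFiles
    System.out.println(priorityFor(3, blockingStoreFiles));  //  4: routine compaction
    System.out.println(priorityFor(9, blockingStoreFiles));  // -2: blocking writes, run first
  }
}
{noformat}
A request with a priority at or below zero would then jump ahead of routine
compactions in a priority queue like the one sketched in the comment above.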
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.