[ https://issues.apache.org/jira/browse/HBASE-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874238#action_12874238 ]

Jonathan Gray commented on HBASE-2646:
--------------------------------------

This looks good and is very similar to the design I was going with.  The 
patches up in HBASE-2375 don't do prioritized queues, but that is part of the 
redesign, grown out of that jira, that I am currently working on.  There is a 
new prioritized executor service going in as part of the master rewrite in 
trunk that I was planning to use for this.
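
For reference, a minimal sketch of what such a prioritized executor could look 
like (the class names and the int-based priority scheme here are my 
illustration, not the actual master-rewrite code):

{code:java}
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: pending tasks run in priority order instead of FIFO by
// backing a ThreadPoolExecutor with a PriorityBlockingQueue.
public class PriorityCompactionExecutor {

  /** A task that carries a priority; a lower value runs sooner. */
  public abstract static class PriorityRunnable
      implements Runnable, Comparable<PriorityRunnable> {
    private final int priority;
    protected PriorityRunnable(int priority) { this.priority = priority; }
    @Override
    public int compareTo(PriorityRunnable other) {
      return Integer.compare(this.priority, other.priority);
    }
  }

  private final ThreadPoolExecutor pool;

  public PriorityCompactionExecutor(int threads) {
    // Only queued tasks are reordered; tasks already running are unaffected.
    pool = new ThreadPoolExecutor(threads, threads, 60L, TimeUnit.SECONDS,
        new PriorityBlockingQueue<Runnable>());
  }

  /** Queue a task; the most urgent (lowest int) pending task is polled first. */
  public void submit(PriorityRunnable task) {
    pool.execute(task);
  }
}
{code}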

My implementation does not have a LOW priority; I'm not sure where that would 
make sense.

And for preventing starvation, I would agree with your question that this 
doesn't actually make sense here, though it is a cool feature that might make 
sense in a different context.  Any request for a high priority compaction is 
assumed to be made under the premise that it is the most important thing to 
happen.  The desired behavior would be to complete it before allowing any 
lower priority compactions to complete?  If a system is starved for resources, 
we would still prefer to allow high priority compactions to complete before 
allowing any low priority ones?  With a poorly set elevation time you would 
end up with everything at one priority, defeating the purpose.
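
To make the elevation-time concern concrete, here is a hypothetical aging rule 
(the method, parameter names, and arithmetic are all made up for 
illustration): each pending request's effective priority improves by one level 
per elapsed interval, so if the interval is set much shorter than typical 
queue wait times, everything ages to the top level and ordering degenerates to 
FIFO.

{code:java}
// Hypothetical aging scheme: effective priority improves by one level per
// elapsed elevationIntervalMs. If the interval is much smaller than typical
// queue wait time, every request reaches the top level (0) and the queue
// effectively becomes FIFO, defeating the prioritization.
int effectivePriority(int basePriority, long enqueuedMs, long nowMs,
                      long elevationIntervalMs) {
  long levelsElevated = (nowMs - enqueuedMs) / elevationIntervalMs;
  return (int) Math.max(0L, basePriority - levelsElevated);  // 0 = highest
}
{code}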

This is too big of a change for the 0.20 line but definitely something we 
should do in trunk.  Can we put this jira on hold for another week or two 
until I get back to working on compaction improvements?  I'd like to get some 
of the new master changes in before rebasing this on trunk.

Thanks for all the work, this looks great.

> Compaction requests should be prioritized to prevent blocking
> -------------------------------------------------------------
>
>                 Key: HBASE-2646
>                 URL: https://issues.apache.org/jira/browse/HBASE-2646
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.20.4
>         Environment: Ubuntu Server 10; HBase 0.20.4; 4-machine cluster (each 
> machine is an 8-core Xeon with 16 GB of RAM and 6 TB of storage); ~250 
> million rows
>            Reporter: Jeff Whiting
>         Attachments: prioritycompactionqueue-0.20.4.patch
>
>
> While testing the write capacity of a 4-machine HBase cluster we were getting 
> long and frequent client pauses as we attempted to load the data.  Looking 
> into the problem, we'd see a relatively large compaction queue, and when a 
> region hit the "hbase.hstore.blockingStoreFiles" limit it would block the 
> client while its compaction request was put on the back of the queue, 
> waiting behind many other less important compactions.  The client is 
> basically stuck at that point until the compaction is done.  Prioritizing 
> the compaction requests and allowing the request that is blocking other 
> actions to go first would help solve the problem.
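>
> A sketch of the idea (illustrative only, not the attached patch): derive each 
> request's priority from how close the store is to the blocking limit, so 
> requests for blocked or nearly-blocked regions sort ahead of routine ones.
> {code:java}
> // Illustrative only: the fewer store files a region can still absorb before
> // hitting "hbase.hstore.blockingStoreFiles", the sooner it should compact.
> // A result <= 0 means writers are already blocked on this region.
> int compactionPriority(int storeFileCount, int blockingStoreFiles) {
>   return blockingStoreFiles - storeFileCount;  // lower value = more urgent
> }
> {code}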
> You can see the problem by looking at our log files.
> You'll first see an event such as "too many hlogs", which will put a lot of 
> requests on the compaction queue:
> {noformat}
> 2010-05-25 10:53:26,570 INFO org.apache.hadoop.hbase.regionserver.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 22 regions(s):
> responseCounts,RS_6eZzLtdwhGiTwHy,1274232223324,
> responses,RS_0qhkL5rUmPCbx3K-1274213057242,1274513189592,
> responses,RS_1ANYnTegjzVIsHW-1274217741921,1274511001873,
> responses,RS_1HQ4UG5BdOlAyuE-1274216757425,1274726323747,
> responses,RS_1Y7SbqSTsZrYe7a-1274328697838,1274478031930,
> responses,RS_1ZH5TB5OdW4BVLm-1274216239894,1274538267659,
> responses,RS_3BHc4KyoM3q72Yc-1274290546987,1274502062319,
> responses,RS_3ra9BaBMAXFAvbK-1274214579958,1274381552543,
> responses,RS_6SDrGNuyyLd3oR6-1274219941155,1274385453586,
> responses,RS_8AGCEMWbI6mZuoQ-1274306857429,1274319602718,
> responses,RS_8C8T9DN47uwTG1S-1274215381765,1274289112817,
> responses,RS_8J5wmdmKmJXzK6g-1274299593861,1274494738952,
> responses,RS_8e5Sz0HeFPAdb6c-1274288641459,1274495868557,
> responses,RS_8rjcnmBXPKzI896-1274306981684,1274403047940,
> responses,RS_9FS3VedcyrF0KX2-1274245971331,1274754745013,
> responses,RS_9oZgPtxO31npv3C-1274214027769,1274396489756,
> responses,RS_a3FdO2jhqWuy37C-1274209228660,1274399508186,
> responses,RS_a3LJVxwTj29MHVa-12742
> {noformat}
> Then you see the region hit too many store files:
> {noformat}
> 2010-05-25 10:53:31,364 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862/783020138 because: regionserver/192.168.0.81:60020.cacheFlusher
> 2010-05-25 10:53:32,364 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862 has too many store files, putting it back at the end of the flush queue.
> {noformat}
> Which leads to this: 
> {noformat}
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 60 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 84 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,065 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> {noformat}
> Once the compaction / split is done, a flush is able to happen, which 
> unblocks the IPC and allows writes to continue.  Unfortunately this process 
> can take upwards of 15 minutes (the specific case shown here from our logs 
> took about 4 minutes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
