[
https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217410#comment-13217410
]
Matt Corgan commented on HBASE-5479:
------------------------------------
{quote}you need to do a bulk import MR (vs Put-based) or you have your
compaction algorithm tuned incorrectly... you probably want to switch your
compaction ratio to 0.125 and play with it from there{quote}
Yeah, just using it as an opportunity to push HBase with real data to see what
breaks first. I hesitate to change the global compaction ratio when it's just
a couple out of ~20 tables.
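(For context, a minimal sketch of why that change is global, assuming the
usual hbase.hstore.compaction.ratio key; there's no per-CF knob for it today:)
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch only: lowering the ratio this way affects every table on the
// cluster, not just the couple of hot ones.
public class RatioSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    conf.setFloat("hbase.hstore.compaction.ratio", 0.125f); // global, not per-CF
  }
}
{code}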
Agree pluggable compaction strategies would be great, as would many other
per-CF settings. Making them pluggable would be far more useful than
perfecting a general algorithm.
Is there a quick fix that could deal with outdated requests? Like ignoring a
CompactionRequest if the files in its CompactionSelection are no longer all
present. Or, when pulling a CompactionRequest from the head of the queue,
iterating the entire queue to check whether there's a newer CompactionRequest
for the same Store.
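To make the first idea concrete, here's a minimal sketch of the staleness
check (stand-in types only; the real Store/CompactionRequest accessors may
differ):
{code:java}
import java.util.Collection;
import java.util.HashSet;

// Sketch only: the Collection<String> arguments stand in for the file lists
// held by the real CompactionRequest and Store.
public class StaleRequestCheck {

  /** True if any file selected at request time has since disappeared,
   *  e.g. was already merged away by an earlier compaction. */
  public static boolean isStale(Collection<String> selectedFiles,
                                Collection<String> currentStoreFiles) {
    return !new HashSet<String>(currentStoreFiles).containsAll(selectedFiles);
  }
}
{code}
A stale request could then be dropped outright, or sent back through selection
against the Store's current files.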
> Postpone CompactionSelection to compaction execution time
> ---------------------------------------------------------
>
> Key: HBASE-5479
> URL: https://issues.apache.org/jira/browse/HBASE-5479
> Project: HBase
> Issue Type: New Feature
> Components: io, performance, regionserver
> Reporter: Matt Corgan
>
> It can be commonplace for regionservers to develop long compaction queues,
> meaning a CompactionRequest may execute hours after it was created. The
> CompactionRequest holds a CompactionSelection that was selected at request
> time but may no longer be the optimal selection. The CompactionSelection
> should be created at compaction execution time rather than compaction request
> time.
> The current mechanism breaks down during high volume insertion. The
> inefficiency is clearest when the inserts are finished. Inserting for 5
> hours may build up 50 storefiles and a 40 element compaction queue. When
> finished inserting, you would prefer that the next compaction merges all 50
> files (or some large subset), but the current system will churn through each
> of the 40 compaction requests, the first of which may be hours old. This
> ends up re-compacting the same data many times.
> The current system is especially inefficient when dealing with time series
> data where the data in the storefiles has minimal overlap. With time series
> data, there is even less benefit to intermediate merges because most
> storefiles can be eliminated based on their key range during a read, even
> without bloomfilters. The only goal should be to reduce file count, not to
> minimize the number of files merged for each read.
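> To make the key-range elimination concrete, a minimal sketch (illustration
> only, not the actual read path; FileRange is a made-up stand-in type):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> // A store file whose [firstKey, lastKey] range cannot contain the requested
> // row is skipped entirely; no bloom filter needed.
> class KeyRangeFilterSketch {
>   static class FileRange {
>     final String firstKey, lastKey;
>     FileRange(String firstKey, String lastKey) {
>       this.firstKey = firstKey;
>       this.lastKey = lastKey;
>     }
>   }
>
>   static List<FileRange> filesToRead(List<FileRange> storeFiles, String row) {
>     List<FileRange> candidates = new ArrayList<FileRange>();
>     for (FileRange f : storeFiles) {
>       if (f.firstKey.compareTo(row) <= 0 && f.lastKey.compareTo(row) >= 0) {
>         candidates.add(f); // for time-series data this is usually few files
>       }
>     }
>     return candidates;
>   }
> }
> {code}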
> There are other aspects to the current queuing mechanism that would need to
> be looked at. You would want to avoid having the same Store in the queue
> multiple times. And you would want the completion of one compaction to
> possibly queue another compaction request for the store.
> An alternative architecture to the current style of queues would be to have
> each Store (all open in memory) keep a compactionPriority score up to date
> after events like flushes, compactions, schema changes, etc. Then you create
> a "CompactionPriorityComparator implements Comparator<Store>" and stick all
> the Stores into a PriorityQueue (synchronized remove/add from the queue when
> the value changes). The async compaction threads would keep pulling off the
> head of that queue as long as the head has compactionPriority > X.
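> A minimal sketch of that design (Store below is a stand-in with just the one
> field the idea needs, and X is an arbitrary example threshold):
> {code:java}
> import java.util.Comparator;
> import java.util.PriorityQueue;
>
> // Stand-in for the real Store; only the priority score matters here.
> class Store {
>   volatile double compactionPriority; // updated on flush/compaction/etc.
> }
>
> class CompactionPriorityComparator implements Comparator<Store> {
>   public int compare(Store a, Store b) {
>     // Highest priority at the head of the queue.
>     return Double.compare(b.compactionPriority, a.compactionPriority);
>   }
> }
>
> class CompactionScheduler {
>   private static final double X = 1.0; // example "worth compacting" threshold
>   private final PriorityQueue<Store> queue =
>       new PriorityQueue<Store>(11, new CompactionPriorityComparator());
>
>   // Re-position a Store after an event changes its score; remove() is a
>   // no-op the first time, so this also handles initial insertion and keeps
>   // each Store in the queue at most once.
>   synchronized void onPriorityChanged(Store s, double newPriority) {
>     queue.remove(s);
>     s.compactionPriority = newPriority;
>     queue.add(s);
>   }
>
>   // Called by each async compaction thread in a loop.
>   synchronized Store pollIfWorthCompacting() {
>     Store head = queue.peek();
>     if (head != null && head.compactionPriority > X) {
>       return queue.poll(); // selection happens now, at execution time
>     }
>     return null; // nothing urgent; the thread can sleep and retry
>   }
> }
> {code}
> When a compaction finishes, the worker would recompute the Store's score and
> call onPriorityChanged(), which naturally covers re-queueing the same store
> for a follow-up compaction.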