[ 
https://issues.apache.org/jira/browse/HADOOP-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562709#action_12562709
 ] 

stack commented on HADOOP-2615:
-------------------------------

I made issue HADOOP-2712 to cover not-splitting under load.

Billy, in the bigtable paper, I believe what we call a flush is a minor compaction 
in Google-speak, and a merging compaction is what they call a compaction of a few 
store files together with what's in memcache.

bq. When doing a minor compaction on a few files I think we should compact the 
newest mapfiles first and leave the larger/older ones for when we have low updates 
to a region.

Why do you think newer rather than older, Billy?

bq. I still think if we are going to have transaction speeds close to bigtable's 
we will need to add a limit on the number of map files to compact at one time.

I agree given the times to compact posted above.

By the way, I tried out my simple upper-bound patch that puts a cap of 
2*compactionThreshold on the number of files to compact at once.  Seems to work, 
with messages like the one below showing from time to time:

{code}2008-01-25 20:44:38,330 DEBUG org.apache.hadoop.hbase.HStore: Count of 
files to compact in 2052803679/info is 8 which is > twice compaction threshold 
of 3. Compacting 6 only
{code}
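
A minimal sketch of the cap described above, in case it helps the discussion. This is not the code in twice.patch; the class and method names (CompactionCap, filesToCompact, selectForCompaction) are made up for illustration, and which files come first in the list (newest or oldest) is left to the caller, since that is still the open question here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from HStore or from twice.patch.
public class CompactionCap {

    // Number of files to compact in one pass: never more than
    // twice the compaction threshold.
    static int filesToCompact(int storeFileCount, int compactionThreshold) {
        int cap = 2 * compactionThreshold;
        return Math.min(storeFileCount, cap);
    }

    // Takes the first N entries of the list, where N is the capped count.
    // Assumption: the caller has already ordered the files (newest-first
    // or oldest-first) according to whichever policy is chosen.
    static <T> List<T> selectForCompaction(List<T> storeFiles, int compactionThreshold) {
        int n = filesToCompact(storeFiles.size(), compactionThreshold);
        return new ArrayList<>(storeFiles.subList(0, n));
    }

    public static void main(String[] args) {
        // Same numbers as the log line above: 8 files, threshold of 3.
        System.out.println(filesToCompact(8, 3)); // prints 6
    }
}
```

With 8 store files and a threshold of 3 this selects 6, leaving 2 for a later pass; once the file count drops to the cap or below, everything is selected.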

FYI, the regionserver runs compaction.  The master has no say at the moment.





> Add max number of mapfiles to compact at one time giving us a minor & major 
> compaction
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2615
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2615
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>            Priority: Minor
>             Fix For: 0.17.0
>
>         Attachments: flag-v2.patch, flag.patch, twice.patch
>
>
> Currently we do compaction on a region when the 
> hbase.hstore.compactionThreshold is reached (default 3).
> I think we should configure a max number of mapfiles to compact at one time, 
> similar to doing a minor compaction in bigtable. This keeps compactions 
> from getting tied up in one region too long, which lets other regions get way too 
> many memcache flushes and makes compaction take longer and longer for each region.
> If we did that, when a region's updates start to slack off the max number will 
> eventually include all mapfiles, causing a major compaction on that region. 
> Unlike bigtable, this would leave the master out of the process and let 
> the region server handle the major compaction when it has time.
> When doing a minor compaction on a few files I think we should compact the 
> newest mapfiles first and leave the larger/older ones for when we have low 
> updates to a region.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
