[ https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575309#comment-13575309 ]

Matt Corgan commented on HBASE-7667:
------------------------------------

{quote}So you are in favor of Sergey's project?{quote}Oh yes, if you could not 
tell.  Thinking of an HBASE-7667 tattoo =)

One of the few major things HBase is missing, in my opinion, is the ability to 
load time-series data through the normal API, rather than having to go off and 
write some separate bulk load code.  HBase currently takes a dump when you do 
that.  The main culprits are HBASE-5479 and my comment on HBASE-3484 
(https://issues.apache.org/jira/browse/HBASE-3484?focusedCommentId=13410934&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13410934).
  Even during normal operation, as opposed to a one-off import of data, the 
same inefficiencies still happen, just at a less obvious pace.

It may be a follow-on to this jira, but having the striper dynamically add 
stripes at the end of the region would let all the stripes before the last one 
"go cold", which is critical for avoiding hugely wasteful compactions of 
non-changing data.  Ideally, it would allocate small stripes as new data comes 
in (each flush?) and then later merge older stripes to reduce hfile count (at 
major compaction time?).  With this in place on an N-node cluster, you could 
partition your data into N or 2N regions using a hash prefix and basically let 
the regions grow infinitely large.  Currently I have to limit region size to 
~2GB, which results in hundreds of regions per node; that's a bit of a 
management hassle because it's beyond human-readable, and a bit wasteful with 
all the empty memstores, among other things.
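
In case it's not obvious what I mean by a hash prefix: a bare-bones sketch (the 
helper name and the single-byte prefix are just for illustration) of salting 
the row key so writes spread evenly across the N pre-split regions:

{code:java}
import java.util.Arrays;

// Illustrative only: build a hash-prefixed ("salted") row key so time-series
// writes spread across N pre-split regions instead of hammering the last one.
public final class SaltedKey {

  // numBuckets would be N or 2N (one or two buckets per node); it has to be
  // <= 256 here because a single prefix byte is used.
  public static byte[] salt(byte[] originalKey, int numBuckets) {
    int bucket = (Arrays.hashCode(originalKey) & Integer.MAX_VALUE) % numBuckets;
    byte[] salted = new byte[originalKey.length + 1];
    salted[0] = (byte) bucket;                          // single-byte prefix
    System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
    return salted;
  }
}
{code}

The table gets pre-split on the prefix byte, and range scans then have to fan 
out over the N buckets, which is the price you pay for the even write 
distribution.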

I do wonder if there's a more accurate name than stripe.  Stripes make me think 
of RAID stripes, which are a different concept than sub-regions.  Sub-region is 
not a good name either, though.

It would be cool if you could set a column family attribute like 
layout=TIME_SERIES, which HBase could use to automatically pick the compaction 
strategy, split-point strategy, and balancer strategy, and to allow future 
niceties like using stronger compression on old data.
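
Purely as a sketch of what I'm picturing (the "layout" attribute doesn't exist 
today, so it's just stashed as an arbitrary column family key/value here, and 
the table/family names are made up):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class TimeSeriesTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HColumnDescriptor cf = new HColumnDescriptor("d");
    // Hypothetical hint -- "layout" is not a real HBase attribute (yet); the
    // idea is that it would drive the compaction/split/balancer strategy picks.
    cf.setValue("layout", "TIME_SERIES");

    HTableDescriptor table = new HTableDescriptor("metrics");
    table.addFamily(cf);
    admin.createTable(table);
    admin.close();
  }
}
{code}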
                
> Support stripe compaction
> -------------------------
>
>                 Key: HBASE-7667
>                 URL: https://issues.apache.org/jira/browse/HBASE-7667
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> So I was thinking about having many regions as the way to make compactions 
> more manageable; writing the LevelDB doc about how LevelDB range overlap and 
> data mixing break seqNum sorting; discussing it with Jimmy, Matteo and Ted; 
> and thinking about how to avoid the LevelDB I/O multiplication factor.
> And I suggest the following idea; let's call it stripe compactions. It's a 
> mix between LevelDB ideas and having many small regions.
> It allows us to have a subset of the benefits of many regions (wrt reads and 
> compactions) without many of the drawbacks (management overhead, the current 
> memstore limitation, etc.).
> It also doesn't break seqNum-based file sorting for any one key.
> It works like this.
> The region key space is separated into a configurable number of 
> fixed-boundary stripes (determined the first time we stripe the data; see 
> below).
> All the data from memstores is written to normal files with all keys present 
> (not striped), similar to L0 in LevelDB, or to the current files.
> The compaction policy does 3 types of compactions.
> The first is L0 compaction, which takes all L0 files and breaks them down by 
> stripe. It may be optimized by adding more small files from different 
> stripes, but the main logical outcome is that there are no more L0 files and 
> all data is striped.
> The second is similar to the current compaction, but compacts one single 
> stripe. In the future, nothing prevents us from applying compaction rules and 
> compacting part of the stripe (e.g. similar to the current policy with ratios 
> and stuff, tiers, whatever), but for the first cut I'd argue we let it "major 
> compact" the entire stripe. Or just have the ratio and no more complexity.
> Finally, the third addresses the concern of the fixed boundaries causing 
> stripes to be very unbalanced.
> It's exactly like the second, except it takes 2+ adjacent stripes and writes the 
> results out with different boundaries.
> There's a tradeoff here: if we always take 2 adjacent stripes, compactions 
> will be smaller, but rebalancing will take a ridiculous amount of I/O.
> If we take many stripes, we are essentially getting into the 
> epic-major-compaction problem again. Some heuristics will have to be in place.
> In general, if we initially let L0 grow before determining the stripes, we 
> will get better boundaries.
> Also, unless the imbalance is really large, we don't really need to rebalance.
> Obviously this scheme (as well as Level) is not applicable to all scenarios; 
> e.g. if the timestamp is your key, it completely falls apart.
> The end result:
> - many small compactions that can be spread out in time.
> - reads still read from a small number of files (one stripe + L0).
> - region splits become marvelously simple (if we could move files between 
> regions, no references would be needed).
> The main advantage over Level (for HBase) is that the default store can 
> still open the files and get correct results - there are no range-overlap 
> shenanigans.
> It also needs no metadata, although we may record some for convenience.
> It also appears not to cause as much I/O.
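
A bare-bones sketch (the class and the file bookkeeping are entirely made up, 
not from the issue) of the read path the description above implies: a lookup 
touches the files of the one stripe whose fixed boundaries cover the key, plus 
all L0 files:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: which files a read would touch under the striped layout.
public class StripeReadSketch {

  // Stripe start key -> files in that stripe; boundaries are fixed once the
  // data has been striped, and the first stripe starts at the empty key "".
  private final TreeMap<String, List<String>> stripeFiles =
      new TreeMap<String, List<String>>();

  // Unstriped flush output covering the whole key space (like LevelDB's L0).
  private final List<String> level0Files = new ArrayList<String>();

  List<String> filesForRead(String rowKey) {
    // Every read sees all L0 files plus the single stripe covering the key.
    List<String> result = new ArrayList<String>(level0Files);
    Map.Entry<String, List<String>> stripe = stripeFiles.floorEntry(rowKey);
    if (stripe != null) {
      result.addAll(stripe.getValue());
    }
    return result;
  }
}
{code}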

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
