[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611844#comment-13611844
]
stack commented on HBASE-7667:
------------------------------
{code}
On attached doc, it is lovely.
Missing author, date, and JIRA pointer?
An interesting comment by LarsH recently was that maybe we should ship w/ major
compactions off; most folks don't delete
Missing is at least a pointer to how it currently works (could just point
at the src file, I'd say, with its description of 'sigma' compactions) and a
sentence on what's wrong w/ it, or the problems it leads to when left to run
amok (you say it for major compactions, but even w/o major compactions enabled,
an i/o tsunami can hit and wipe us out)
What does this mean "and old boundaries rarely, if ever,
moving."? Give doc an edit?
I think you need to say stripe == sub-range of the region key range. You
almost do. Just do it explicitly.
I see your extra justification for l0, the need to be able to bulk load. It is
kinda important that we continue to support that. Good one.
Later I suppose we could have a combination of count-based and size-based....
if an edge stripe is N time bigger than any other, add a new stripe?
I was wondering if you could make use of liang xie's bit of code for making
keys for the block cache, where he chooses a byte sequence that falls between
the last key in the former block and the first in the next block but is
shorter than either... but it doesn't make sense here, I believe;
your boundaries have to be hard actual keys, given inserts are always coming
in... so never mind this suggestion.
You write the stripe info to the storefile. I suppose it is up to the hosting
region whether or not it chooses to respect those boundaries. It
could ignore them and just respect the seqnum and we'd have the old-style
storefile handling, right? (Oh, I see you allow for this -- good)
Say in the doc that you mean storefile metadata, else it is ambiguous.
Thinking on L0 again, as has been discussed, we could have flushes skip L0 and
flush instead to stripes (one flush turns into N files, one per stripe)
but even if we had this optimization, it looks like we'd still want the L0
option if only for bulk loaded files or for files whose metadata makes
no sense to the current region context.
"• The aggregate range of files going in must be
contiguous..." Not sure I follow. Hmm... could do with ".... going into a
compaction"
"If the stripe boundaries are changed by compaction,
the entire stripes with old boundaries must be
replaced" ...What would bring this on?
And then how would old boundaries get redone? This one is a bit confusing.
'Get key before' is a PITA
Not sure I follow here: "This compaction is performed when
the number of L0 files
exceeds some threshold and produces the number of
files equivalent to the number
of stripes, with enforced existing boundaries."
I was going to suggest an optimization, for later, for the case where an L0
file fits fully inside a stripe: you could just 'move' it into its respective
stripe... but I suppose you can't do that because you need to write the
metadata to put a file into a stripe...
Would it help naming files for the stripe they belong to?
In other words, do NOT write stripe data to the storefiles and just
let the region in memory figure which stripe a file belongs to. When we
write, we write with say a L0 suffix. When compacting we add S1, S2,
etc suffix for stripe1, etc. To figure what the boundaries of an S0 are, it'd
be something the region knew. On open of the store files, it could
use the start and end keys that are currently in the file metadata to figure
which stripe they fit in.
Would be a bit looser. Would allow moving a file between stripes with a rename
only.
The delete dropping section looks right. I like the major compaction along a
stripe only option.
"For empty ranges, empty files are created." Is this
necessary? Would be good to avoid doing this.
{code}
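The suggestion above — skipping stripe metadata in the storefiles and letting the region infer a file's stripe from the start/end keys already kept in file metadata — could look roughly like this minimal sketch. All class and method names here are hypothetical, for illustration only, not actual HBase APIs:

```java
import java.util.List;

// Hypothetical sketch: derive a file's stripe from its first/last keys
// instead of persisting stripe info in storefile metadata.
public class StripeInference {

    /**
     * Returns the index of the stripe that fully contains [firstKey, lastKey],
     * or -1 if the file straddles a boundary (i.e. it must stay in L0).
     * boundaries holds the interior stripe boundaries in sorted order;
     * stripe i covers keys between boundaries[i-1] (inclusive) and
     * boundaries[i] (exclusive).
     */
    static int stripeFor(List<byte[]> boundaries, byte[] firstKey, byte[] lastKey) {
        int stripe = upperStripe(boundaries, firstKey);
        // The file fits a stripe only if its last key lands in the same one.
        return stripe == upperStripe(boundaries, lastKey) ? stripe : -1;
    }

    // Number of interior boundaries <= key, i.e. the stripe index for key.
    private static int upperStripe(List<byte[]> boundaries, byte[] key) {
        int i = 0;
        while (i < boundaries.size() && compare(boundaries.get(i), key) <= 0) {
            i++;
        }
        return i;
    }

    // Lexicographic unsigned byte comparison, as used for HBase row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

A file straddling a boundary returns -1 and would stay in L0 — which matches the point above that L0 is still needed for bulk-loaded files whose key range ignores the stripes.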
> Support stripe compaction
> -------------------------
>
> Key: HBASE-7667
> URL: https://issues.apache.org/jira/browse/HBASE-7667
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: Stripe compactions.pdf
>
>
> So I was thinking about having many regions as the way to make compactions
> more manageable, writing the level db doc about how level db range overlap
> and data mixing break seqNum sorting, discussing it with Jimmy, Matteo and
> Ted, and thinking about how to avoid the Level DB I/O multiplication
> factor.
> And I suggest the following idea, let's call it stripe compactions. It's a
> mix between level db ideas and having many small regions.
> It allows us to have a subset of the benefits of many regions (wrt reads and
> compactions) without many of the drawbacks (management overhead, and the
> current memstore/etc. limitations).
> It also doesn't break seqNum-based file sorting for any one key.
> It works like this.
> The region key space is separated into a configurable number of
> fixed-boundary stripes (determined the first time we stripe the data; see
> below).
> All the data from memstores is written to normal files with all keys present
> (not striped), similar to L0 in LevelDb, or current files.
> Compaction policy does 3 types of compactions.
> First is L0 compaction, which takes all L0 files and breaks them down by
> stripe. It may be optimized by adding more small files from different
> stripes, but the main logical outcome is that there are no more L0 files and
> all data is striped.
> Second is essentially the same as the current compaction, but compacting one
> single stripe. In the future, nothing prevents us from applying compaction
> rules and compacting part of the stripe (e.g. similar to the current policy
> with ratios and stuff, tiers, whatever), but for the first cut I'd argue we
> let it "major compact" the entire stripe. Or just have the ratio and no more
> complexity.
> Finally, the third addresses the concern of the fixed boundaries causing
> stripes to be very unbalanced.
> It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the
> results out with different boundaries.
> There's a tradeoff here - if we always take 2 adjacent stripes, compactions
> will be smaller but rebalancing will take a ridiculous amount of I/O.
> If we take many stripes we are essentially getting into the
> epic-major-compaction problem again. Some heuristics will have to be in place.
> In general, if we initially let L0 grow before determining the stripes, we
> will get better boundaries.
> Also, unless the unbalancing is really large, we don't really need to
> rebalance.
> Obviously this scheme (as well as level) is not applicable for all scenarios,
> e.g. if timestamp is your key it completely falls apart.
> The end result:
> - many small compactions that can be spread out in time.
> - reads still read from a small number of files (one stripe + L0).
> - region splits become marvelously simple (if we could move files between
> regions, no references would be needed).
> Main advantage over Level (for HBase) is that default store can still open
> the files and get correct results - there are no range overlap shenanigans.
> It also needs no metadata, although we may record some for convenience.
> It also would appear to not cause as much I/O.
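The flush / L0-compaction / read-path flow described above can be sketched as a toy model. Keys are plain Strings here for brevity (real HBase keys are byte[]), and every name is hypothetical, not the actual HBase implementation:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Toy model of the scheme: flushes land unstriped in L0; an L0 compaction
// redistributes everything into fixed-boundary stripes; a read touches one
// stripe plus L0. Hypothetical names, for illustration only.
public class StripeSketch {
    final List<String> boundaries;             // interior boundaries, sorted
    final List<List<String>> stripes;          // stripes.get(i) = stripe i's data
    final List<String> l0 = new ArrayList<>(); // unstriped flush/bulk-load output

    StripeSketch(List<String> boundaries) {
        this.boundaries = boundaries;
        this.stripes = new ArrayList<>();
        // N interior boundaries define N+1 stripes.
        for (int i = 0; i <= boundaries.size(); i++) {
            stripes.add(new ArrayList<>());
        }
    }

    // Stripe index for a key: count of boundaries <= key.
    int stripeFor(String key) {
        int i = 0;
        while (i < boundaries.size() && boundaries.get(i).compareTo(key) <= 0) {
            i++;
        }
        return i;
    }

    // Flush: all keys go to L0, unstriped.
    void flush(Collection<String> keys) {
        l0.addAll(keys);
    }

    // L0 compaction: break L0 down by stripe; afterwards no L0 data remains.
    void compactL0() {
        for (String k : l0) {
            stripes.get(stripeFor(k)).add(k);
        }
        l0.clear();
    }

    // Read set for a key: exactly one stripe plus all of L0.
    List<String> readSet(String key) {
        List<String> r = new ArrayList<>(stripes.get(stripeFor(key)));
        r.addAll(l0);
        return r;
    }
}
```

The point of the sketch is the read-set shape: before an L0 compaction a read scans one (possibly empty) stripe plus everything in L0; afterwards it scans just the one stripe.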
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira