[ https://issues.apache.org/jira/browse/HBASE-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617776#comment-13617776 ]
Sergey Shelukhin commented on HBASE-7967:
-----------------------------------------
bq. 1. Inside one stripe, can we reuse some logic in the default compaction
policy? The logic should be similar, right? There are many new configuration
parameters, can we re-use some from the default policy, such as max files, min
files, etc? Especially, they can be tuned per table/column family.
If you look at the patch in HBASE-7680, we do that. I wanted to keep parameters
separate as they might be different, but yeah it probably makes sense to reuse
them.
HBASE-7571 allows per-table/per-CF settings. An example (from code; the shell
also supports this):
{code}
// Select the stripe store engine for this table.
htd.setConfiguration(StoreEngine.STORE_ENGINE_CLASS_KEY,
    StripeStoreEngine.class.getName());
// Count-based scheme with a fixed number of stripes.
htd.setConfiguration(StripeStoreConfig.CountBased.FIXED_COUNT_KEY,
    stripeCount.toString());
// Scale the blocking store file limit with the stripe count.
htd.setConfiguration(HStore.BLOCKING_STOREFILES_KEY,
    Long.toString(7 * stripeCount));
// Optional settings: only set when a value was supplied.
if (l0FileCount != null) {
  htd.setConfiguration(StripeStoreConfig.MIN_FILES_L0_KEY,
      l0FileCount.toString());
}
if (assumeOrdering != null) {
  htd.setConfiguration(StripeStoreConfig.ASSUME_ORDERING_KEY,
      assumeOrdering.toString());
}
{code}
bq. 2. There is a configuration assumeOrdering. When should it be used?
This is related to dropping deletes. There's a recently discussed window in
HBase where you can make an out-of-order Put before/during a major compaction;
it will not be visible before the major compaction, but becomes visible after
it finishes and drops the delete markers.
This setting extends that window to up to N memstore flushes instead of 1,
where N is the number of L0 files (each one a memstore flush), by not
considering out-of-order puts for L0 files in most compactions.
As a benefit, you don't need to make bigger compactions just to drop deletes.
So if you don't use out-of-order puts, or are OK with the existing window, you
should use it.
bq. 3. Will we support any stripe type other than count based/size based? If
so, probably we need to change how stripe type is configured, since it seems
that we can support only two types now.
Maybe. The hybrid "size+count" based scheme mentioned would probably just be an
improvement on the count-based one, if implemented.
Do you think it's worth changing now?
bq. 4. For count based, do we have to always have that many stripes? Is it ok
to have a size limit or something so that we don't have many small stripes?
This is possible as a future improvement; I will add it to the doc.
bq. 5. Based on the performance test you did, the write performance is not
better. You mentioned it could be because of write amplification. Do we have
some number to prove it? If we have more IO, should the read performance be
affected too?
Well, I have numbers for write amplification - in the count scheme, there's at
least x2 write amplification :) I measured ~2.5 in my first test with bad
settings (not the one in the doc :)). After the current test finishes I will
post the results.
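For reference, a minimal sketch of the arithmetic behind those figures. It assumes write amplification means total bytes written to store files (memstore flushes plus compaction rewrites, WAL excluded) divided by bytes flushed. Under that assumption, every byte that is flushed to L0 and then rewritten once into a stripe already gives x2. The class and method names are hypothetical and are not part of the patch, and the byte counts are made up for illustration.

```java
/** Back-of-the-envelope write amplification; illustrative only, not HBase code. */
public final class WriteAmplification {
    private WriteAmplification() {}

    /**
     * Assumed definition: (flushed + compacted) / flushed.
     *
     * @param flushedBytes   bytes written by memstore flushes (the L0 files)
     * @param compactedBytes bytes rewritten by all compactions
     * @return bytes written to store files per byte flushed
     */
    public static double ratio(long flushedBytes, long compactedBytes) {
        if (flushedBytes <= 0) {
            throw new IllegalArgumentException("flushedBytes must be positive");
        }
        if (compactedBytes < 0) {
            throw new IllegalArgumentException("compactedBytes must be non-negative");
        }
        return (double) (flushedBytes + compactedBytes) / flushedBytes;
    }

    public static void main(String[] args) {
        long flushed = 100L << 30; // 100 GiB flushed (made-up number)

        // Every flushed byte is rewritten exactly once, L0 -> stripe.
        System.out.println(ratio(flushed, flushed)); // prints 2.0

        // Half of the data is additionally rewritten inside its stripe.
        System.out.println(ratio(flushed, flushed * 3 / 2)); // prints 2.5
    }
}
```

Any compaction within a stripe after the initial L0 rewrite pushes the ratio above 2, which is one way to read a measured value of ~2.5.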
bq. 6. Can we have some doc to walk through the algorithm you implemented for
the count/size based compaction policy? I was wondering how some L0 files end
up in a specific stripe, how each stripe is created and maintained. Some
flow-chart may be very helpful.
The doc attached to this JIRA describes all that. It doesn't have pictures
though :( Do you mean something on top of that doc?
> implement compactor for stripe compactions
> ------------------------------------------
>
> Key: HBASE-7967
> URL: https://issues.apache.org/jira/browse/HBASE-7967
> Project: HBase
> Issue Type: Sub-task
> Components: Compaction
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HBASE-7967-v0.patch, HBASE-7967-v0-with-stuff.patch,
> HBASE-7967-v1.patch, HBASE-7967-v1-with-7679-7680.patch, HBASE-7967-v2.patch,
> HBASE-7967-v2-with-7679-7680.patch
>
>
> Compactor needs to be implemented. See details in parent and blocking jira.