[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15160139#comment-15160139 ]
Enis Soztutar commented on HBASE-15181:
---------------------------------------
bq. But I want to add a tweak to this proposal on how to handle late-arriving
data. I want to compact the out-of-order data with newer files rather than older
ones.
We are using the maxTs to select the tier, rather than minTs, so this is
already true?
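To illustrate the point about maxTs: if a store file is placed in a time window by its maxTs, a file that mixes late-arriving (old) cells with fresh ones is grouped with the newer files, not the older ones. This is a minimal sketch under assumptions: it uses fixed-size windows and a hypothetical FileRange type, not the tiered windows or types from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TierSelectionSketch {
    // Hypothetical stand-in for a store file's timestamp range.
    record FileRange(String name, long minTs, long maxTs) {}

    // Hypothetical fixed window size; the real policy uses tiered windows.
    static final long WINDOW_MILLIS = 1000L;

    // Groups files by the window that contains their maxTs (not their minTs).
    static Map<Long, List<String>> groupByMaxTs(List<FileRange> files) {
        Map<Long, List<String>> windows = new TreeMap<>();
        for (FileRange f : files) {
            long window = Math.floorDiv(f.maxTs(), WINDOW_MILLIS);
            windows.computeIfAbsent(window, k -> new ArrayList<>()).add(f.name());
        }
        return windows;
    }

    public static void main(String[] args) {
        List<FileRange> files = List.of(
            new FileRange("old", 100, 900),
            new FileRange("recent", 5100, 5900),
            // Late-arriving data: holds an old cell (ts=200) but was flushed with new data.
            new FileRange("late", 200, 5800));
        System.out.println(groupByMaxTs(files));
    }
}
```

Because the window key is derived from maxTs, the "late" file lands in the same window as "recent", so it is compacted with newer files.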
bq. We have observed drastic IO reduction for the scans.
Great. This will be a nice addition to HBase.
Let me look at the patch.
bq. I am wondering how we have maintained mvcc with RatioBasedCompactionPolicy
and its derived class ExploringCompactionPolicy when we allow filtering out
bulk-loaded files and skipping large files?
We assign a seqId to the bulk-loaded files at the time of the bulk load. We
execute a flush beforehand to make sure that the assigned sequenceId does not
overlap with the sequenceIds of the in-memory data.
We have a store-level read/write lock that coordinates adding bulk-loaded files
and selecting files for compaction. Is this what you were asking about?
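A toy model may make the ordering argument concrete: flushing before assigning the bulk-load seqId means every edit written before the bulk load sits in a file with a lower seqId, and a read/write lock keeps compaction selection from seeing a half-updated file list. This is a single-threaded sketch under assumptions; BulkLoadSketch, StoreFile and their methods are hypothetical and are not HBase's HStore API, and which operation takes the read vs. write lock here is an illustrative choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BulkLoadSketch {
    // Hypothetical model of a store file tagged with a sequence id.
    record StoreFile(String name, long seqId) {}

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<StoreFile> files = new ArrayList<>();
    private final AtomicLong nextSeqId = new AtomicLong(1);
    // Highest seqId of edits still held in memory (0 = none); no concurrent writers assumed.
    private long memstoreMaxSeqId = 0;

    // A normal write is stamped with the next seqId and stays in memory until a flush.
    long write() {
        long seq = nextSeqId.getAndIncrement();
        memstoreMaxSeqId = seq;
        return seq;
    }

    // Flush turns the in-memory edits into a store file tagged with their max seqId.
    void flush() {
        lock.writeLock().lock();
        try {
            if (memstoreMaxSeqId > 0) {
                files.add(new StoreFile("flush-" + memstoreMaxSeqId, memstoreMaxSeqId));
                memstoreMaxSeqId = 0;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Bulk load: flush first, then assign a seqId above every edit written so far.
    StoreFile bulkLoad(String name) {
        flush();
        StoreFile loaded = new StoreFile(name, nextSeqId.getAndIncrement());
        lock.writeLock().lock(); // publishing a file excludes concurrent selection
        try {
            files.add(loaded);
        } finally {
            lock.writeLock().unlock();
        }
        return loaded;
    }

    // Compaction selection works on a consistent snapshot of the file list.
    List<StoreFile> selectForCompaction() {
        lock.readLock().lock();
        try {
            return List.copyOf(files);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        BulkLoadSketch store = new BulkLoadSketch();
        store.write();
        store.write();
        store.bulkLoad("bulk");
        store.write();
        store.flush();
        for (StoreFile f : store.selectForCompaction()) {
            System.out.println(f.name() + " seqId=" + f.seqId());
        }
    }
}
```

In this model the bulk-loaded file's seqId falls strictly between the flush before it and any edits written after it, which is the property the flush-then-assign step is meant to preserve.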
> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most
> recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully: the data still ends up in the
> right store file for time-range scans, and re-compaction with existing store
> files in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance
> impact is minimized.
> Configuration can be set in hbase-site or overridden at the per-table or
> per-column-family level through the hbase shell.
> Design spec is at
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
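For readers new to the Cassandra-style layout, the windowing idea can be sketched as follows: the newest windows are small, and each older tier uses windows that are windowsPerTier times larger, so recent data stays in fine-grained files while old data is merged into a few large ones. A file whose newest cell has passed the TTL can then be dropped whole, which is where the TTL efficiency comes from. This is a sketch under assumptions: the parameter names are hypothetical, the windows are not aligned to their own size, and the patch's actual window computation may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class TieredWindowsSketch {
    // A time window [start, end) in milliseconds.
    record Window(long start, long end) {}

    // Returns windows newest first: windowsPerTier windows per tier, each tier's
    // window size multiplied by windowsPerTier (hypothetical parameters).
    static List<Window> windows(long now, long baseMillis, int windowsPerTier, int tiers) {
        List<Window> result = new ArrayList<>();
        // End (exclusive) of the base-sized window that contains "now".
        long end = Math.floorDiv(now, baseMillis) * baseMillis + baseMillis;
        long size = baseMillis;
        for (int t = 0; t < tiers; t++) {
            for (int i = 0; i < windowsPerTier; i++) {
                result.add(new Window(end - size, end));
                end -= size;
            }
            size *= windowsPerTier;
        }
        return result;
    }

    // If even the newest cell is older than the TTL, the whole file can be dropped.
    static boolean fullyExpired(long fileMaxTs, long now, long ttlMillis) {
        return fileMaxTs < now - ttlMillis;
    }

    public static void main(String[] args) {
        for (Window w : windows(10_000, 1_000, 2, 3)) {
            System.out.println(w);
        }
        System.out.println(fullyExpired(500, 10_000, 5_000));
    }
}
```

With a 1000 ms base and two windows per tier, the window sizes go 1000, 1000, 2000, 2000, 4000, 4000 moving back in time.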