[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129366#comment-15129366
 ] 

Clara Xiong commented on HBASE-15181:
-------------------------------------

Thank you for your input. I will update the spec. 

We have date tiering like that described in the Spotify blog, but with some 
tweaks. The overall theme is to tier store files by the max timestamp of their 
data. If new data arrives in perfect order, there is only one file per window, 
except in the incoming window or while files move across tiers. Out-of-order, 
late-arriving, or bulk-loaded data falls into older windows for compaction 
selection and may trigger compactions there. We let the user plug in a 
compaction policy for this case; the default is exploring compaction, which 
reduces compaction storms. Another policy can be plugged in if the user wants 
to keep the file count small at the price of re-compaction.
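
The windowing idea can be sketched roughly as below. This is only an 
illustration in Python, not the code in the patch: StoreFile, WINDOW_MS, 
group_by_window and windows_needing_compaction are hypothetical names, the 
window size is an assumed value, and the sketch uses fixed-size windows, so it 
leaves out how files move across tiers.

```python
from collections import defaultdict
from typing import NamedTuple


class StoreFile(NamedTuple):
    """Hypothetical stand-in for a store file and its timestamp range."""
    name: str
    min_ts: int
    max_ts: int


WINDOW_MS = 6 * 60 * 60 * 1000  # assumed window size (6 hours), not from the patch


def group_by_window(files, window_ms=WINDOW_MS):
    """Bucket files by the window that contains their max timestamp."""
    windows = defaultdict(list)
    for f in files:
        windows[f.max_ts // window_ms].append(f)
    return dict(windows)


def windows_needing_compaction(files, now_ms, window_ms=WINDOW_MS):
    """Return older windows that hold more than one file.

    The incoming (current) window is excluded because it is expected to
    hold several files while new data is still being flushed into it.
    """
    incoming = now_ms // window_ms
    grouped = group_by_window(files, window_ms)
    return sorted(w for w, fs in grouped.items() if w < incoming and len(fs) > 1)
```

In this sketch, a late-arriving file whose max timestamp falls in an older 
window makes that window a compaction candidate; which files inside the window 
actually get compacted would be left to the pluggable policy.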

We use the max timestamp rather than the min timestamp used in the Spotify 
blog, to reduce the performance penalty for out-of-order data, assuming no 
timestamp is set to a future time.
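
One way to read this choice, shown as a small Python illustration (the 
one-hour window, the timestamps and the window_of helper are all made up for 
the example): a flushed file that holds mostly current data plus a single late 
cell stays in the current window when keyed by max timestamp, but would be 
assigned to a much older window if keyed by min timestamp.

```python
WINDOW_MS = 60 * 60 * 1000  # assumed one-hour windows, for illustration only


def window_of(ts_ms, window_ms=WINDOW_MS):
    """Index of the fixed-size window containing ts_ms (hypothetical helper)."""
    return ts_ms // window_ms


# A hypothetical flush: current data plus one cell that arrived 5 hours late.
now_ms = 10 * WINDOW_MS + 1_000
cell_timestamps = [now_ms - 5 * WINDOW_MS, now_ms - 200, now_ms - 100, now_ms]

by_max = window_of(max(cell_timestamps))  # keyed by max timestamp
by_min = window_of(min(cell_timestamps))  # keyed by min timestamp

print("window by max:", by_max, "window by min:", by_min)
```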

TTL works out of the box by skipping whole files.
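
The reason whole files can be skipped is that a file whose newest cell is 
already past its TTL cannot contain any live data. A minimal Python sketch of 
that check, with hypothetical function names and a plain mapping of file name 
to max timestamp standing in for real store file metadata:

```python
def is_fully_expired(file_max_ts_ms, now_ms, ttl_ms):
    """True when even the newest cell in the file is older than the TTL."""
    return file_max_ts_ms < now_ms - ttl_ms


def split_by_ttl(max_ts_by_file, now_ms, ttl_ms):
    """Split file names into (expired, live) using only each file's max timestamp."""
    expired = [n for n, ts in max_ts_by_file.items() if is_fully_expired(ts, now_ms, ttl_ms)]
    live = [n for n, ts in max_ts_by_file.items() if not is_fully_expired(ts, now_ms, ttl_ms)]
    return expired, live
```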

Major compaction is disabled unless triggered manually.

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction, similar to 
> Cassandra's, with the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> It is a perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most 
> recent data. 
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully, so the data will still get to the 
> right store file for time-range scans, and re-compaction with existing store 
> files in the same time window is handled by ExploringCompactionPolicy.
> Time range overlap among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set in hbase-site or overridden at the per-table or 
> per-column-family level through the hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
