[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

Clara Xiong (JIRA) Tue, 09 Feb 2016 11:55:47 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139645#comment-15139645
 ]


Clara Xiong commented on HBASE-15181:
-------------------------------------

Two updates:
1. Thanks to [~stack] and [~enis], I went through the StripeCompactor code to 
see whether I can leverage its multiple-output code to split the files at 
tier boundaries. My estimate is that it would take significant expansion to 
make it work with time boundaries, and it would be less effort and cleaner to 
create a new compactor.
2. Thanks to [~vrodionov], [~stack] and [~enis]. After the discussion on why 
we need to compact contiguously, I realized we have a hole in the 
algorithm. I sort the files by client-defined max timestamp, not sequence id. 
Although the algorithm still selects only contiguous store files, they are not 
contiguous on seq id. The new changes for seq id will make it work. I can 
break this patch up into two patches: one for dynamic configuration per column 
family and the other for the pluggable DateTieredCompactionPolicy.
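
To illustrate the hole in point 2, here is a minimal Python sketch, not HBase 
code. The StoreFile record, its fields, and is_contiguous are hypothetical 
names made up for this example. It only assumes that each store file carries 
a sequence id and a client-defined max timestamp, and that client timestamps 
need not follow flush order.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class StoreFile:
    """Hypothetical stand-in for a store file; not the HBase class."""
    name: str
    seq_id: int  # order in which the file was written
    max_ts: int  # largest client-defined cell timestamp in the file


def is_contiguous(selection: Sequence[StoreFile],
                  all_files: Sequence[StoreFile],
                  key: Callable[[StoreFile], int]) -> bool:
    """True if the selected files are adjacent once all files are sorted by key."""
    ordered = sorted(all_files, key=key)
    positions = sorted(ordered.index(f) for f in selection)
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


# File "b" was written before "c" but holds newer client timestamps.
files = [
    StoreFile("a", seq_id=1, max_ts=100),
    StoreFile("b", seq_id=2, max_ts=300),
    StoreFile("c", seq_id=3, max_ts=200),
    StoreFile("d", seq_id=4, max_ts=400),
]

# A selection that is adjacent when files are sorted by max timestamp ...
selection = [files[0], files[2]]  # "a" and "c"
by_ts = is_contiguous(selection, files, key=lambda f: f.max_ts)
# ... skips over "b" when files are ordered by sequence id.
by_seq = is_contiguous(selection, files, key=lambda f: f.seq_id)
print(by_ts, by_seq)
```

In this made-up data set, "a" and "c" are neighbours by max timestamp, yet "b" 
sits between them by seq id, which is the gap the seq id changes are meant to 
close.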

Please let me know what you think.

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction, similar to 
> Cassandra's, with the following benefits:
> 1. Improve date-range-based scans by structuring store files in a date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most 
> recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully, so the data still gets to the 
> right store file for time-range scans, and re-compaction with existing store 
> files in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set in hbase-site.xml or overridden at the per-table or 
> per-column-family level via the hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
