[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

Enis Soztutar (JIRA) Tue, 23 Feb 2016 19:46:37 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15160139#comment-15160139
 ]


Enis Soztutar commented on HBASE-15181:
---------------------------------------

bq. But I want to add a tweak to this proposal on how to handle late-arriving 
data. I want to compact the out-of-order data with newer files rather than 
older ones.
We use maxTs rather than minTs to select the tier, so isn't this already 
the case?
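
To illustrate the point about maxTs, here is a minimal sketch, not the code 
in the patch. It assumes fixed-size windows for simplicity, and the names 
FileInfo, windowStart and groupByMaxTs are hypothetical. A file that mixes 
late-arriving cells with recent ones has an old minTs but a recent maxTs, so 
keying on maxTs groups it with the newer files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class MaxTsWindowSketch {

    /** Hypothetical stand-in for a store file and its timestamp range. */
    static final class FileInfo {
        final String name;
        final long minTs;
        final long maxTs;

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** Start of the fixed-size window that contains ts (an assumed layout). */
    static long windowStart(long ts, long windowMillis) {
        return Math.floorDiv(ts, windowMillis) * windowMillis;
    }

    /** Groups file names by the window that holds each file's maxTs. */
    static TreeMap<Long, List<String>> groupByMaxTs(List<FileInfo> files,
                                                    long windowMillis) {
        TreeMap<Long, List<String>> windows = new TreeMap<>();
        for (FileInfo f : files) {
            long start = windowStart(f.maxTs, windowMillis);
            windows.computeIfAbsent(start, k -> new ArrayList<>()).add(f.name);
        }
        return windows;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        List<FileInfo> files = Arrays.asList(
            // Entirely old data.
            new FileInfo("old", 0L, hour - 1),
            // Mostly recent data plus a few late-arriving old cells.
            new FileInfo("late", 10L, 2 * hour + 5),
            // Entirely recent data.
            new FileInfo("new", 2 * hour, 2 * hour + 9));
        System.out.println(groupByMaxTs(files, hour));
    }
}
```

In this sketch "late" lands in the same window as "new" even though its 
minTs falls in the window of "old". Keying on minTs instead would have 
grouped it with "old".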

bq. We have observed drastic IO reduction for the scans.
Great. This will be a nice addition to HBase.
Let me look at the patch. 

bq. I am wondering how we have maintained mvcc with RatioBasedCompactionPolicy 
and its derived class ExploringCompactionPolicy when we allow filtering out 
bulk-loaded files and skipping large files?
We assign a seqId to bulk-loaded files at the time of the bulk load. We 
execute a flush beforehand to make sure that the assigned sequenceId does not 
overlap with the sequenceIds of the in-memory data. 
A store-level read/write lock coordinates adding bulk-loaded files with file 
selection for compaction. Is this what you were asking about? 
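
The ordering described above can be sketched roughly as follows. This is a 
toy model, not HBase's store code: the class and method names are 
hypothetical, the memstore is reduced to a counter, and having file 
additions take the write lock while compaction selection takes the read lock 
is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BulkLoadSeqIdSketch {
    // Store-level lock: file additions take the write lock, compaction
    // selection takes the read lock (assumed assignment of modes).
    private final ReentrantReadWriteLock storeLock = new ReentrantReadWriteLock();
    private final List<Long> storeFileSeqIds = new ArrayList<>();
    private long nextSeqId = 1;
    private long maxMemstoreSeqId = -1;
    private int memstoreEdits = 0;

    /** Records an in-memory edit and returns the sequence id it was given. */
    synchronized long write() {
        long seqId = nextSeqId++;
        maxMemstoreSeqId = seqId;
        memstoreEdits++;
        return seqId;
    }

    /** Turns pending in-memory edits into one store file, if there are any. */
    synchronized void flush() {
        if (memstoreEdits == 0) {
            return;
        }
        addStoreFile(maxMemstoreSeqId);
        memstoreEdits = 0;
    }

    /** Flushes first, then gives the bulk-loaded file a fresh sequence id. */
    synchronized long bulkLoad() {
        flush();
        long seqId = nextSeqId++;
        addStoreFile(seqId);
        return seqId;
    }

    /** Takes a consistent snapshot of the store files under the read lock. */
    List<Long> selectForCompaction() {
        storeLock.readLock().lock();
        try {
            return new ArrayList<>(storeFileSeqIds);
        } finally {
            storeLock.readLock().unlock();
        }
    }

    synchronized int memstoreSize() {
        return memstoreEdits;
    }

    private void addStoreFile(long seqId) {
        storeLock.writeLock().lock();
        try {
            storeFileSeqIds.add(seqId);
        } finally {
            storeLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        BulkLoadSeqIdSketch store = new BulkLoadSeqIdSketch();
        store.write();
        store.write();
        long bulkSeqId = store.bulkLoad();
        System.out.println("bulk seqId = " + bulkSeqId
            + ", files = " + store.selectForCompaction());
    }
}
```

Because the flush runs before the id is assigned, the bulk-loaded file's 
seqId in this sketch is greater than that of every edit that was in memory 
when the bulk load started.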

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0, 1.3.0, 0.98.19
>
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most 
> recent data. 
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range scans, and re-compaction with existing store 
> files in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set in hbase-site or overridden at the per-table or 
> per-column-family level through the hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
