[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

stack (JIRA) Mon, 01 Feb 2016 17:47:08 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127426#comment-15127426
 ]


stack commented on HBASE-15181:
-------------------------------

Good stuff, [~vrodionov]. You too, [~claraxiong]. Suggest that a sentence on the
bulk load concern make it into the release notes.

Nice writeup... 

What is the 'Sliding Window Tiered Compaction'? I don't know what it is, so I 
don't know why "...sliding windows may trigger compaction too frequently and 
cause some files to be re-compacted." Is it the "...fully mimic the behavior of 
STCS and compact SSTables with a relative age difference less than a constant 
factor." from the cited Spotify document?

I see no engagement with stripe compactions in the writeup. Were they 
considered at all? (Stripe purportedly does best when the data is time-series 
shaped.) Would be good to at least call out how this differs.

Suggest you give more direct credit to 
https://labs.spotify.com/2014/12/18/date-tiered-compaction/ . You do so for the 
image copied, but I find I have to read the original to understand what is being 
proposed, and it seems like a bunch of the notions and text come from it.

bq. And at the time of recovery, we may need to bulk load data. 

Which recovery is this? And who is doing the bulk load?

Do major compactions run in the date tiered scheme?

As per [~vrodionov], need the 'date' qualifier on configuration names... 

So, we have date tiering like the Spotify blog, compacting in the first tier if 
above the configured threshold. For other tiers, we do default exploring 
compactions. A bulk load can ruin our tiering, but we'll just drop it in 
the tier that has its oldest timestamp? Major compactions do all tiers but 
the newest? (And when date tiered, it should be easier to drop whole files on 
TTL?)
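
To make the tiering summary above concrete, here is a minimal sketch of how time windows could be laid out in the style of the Spotify post: the newest window has a base size, and older windows grow by a constant factor. It is not taken from the patch. The names DateTierSketch, Window, windows, windowIndexOf, baseMillis and windowsPerTier are all hypothetical, and aligning each window to a multiple of its own size is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of date-tiered windows; none of these names come from the patch.
class DateTierSketch {

    /** Half-open time window [start, start + size), in milliseconds. */
    static final class Window {
        final long start;
        final long size;

        Window(long start, long size) {
            this.start = start;
            this.size = size;
        }

        boolean contains(long ts) {
            return ts >= start && ts < start + size;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + (start + size) + ")";
        }
    }

    /**
     * Lists windows from newest to oldest until one covers oldestTs.
     * Assumes 0 <= oldestTs <= now, baseMillis > 0 and windowsPerTier > 1.
     */
    static List<Window> windows(long now, long baseMillis, int windowsPerTier, long oldestTs) {
        List<Window> out = new ArrayList<>();
        long size = baseMillis;
        long index = now / size; // the newest ("incoming") window contains now
        while (true) {
            Window w = new Window(index * size, size);
            out.add(w);
            if (w.start <= oldestTs) {
                return out;
            }
            if (index % windowsPerTier > 0) {
                index -= 1; // step to the previous window of the same size
            } else {
                // This window starts on a boundary of the next tier:
                // continue with windows that are windowsPerTier times larger.
                size *= windowsPerTier;
                index = index / windowsPerTier - 1;
            }
        }
    }

    /** Returns the position (0 = newest) of the window holding ts, or -1 if none does. */
    static int windowIndexOf(List<Window> windows, long ts) {
        for (int i = 0; i < windows.size(); i++) {
            if (windows.get(i).contains(ts)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Toy numbers: base window of 10 ms, 4 windows per tier, "now" at 95 ms.
        List<Window> ws = windows(95L, 10L, 4, 0L);
        System.out.println(ws);
        System.out.println("ts=85 falls in window " + windowIndexOf(ws, 85L));
    }
}
```

In this sketch a store file would be assigned to a window by one representative timestamp. Which timestamp to use for a bulk-loaded file that spans several windows is exactly the open question above.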

Let me look at the patch.

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most 
> recent data. 
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range scans, and re-compaction with existing store 
> files in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set in hbase-site.xml or overridden at the per-table or 
> per-column-family level through the hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
