[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127426#comment-15127426 ]
stack commented on HBASE-15181:
-------------------------------
Good stuff, [~vrodionov]. You too, [~claraxiong]. Suggest that a sentence on the
bulk load concern make it into the release notes.
Nice writeup...
What is the 'Sliding Window Tiered Compaction'? I don't know what it is, so I
don't know why "...sliding windows may trigger compaction too frequently and
cause some files to be re-compacted." Is it the "...fully mimic the behavior of
STCS and compact SSTables with a relative age difference less than a constant
factor." from the cited Spotify document?
I see no engagement with stripe compactions in the writeup. Were they
considered at all? (Stripe purportedly does best when the data is timeseries
shaped.) Would be good to at least call out how this differs.
Suggest you give more direct credit to
https://labs.spotify.com/2014/12/18/date-tiered-compaction/ . You do so for the
image copied, but I find I have to read the original to understand what is being
proposed, and it seems like a bunch of the notions and text come from it.
bq. And at the time of recovery, we may need to bulk load data.
Which recovery is this? And who is doing the bulk load?
Do major compactions run in the date tiered scheme?
As per [~vrodionov], need the 'date' qualifier on configuration names...
So, we have date tiering like the Spotify blog, compacting in the first tier if
above the configured threshold. For other tiers, we do default exploring
compactions. If bulk load, it can ruin our tiering but we'll just drop it in
the tier that has its oldest timestamp? Major compactions do all tiers but
the newest? (And when date tiered, should be easier to drop whole files if
TTL?)
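To make that reading concrete, here is a rough sketch of placing a file in a
tier by its oldest timestamp. It is NOT taken from the patch: the function
names, the one-hour base window and the growth factor of 4 are all made up for
illustration, and the windows here simply grow geometrically with age, which
may not be how the patch or the Spotify scheme lays them out.

```python
HOUR_MS = 3_600_000  # hypothetical base window size: one hour


def tier_for_timestamp(ts_ms, now_ms, base_window_ms=HOUR_MS, growth_factor=4):
    """Return a tier index for a timestamp; tier 0 is the newest window.

    Assumed layout (illustrative only): tier 0 holds ages below the base
    window, and each older tier covers ages up to growth_factor times the
    previous tier's upper bound.
    """
    age = max(0, now_ms - ts_ms)  # future timestamps count as "now"
    tier = 0
    upper = base_window_ms  # exclusive upper bound on ages in the current tier
    while age >= upper:
        upper *= growth_factor
        tier += 1
    return tier


def tier_for_file(min_ts_ms, now_ms, base_window_ms=HOUR_MS, growth_factor=4):
    """Place a whole file (e.g. a bulk-loaded one) by its oldest timestamp."""
    return tier_for_timestamp(min_ts_ms, now_ms, base_window_ms, growth_factor)
```

Under these assumptions a bulk-loaded file whose oldest cell is five hours old
lands in tier 2, even if most of its cells are newer, which is the kind of
overlap the description says is tolerated.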
Let me look at the patch.
> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for use cases that:
> 1. have mostly date-based data writes and scans and a focus on the most recent
> data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully so the data will still get to the
> right store file for time-range scans, and re-compaction with existing store
> files in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance
> impact is minimized.
> Configuration can be set in hbase-site or overridden at per-table or
> per-column-family level by hbase shell.
> Design spec is at
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing