[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

Enis Soztutar (JIRA) Thu, 16 Mar 2017 13:55:01 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928887#comment-15928887
 ]


Enis Soztutar commented on HBASE-15181:
---------------------------------------

bq. To take good advantage of it, the queries would need to set the time range 
on the scan itself (as opposed to purely encoding time information directly in 
the row/column identifiers and relying on them for time bounding). I'm not sure 
if opentsdb does that.
Excellent point. However, even if the actual time is embedded in the row/column 
model, an engine like OpenTSDB might be able to take advantage of this 
compaction strategy. The idea is that there should be a time-bound error (let's 
call it {{E}}) within which all data belonging to time {{T1}} is assumed to 
have been persisted. Then, for queries over the time range {{T1}} to {{T2}}, 
the engine can also set the time range on the scan object to {{T1-E}} to 
{{T2+E}}. This provides both correctness (since the engine still filters the 
incoming data using {{T1}} and {{T2}}) and efficiency (since the HBase scan 
ignores data outside the widened range). Ambari Metrics Server does something 
like this. 
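
A minimal sketch of the widening described above, in plain Java with no HBase 
dependency so it can be run standalone. The class, the {{Cell}} holder and the 
method names are hypothetical; the inclusive treatment of {{T1}} and {{T2}} is 
an assumption. In a real client the two computed bounds would be passed to 
{{Scan.setTimeRange(minStamp, maxStamp)}}, whose upper bound is exclusive, 
which is why the sketch adds 1 to {{T2+E}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WidenedTimeRange {

    /** Hypothetical cell: event time decoded from the row/column key,
     *  plus the HBase cell timestamp assigned when it was written. */
    public static final class Cell {
        final long eventTime;
        final long cellTimestamp;

        public Cell(long eventTime, long cellTimestamp) {
            this.eventTime = eventTime;
            this.cellTimestamp = cellTimestamp;
        }
    }

    /** Returns {minStamp, maxStamp} for the scan: min inclusive, max exclusive.
     *  Assumes t2 + e + 1 does not overflow a long. */
    public static long[] scanRange(long t1, long t2, long e) {
        long min = Math.max(0L, t1 - e); // timestamps are non-negative
        long max = t2 + e + 1;           // +1 so that T2+E itself is included
        return new long[] {min, max};
    }

    /** Simulates the two-stage filtering: the scan's time range first
     *  (coarse, on cell timestamps), then the engine's own exact check
     *  on the event time encoded in the key. */
    public static List<Cell> query(List<Cell> stored, long t1, long t2, long e) {
        long[] range = scanRange(t1, t2, e);
        List<Cell> result = new ArrayList<>();
        for (Cell c : stored) {
            boolean inScanRange =
                c.cellTimestamp >= range[0] && c.cellTimestamp < range[1];
            if (!inScanRange) {
                continue; // in HBase, store files outside the range can be skipped
            }
            if (c.eventTime >= t1 && c.eventTime <= t2) {
                result.add(c); // exact filtering keeps the result correct
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Cell> stored = Arrays.asList(
            new Cell(1000L, 1040L),  // in range, persisted 40 late
            new Cell(1990L, 2050L),  // in range, persisted at the edge of E
            new Cell(990L, 1000L),   // passes the scan, rejected by the engine
            new Cell(2500L, 2500L)); // skipped by the scan time range
        System.out.println(query(stored, 1000L, 2000L, 50L).size());
    }
}
```

The result is only correct if the assumption behind {{E}} holds: a cell whose 
timestamp is more than {{E}} away from its event time would be missed by the 
scan.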

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0, 1.3.0, 0.98.18
>
>         Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for use cases that:
> 1. have mostly date-based data writes and scans and a focus on the most 
> recent data. 
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set in hbase-site.xml or overridden at per-table or 
> per-column-family level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results from our production are at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
