[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127380#comment-15127380
 ] 

Enis Soztutar commented on HBASE-15181:
---------------------------------------

HBASE-7763 is the jira that explains why we need to select a contiguous set of 
files for compaction. The main idea is that if two puts happen with the same 
timestamp, we order them by sequenceId so that the "latest" one is always 
returned. This allows the user, for example, to override a previously set value 
in some cases. 

The problem with non-contiguous compactions is that we do not keep the seqIds 
of cells forever. After some time, we remove the per-cell seqIds and keep only 
one sequenceId per hfile. Thus, if two puts with the same timestamp but 
different seqIds end up in different files, allowing non-contiguous compactions 
may break the ordering. 

For example: 
{code}
file1: seqId=10, row=foo, val=v1, ts=100
file2: seqId=20, row=bar, val=v2, ts=200
file3: seqId=30, row=foo, val=v3, ts=100
file4: seqId=40, row=bar, val=v4, ts=300
{code}
If I compact file1 and file4 together, the new file takes seqId=40, so it will 
hold <row=foo, ts=100, val=v1, seqId=40>. That cell now sorts ahead of 
<row=foo, ts=100, val=v3, seqId=30>, although v3 should be the correct answer. 
The bad thing is that a query reading the value of row=foo returns a different 
result depending on whether the compaction has run or not. 
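
The example above can be replayed with a small model. This is only a sketch, not HBase code: the names Cell, StoreFile, compact and read are made up for illustration, and it assumes a read picks the highest timestamp and breaks ties by the file-level seqId, with a compacted file taking the max seqId of its inputs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int
    val: str


@dataclass(frozen=True)
class StoreFile:
    name: str
    seq_id: int              # one seqId per file; per-cell seqIds are assumed gone
    cells: Tuple[Cell, ...]


def compact(name: str, files: List[StoreFile]) -> StoreFile:
    """Merge the given files; the output carries the max seqId of its inputs."""
    merged = tuple(c for f in files for c in f.cells)
    return StoreFile(name, max(f.seq_id for f in files), merged)


def read(files: List[StoreFile], row: str) -> Optional[str]:
    """Return the value for row: newest ts wins, ties go to the higher file seqId."""
    hits = [(c.ts, f.seq_id, c.val) for f in files for c in f.cells if c.row == row]
    if not hits:
        return None
    return max(hits, key=lambda h: (h[0], h[1]))[2]


file1 = StoreFile("file1", 10, (Cell("foo", 100, "v1"),))
file2 = StoreFile("file2", 20, (Cell("bar", 200, "v2"),))
file3 = StoreFile("file3", 30, (Cell("foo", 100, "v3"),))
file4 = StoreFile("file4", 40, (Cell("bar", 300, "v4"),))

before = read([file1, file2, file3, file4], "foo")

# Non-contiguous selection: file1 + file4, skipping file2 and file3.
bad = compact("file1+file4", [file1, file4])
after_non_contiguous = read([file2, file3, bad], "foo")

# Contiguous selection: file1 + file2.
good = compact("file1+file2", [file1, file2])
after_contiguous = read([good, file3, file4], "foo")

print(before, after_non_contiguous, after_contiguous)
```

In this model the read of row=foo gives v3 before any compaction and after the contiguous one, but v1 after file1 and file4 are compacted together, which is the flip described above.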

What I was saying offline is that if we do something like HBASE-9905 and 
disallow client-settable timestamps, or something like HBASE-10247 where the 
table pre-declares that it won't have same-ts edits, it should be possible to 
do non-contiguous compactions. 
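
As a rough illustration of that idea, a selection check could look like the sketch below. Nothing here is an existing HBase API: is_contiguous, selection_allowed and the no_same_ts_edits flag are hypothetical names, and file seqIds are assumed to be unique.

```python
from typing import List


def is_contiguous(all_seq_ids: List[int], selected_seq_ids: List[int]) -> bool:
    """True if the selected files form one unbroken run in seqId order."""
    ordered = sorted(all_seq_ids)
    picked = sorted(selected_seq_ids)
    if not picked:
        return True
    if not set(picked) <= set(ordered):
        raise ValueError("selection contains a file that is not in the store")
    start = ordered.index(picked[0])
    return ordered[start:start + len(picked)] == picked


def selection_allowed(all_seq_ids: List[int],
                      selected_seq_ids: List[int],
                      no_same_ts_edits: bool) -> bool:
    """Hypothetical guard: a non-contiguous selection is only allowed when the
    table has declared that it never writes two edits with the same timestamp."""
    return no_same_ts_edits or is_contiguous(all_seq_ids, selected_seq_ids)


store = [10, 20, 30, 40]
print(selection_allowed(store, [10, 40], no_same_ts_edits=False))
print(selection_allowed(store, [10, 40], no_same_ts_edits=True))
print(selection_allowed(store, [20, 30], no_same_ts_edits=False))
```

With the four files from the example, picking file1 and file4 is rejected unless the table has made the no-same-ts promise, while picking file2 and file3 is always fine.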

It seems that HBASE-3690 introduced a config option to exclude bulk-loaded 
files from minor compaction. This is to prevent compaction storms due to bulk 
load; it is off by default and configurable. Somewhere down the line the conf 
option got removed, but I was not able to trace where. Maybe a bug? 

Some more background is here: 
https://issues.apache.org/jira/browse/HBASE-8770
https://issues.apache.org/jira/browse/HBASE-8721

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. have mostly date-based data writes and scans and a focus on the most 
> recent data. 
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range scans, and re-compaction with existing store 
> files in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set in hbase-site or overridden at the per-table or 
> per-column-family level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing


