[
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127380#comment-15127380
]
Enis Soztutar commented on HBASE-15181:
---------------------------------------
HBASE-7763 is the JIRA that explains why we need to select a contiguous set of
files for compaction. The main idea is that if two puts happen with the same
timestamp, we order them by sequenceId so that the "latest" one is always
returned. In some cases this lets the user override a previously set value,
for example.
The problem with non-contiguous compactions is that we do not keep the seqIds
of cells forever. After some time, we remove the per-cell seqIds and keep only
one sequenceId per hfile. Thus, if we end up with two puts that have the same
timestamp but different seqIds in different files, allowing non-contiguous
compactions may break the ordering.
For example:
{code}
file1: seqId=10, row=foo, val=v1, ts=100
file2: seqId=20, row=bar, val=v2, ts=200
file3: seqId=30, row=foo, val=v3, ts=100
file4: seqId=40, row=bar, val=v4, ts=300
{code}
If I compact file1 and file4 together, the new file takes seqId=40 and so will
have <row=foo, ts=100, val=v1, seqId=40>, which now sorts ahead of file3's
<row=foo, ts=100, val=v3, seqId=30>, although v3 should be the correct answer.
The bad thing is that if you query for the value of row=foo, the returned
result will change depending on whether the compaction has run or not.
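To make the example concrete, here is a minimal Python sketch of the same four
files. It is only a toy model, not HBase code: the names Cell, HFile, read and
compact are made up for illustration, and it assumes a reader picks the highest
timestamp and breaks ties with the file-level seqId, and that a compacted file
carries the max seqId of its inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int
    val: str


@dataclass(frozen=True)
class HFile:
    seq_id: int    # one file-level seqId; per-cell seqIds are assumed gone
    cells: tuple


def read(files, row):
    """Value seen for `row`: highest ts wins, ties go to the higher file seqId."""
    candidates = [(c.ts, f.seq_id, c.val)
                  for f in files for c in f.cells if c.row == row]
    return max(candidates)[2] if candidates else None


def compact(files):
    """Toy compaction: concatenate cells; the result takes the max input seqId."""
    merged = tuple(c for f in files for c in f.cells)
    return HFile(seq_id=max(f.seq_id for f in files), cells=merged)


file1 = HFile(10, (Cell("foo", 100, "v1"),))
file2 = HFile(20, (Cell("bar", 200, "v2"),))
file3 = HFile(30, (Cell("foo", 100, "v3"),))
file4 = HFile(40, (Cell("bar", 300, "v4"),))

# Before any compaction: file3 (seqId=30) beats file1 (seqId=10).
before = read([file1, file2, file3, file4], "foo")            # "v3"

# Non-contiguous: file1 + file4 -> seqId=40, so v1 now outranks v3.
after_noncontig = read([file2, file3, compact([file1, file4])], "foo")  # "v1"

# Contiguous: file3 + file4 -> seqId=40, and v3 still wins.
after_contig = read([file1, file2, compact([file3, file4])], "foo")     # "v3"
```

In this model the contiguous selection keeps the answer stable, while the
non-contiguous one flips it from v3 to v1.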
What I was saying offline is that if we do something like HBASE-9905 and
disallow client-settable timestamps, or something like HBASE-10247, where the
table pre-declares that it won't have same-ts edits, then it should be
possible to do non-contiguous compactions.
It seems that HBASE-3690 introduced a config option to exclude bulk-loaded
files from minor compaction. This is to prevent compaction storms due to bulk
loads, but it is off by default and configurable. Somewhere down the line the
conf option got removed, but I was not able to trace where. Maybe a bug?
Some more background is here:
https://issues.apache.org/jira/browse/HBASE-8770
https://issues.apache.org/jira/browse/HBASE-8721
> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. have mostly date-based data writes and scans and a focus on the most
> recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully so the data will still get to the
> right store file for time-range-scan and re-compaction with existing store
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance
> impact is minimized.
> Configuration can be set at hbase-site or overridden at per-table or
> per-column-family level by hbase shell.
> Design spec is at
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)