[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151531#comment-15151531
 ] 

Enis Soztutar commented on HBASE-15181:
---------------------------------------

[~claraxiong] this is great work BTW. Thanks for pushing for this. 

I just wanted to bring one open item back to jira to see whether ordering files 
by timestamp, rather than by seqId, and doing non-contiguous compactions is 
acceptable: 
{quote}
The tiered structure is built completely and solely on the data timestamp of 
the store files. We cannot sort by seqId at all. Any logic for updates/deletes 
depending on seqId would break. The user needs to guarantee updates or deletes 
are in order aligned with time stamp order. This compaction policy is pluggable 
and this limitation will be lifted if the work to allow compaction out of order 
of seqId is done. As you pointed out in the ticket: "What I was saying offline 
is that we can actually do something like HBASE-9905 and disallow 
client-settable timestamps, or do something like HBASE-10247 where the table 
pre-declares that we won't have same-ts edits, it should be possible to do 
non-contiguous compactions."
{quote}

Given that there are no hard guarantees as of now about whether the client can 
do out-of-order timestamp writes, can we still always be correct, with 
compaction merely performing less efficiently if the client does an excessive 
amount of these writes? Basically, if we can, I would like a system where the 
client gets the full benefit automatically if the timestamps follow seqId 
order, but if not, the results are still correct. If out-of-order writes are 
only occasional, performance is not that badly affected; if they are frequent, 
the compaction algorithm can behave badly, but results stay correct. 

I think we can achieve this with something like this: 
 - Use max ts as in the design for store files. 
 - Instead of ordering files by decreasing ts, order files by decreasing seqId. 
 - Iterating from highest seqId to lowest, find the tier that the file belongs 
to using maxTs. The only difference from the current algorithm is that in the 
iteration, we should always assign tiers in increasing order t0, t1, t2. This 
means that if out-of-order data is present, and we end up with flushes where 
maxTs is very old, let's say it falls into t2, then t1 and t0 would be empty and 
all files will be t2+. Otherwise (if you do not have out of order writes, or 
have them occasionally) the behavior will be the same as in the design. 
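
To make the three steps above concrete, here is a minimal sketch in Python. It 
is not the patch's code: the StoreFile class, the natural_tier and assign_tiers 
helpers, and the geometrically growing tier windows (base_window, growth) are 
all assumptions made for illustration. The only part taken from the proposal is 
the rule that, walking from highest seqId to lowest, a file's tier may never be 
lower than the tier of the file before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    seq_id: int  # flush/compaction sequence id
    max_ts: int  # max cell timestamp in the file (comes from user data)


def natural_tier(age, base_window, growth):
    """Tier a file would land in by age alone.

    Hypothetical windowing: t0 covers ages [0, base_window), and each
    following tier is `growth` times wider than the previous one.
    """
    tier, width, upper = 0, base_window, base_window
    while age >= upper:
        width *= growth
        upper += width
        tier += 1
    return tier


def assign_tiers(files, now, base_window=10, growth=2):
    """Walk files from highest seqId to lowest, never decreasing the tier."""
    assignment = []
    current = 0
    for f in sorted(files, key=lambda f: f.seq_id, reverse=True):
        age = max(0, now - f.max_ts)
        # Monotonic rule: an older-looking file pushes all later
        # (lower-seqId) files into its tier or beyond.
        current = max(current, natural_tier(age, base_window, growth))
        assignment.append((f, current))
    return assignment


# Timestamps follow seqId order: tiers are t0, t1, t2 as in the design.
in_order = [StoreFile(5, 98), StoreFile(4, 85), StoreFile(3, 50)]
print([t for _, t in assign_tiers(in_order, now=100)])      # [0, 1, 2]

# Newest flush carries very old data: t0 and t1 stay empty, all files are t2+.
out_of_order = [StoreFile(5, 50), StoreFile(4, 98), StoreFile(3, 85)]
print([t for _, t in assign_tiers(out_of_order, now=100)])  # [2, 2, 2]
```

With these made-up numbers, every tier is a contiguous run of seqIds, so a 
compaction within a tier never skips over a file with an intermediate seqId.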

Alternatively, HFiles also have CREATE_TIME_TS, which is different from 
maxTimestamp. maxTs comes from the user data, while the hfile create time is 
the system time at the time of hfile writing. If we do the tier selection based 
on hfile create time instead of the user's maxTs, then we might not even have 
that problem. Again, if there is actual correlation of the user's timestamps 
with the seqIds (or hfile create times), you would get all the benefits; 
otherwise, we would still return the correct results, but compaction may not be 
optimal (I think it will be like falling back to the exploring policy). Anyway, 
just a suggestion to consider. I might not have thought of all corner cases. 
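
As a rough illustration of the create-time alternative, the sketch below picks 
the tier from the hfile create time rather than from the user's maxTs. The 
tier_by_create_time helper, the window parameters, and the sample tuples are 
hypothetical. It also assumes that create time grows with seqId, which holds for 
plain flushes but ignores that a compacted file gets a fresh create time.

```python
def tier_by_create_time(create_ts, now, base_window=10, growth=2):
    """Tier chosen from hfile create time (system clock), not user maxTs.

    Hypothetical windowing: t0 covers ages [0, base_window), and each
    following tier is `growth` times wider than the previous one.
    """
    age = max(0, now - create_ts)
    tier, width, upper = 0, base_window, base_window
    while age >= upper:
        width *= growth
        upper += width
        tier += 1
    return tier


# (seq_id, user max_ts, hfile create time). User timestamps are out of
# order here, but create times still follow seqId order.
files = [(5, 50, 99), (4, 98, 80), (3, 85, 45)]
now = 100

# Walk from highest seqId to lowest; max_ts is deliberately ignored.
tiers = [tier_by_create_time(create_ts, now)
         for _, _, create_ts in sorted(files, reverse=True)]
print(tiers)  # [0, 1, 2]
```

Under that assumption the tiers come out non-decreasing in seqId order without 
any extra rule, at the cost of the tiers no longer reflecting the data's own 
time range when the user's timestamps diverge from the system clock.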

You mentioned that this patch is also running in production. Have you collected 
any numbers? 


> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0, 1.3.0, 0.98.19
>
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most 
> recent data. 
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range scans, and re-compaction with existing store 
> files in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set in hbase-site.xml or overridden at per-table or 
> per-column-family level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
