[ https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238535#comment-15238535 ]

Duo Zhang commented on HBASE-15454:
-----------------------------------

Ideally, archiving proceeds from old to new and each window ends up with only 
one file, so major compaction is not needed (except perhaps after a split). I 
do not have a strong reason to include or exclude the archived files from the 
split size calculation. Maybe we could introduce a config for it?

But in the actual implementation, we need to handle lots of bad cases. With 
date tiered compaction, the real problem is out-of-order data, so major 
compaction is still needed to fix it. For us, the cost of triggering a major 
compaction manually when there is data skew is acceptable, since skew should 
rarely happen if a user chooses date tiered compaction. I do not want to 
implement things before I know how to use them correctly...

After some deeper thought, what we actually need is a hot range where data is 
touched frequently, a warm range where data may still be changed, and a cold 
range where data is very rarely changed. So maybe I do not need the 
exponential window algorithm. Let me prepare a new patch.
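The hot/warm/cold idea can be pictured as a classification by file age. This is a minimal sketch for illustration only, not the new patch: `TierClassifier`, its two age thresholds, and the choice to classify by the timestamp of a file's newest cell are all hypothetical assumptions.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch: put a store file into a hot, warm or cold tier based on
 * the age of its newest cell. The thresholds are illustrative assumptions,
 * not existing HBase configuration keys.
 */
final class TierClassifier {

  enum Tier { HOT, WARM, COLD }

  private final Duration hotAge;  // younger than this: touched frequently
  private final Duration coldAge; // older than this: rarely changed

  TierClassifier(Duration hotAge, Duration coldAge) {
    if (hotAge.compareTo(coldAge) >= 0) {
      throw new IllegalArgumentException("hotAge must be shorter than coldAge");
    }
    this.hotAge = hotAge;
    this.coldAge = coldAge;
  }

  /** Classifies a file whose newest cell has timestamp maxTimestamp. */
  Tier classify(Instant maxTimestamp, Instant now) {
    // Negative ages (cells from the future) fall into HOT.
    Duration age = Duration.between(maxTimestamp, now);
    if (age.compareTo(hotAge) < 0) {
      return Tier.HOT;
    }
    if (age.compareTo(coldAge) < 0) {
      return Tier.WARM;
    }
    return Tier.COLD;
  }
}
```

With three fixed ranges, only the warm range would still need regular compaction, while cold files could be archived once and left alone.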

I agree that we can move window-specific configs into each window factory. As 
for the whole calendar logic, it is a bit hard to generalize, since the natural 
promotion number varies by unit (24 hours make a day, but a month has 28 to 31 
days). As for joda-time, its design was adopted into the JDK as java.time in 
Java 8, it is commonly used, and it usually does not cause compatibility issues 
(think of guava...). So I think introducing a joda-time dependency for 
hbase-server is not a big deal?
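To make the calendar point concrete, here is a minimal sketch assuming UTC windows. It uses the JDK 8 java.time API rather than Joda-Time, and `CalendarWindows`, `windowStart`, `windowEnd` and `promotionCount` are hypothetical names, not code from the patch. It shows that the number of smaller units per window is not a constant.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

/**
 * Hypothetical helper: map a timestamp to the UTC calendar window
 * (hour, day, month or year) that contains it.
 */
final class CalendarWindows {

  private CalendarWindows() {}

  /** Inclusive start of the calendar window of the given unit containing ts. */
  static ZonedDateTime windowStart(Instant ts, ChronoUnit unit) {
    ZonedDateTime t = ts.atZone(ZoneOffset.UTC);
    switch (unit) {
      case HOURS:
      case DAYS:
        return t.truncatedTo(unit);
      case MONTHS:
        return t.truncatedTo(ChronoUnit.DAYS).withDayOfMonth(1);
      case YEARS:
        return t.truncatedTo(ChronoUnit.DAYS).withDayOfYear(1);
      default:
        throw new IllegalArgumentException("unsupported unit: " + unit);
    }
  }

  /** Exclusive end of that window. */
  static ZonedDateTime windowEnd(Instant ts, ChronoUnit unit) {
    return windowStart(ts, unit).plus(1, unit);
  }

  /** How many windows of the smaller unit fit in the window: varies with the calendar. */
  static long promotionCount(Instant ts, ChronoUnit unit, ChronoUnit smaller) {
    return smaller.between(windowStart(ts, unit), windowEnd(ts, unit));
  }
}
```

For example, a month window promotes 29 day windows in February 2016 but 28 in February 2015, so a fixed multiplier per level cannot describe calendar windows.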

Thanks.

> Archive store files older than max age
> --------------------------------------
>
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
>         Attachments: HBASE-15454-v1.patch, HBASE-15454.patch
>
>
> Sometimes old data is rarely touched but we cannot remove it. So archive it 
> into several big files (by year or something) and use EC (erasure coding) to 
> reduce the redundancy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
