[ https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239522#comment-15239522 ]
Dave Latham commented on HBASE-15454:
-------------------------------------
Please forgive me - I still don't understand, and I want to. It would be
really helpful for me if you can try to answer these questions:
{quote}
* What does it actually mean to "archive" a store file? Is there a definition,
or set of properties or guarantees?
** Are archived files excluded from major compaction? Or minor compactions? Or
from region split size calculation?
** Are archived files guaranteed to have no timestamp overlap with other
HFiles? Or just other archived HFiles?
** Or does it just refer to any files with max timestamp older than maxAge?
{quote}
Without understanding how archived files differ from other HFiles, I
don't see why they need separate logic beyond a pluggable window
factory (which is nice to see in the v2 patch).
{quote}
Currently we do have a config for store files that are no longer eligible for
minor compaction, which is max age
{quote}
Yikes. I thought max age was purely part of the exponential tiered windowing
schedule, which stopped the growth of tiers past a certain point. Under common
write patterns those files would then never need minor compactions again, but
if there were actually several files in such a window I wouldn't want to
explicitly prevent compaction of them.
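
To make that reading of max age concrete, here is a minimal Python sketch. It is not HBase's actual date tiered compaction code. It assumes a simplified model in which the window size is multiplied by a fixed factor per tier until the next tier would exceed max age, after which the size stays constant. The names window_size, compaction_candidates, base_window, windows_per_tier, max_age and min_files are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def window_size(age, base_window, windows_per_tier, max_age):
    """Window size for data of the given age (simplified, assumed model).

    The size grows by a factor of windows_per_tier per tier, but growth
    stops once the next tier's window would be larger than max_age.
    """
    size = base_window
    while age >= size * windows_per_tier and size * windows_per_tier <= max_age:
        size *= windows_per_tier
    return size


def compaction_candidates(file_max_ts, now, base_window, windows_per_tier,
                          max_age, min_files=2):
    """Group files by window and return windows holding several files.

    file_max_ts maps a (hypothetical) file name to its max timestamp.
    Age is never used to exclude a file here: a window older than max_age
    is still a candidate if it contains at least min_files files.
    """
    windows = defaultdict(list)
    for name, ts in file_max_ts.items():
        size = window_size(now - ts, base_window, windows_per_tier, max_age)
        start = (ts // size) * size  # align the window to a multiple of its size
        windows[(start, size)].append(name)
    return {w: sorted(names) for w, names in windows.items()
            if len(names) >= min_files}


if __name__ == "__main__":
    files = {"a": 100, "b": 120, "c": 999}
    print(compaction_candidates(files, now=1000, base_window=1,
                                windows_per_tier=4, max_age=64))
```

In this model max age only caps how large a window can get. Under a steady write pattern an old window ends up with a single file and is never selected again, but if several files do land in one old window they remain eligible for compaction.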
> Archive store files older than max age
> --------------------------------------
>
> Key: HBASE-15454
> URL: https://issues.apache.org/jira/browse/HBASE-15454
> Project: HBase
> Issue Type: Sub-task
> Components: Compaction
> Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch,
> HBASE-15454.patch
>
>
> Sometimes old data is rarely touched but we cannot remove it. So archive
> it into several big files (by year or something) and use EC to reduce the
> redundancy.