[ 
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239522#comment-15239522
 ] 

Dave Latham commented on HBASE-15454:
-------------------------------------

Please forgive me - I still don't understand, and I want to.  It would be 
really helpful for me if you could try to answer these questions:
{quote}
* What does it actually mean to "archive" a store file? Is there a definition, 
or set of properties or guarantees?
** Are archived files excluded from major compaction? Or minor compactions? Or 
from region split size calculation?
** Are archived files guaranteed to have no timestamp overlap with other 
HFiles? Or just other archived HFiles?
** Or does it just refer to any files with max timestamp older than maxAge?
{quote}

Without understanding how archived files are different from other HFiles I 
don't see why it needs separate logic beyond purely having a pluggable window 
factory (which is nice to see in the v2 patch).

{quote}
Currently we do have a config for store files that are no longer eligible for 
minor compaction, which is max age
{quote}
Yikes.  I thought max age was purely part of the exponential tiered windowing 
schedule, which stopped the growth of tiers past a certain point.  Under common 
write patterns those files would then never need minor compactions again, but 
if there were actually several files in such a window I wouldn't want to 
explicitly prevent compaction of them.
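
As a concrete illustration of that reading of max age, here is a minimal 
sketch of an exponential tiered window schedule in which max age only stops 
window growth. It is not the HBase implementation: the class name, method 
name, and parameters (base, perTier, maxAge) are hypothetical, and times are 
plain long values in arbitrary units.

```java
import java.util.ArrayList;
import java.util.List;

public class TieredWindowSketch {

    /**
     * Walks backward from `now` and returns `count` contiguous windows,
     * newest first, each as a {start, end} pair with `end` exclusive.
     * Window size grows by a factor of `perTier` at each tier boundary,
     * but stops growing once the next larger window would start before
     * (now - maxAge). Hypothetical sketch, not HBase code.
     */
    public static List<long[]> windows(long now, long base, int perTier,
                                       long maxAge, int count) {
        List<long[]> out = new ArrayList<>();
        long size = base;
        long pos = now / size;        // index of the window containing `now`
        long cutoff = now - maxAge;   // tiers stop growing past this point

        for (int i = 0; i < count; i++) {
            long start = pos * size;
            out.add(new long[] {start, start + size});

            // A window aligned to the next tier marks the end of this tier.
            boolean tierBoundary = pos % perTier == 0;
            // The next tier's window would reach back past the max age cutoff.
            boolean nextTierTooOld = start - size * perTier < cutoff;

            if (tierBoundary && !nextTierTooOld) {
                // Promote: the next earlier window is perTier times larger.
                size *= perTier;
                pos = pos / perTier - 1;
            } else {
                // Same tier: step back by one window of the same size.
                pos -= 1;
            }
        }
        return out;
    }
}
```

With now=100, base=1, perTier=4 and maxAge=50, the first windows are 
[100,101), [96,100), [80,96), [64,80), [48,64): the size stops at 16 because 
a window of 64 would reach back past the cutoff. Nothing in this schedule 
excludes the older windows from compaction; whether several files in such a 
window get compacted is a separate policy decision.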



> Archive store files older than max age
> --------------------------------------
>
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
>         Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch, 
> HBASE-15454.patch
>
>
> Sometimes the old data is rarely touched but we cannot remove it. So archive 
> it to several big files (by year or something) and use EC to reduce the 
> redundancy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
