[ https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236605#comment-15236605 ]
Duo Zhang commented on HBASE-15454:
-----------------------------------
{quote}
it seems very inefficient that we need a routine to slice the data along the
exponential windows for minor/major compaction and another concurrent routine
to slice the data along the calendar windows to archive them.
{quote}
For major compaction, windows older than 'max age' will be ArchiveWindows. The
'max age' is a split point: newer windows are tiered and older windows are
archived.
{quote}
A user should only need either layout, not both.
{quote}
That's not true. For example, suppose you want to archive data by year and
today is Jan 1. I think most people do not want to archive the data written
yesterday, right? So there will be a 'max age' config, which means we only
archive data older than that.
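To illustrate the split point, here is a minimal sketch. It is not the code
from the patch: the class MaxAgeSplit and the methods classify and
archiveWindowStart are hypothetical names, and it assumes yearly archive
windows aligned to UTC calendar years.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical illustration only; none of these names come from the patch.
final class MaxAgeSplit {
    enum WindowKind { TIERED, ARCHIVE }

    // Data older than (now - maxAge) falls into an archive window;
    // everything newer stays in the tiered (exponential) windows.
    static WindowKind classify(long timestampMillis, long nowMillis, long maxAgeMillis) {
        long splitPoint = nowMillis - maxAgeMillis;
        return timestampMillis < splitPoint ? WindowKind.ARCHIVE : WindowKind.TIERED;
    }

    // Assumption: archive windows are calendar years in UTC.
    // Returns the start (epoch millis) of the year containing the timestamp.
    static long archiveWindowStart(long timestampMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(timestampMillis).atZone(ZoneOffset.UTC);
        return ZonedDateTime.of(t.getYear(), 1, 1, 0, 0, 0, 0, ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }
}
```

With 'now' set to Jan 1 and a 'max age' of, say, 30 days, data written
yesterday is classified as TIERED even though it belongs to the previous
calendar year. It only moves to the archive window for that year once it is
older than the 'max age'.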
{quote}
And please add the EC manager code and make it work with both types of windows.
{quote}
The EC manager is part of HDFS. As I said, it is currently transparent to
HBase. You can use it for any file you want, but it does not perform well on
small files.
Thanks.
> Archive store files older than max age
> --------------------------------------
>
> Key: HBASE-15454
> URL: https://issues.apache.org/jira/browse/HBASE-15454
> Project: HBase
> Issue Type: Sub-task
> Components: Compaction
> Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15454-v1.patch, HBASE-15454.patch
>
>
> Sometimes old data is rarely touched but we cannot remove it. So archive
> it into several big files (by year or something) and use EC to reduce the
> redundancy.