[
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240477#comment-15240477
]
Duo Zhang commented on HBASE-15454:
-----------------------------------
Oh, the archive is now independent of the window implementation. The new window
implementation is only used to split files along calendar boundaries.
Let me explain the 'archive' logic. What we want is exactly one file for a
given window: all cells with a timestamp in that window are in this file, and
the file does not contain any cells whose timestamps fall outside that
window.
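To make that invariant concrete, here is a rough sketch, not taken from the
patch. It assumes yearly windows in UTC and models a cell as nothing more than
its timestamp in epoch milliseconds; the class and method names are
hypothetical and are not HBase APIs.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from HBase.
class ArchiveWindowSketch {

    // Start (inclusive) of the UTC calendar year containing the timestamp.
    static long windowStart(long tsMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(tsMillis).atZone(ZoneOffset.UTC);
        return ZonedDateTime.of(t.getYear(), 1, 1, 0, 0, 0, 0, ZoneOffset.UTC)
                .toInstant().toEpochMilli();
    }

    // End (exclusive) of the window that begins at windowStartMillis.
    static long windowEnd(long windowStartMillis) {
        return Instant.ofEpochMilli(windowStartMillis).atZone(ZoneOffset.UTC)
                .plusYears(1).toInstant().toEpochMilli();
    }

    // One "file" (here just a list of timestamps) per window, keyed by
    // window start, so there can never be two files for the same window.
    static Map<Long, List<Long>> partition(List<Long> cellTimestamps) {
        Map<Long, List<Long>> files = new TreeMap<>();
        for (long ts : cellTimestamps) {
            files.computeIfAbsent(windowStart(ts), k -> new ArrayList<>()).add(ts);
        }
        return files;
    }

    // True if no file holds a cell whose timestamp lies outside its window.
    static boolean holdsInvariant(Map<Long, List<Long>> files) {
        for (Map.Entry<Long, List<Long>> e : files.entrySet()) {
            long start = e.getKey();
            long end = windowEnd(start);
            for (long ts : e.getValue()) {
                if (ts < start || ts >= end) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Keying the map by window start is what gives "only one file per window" in
this sketch; the real compaction code would have to enforce the same thing on
actual store files.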
The most important thing for us is that, for two clusters connected by
replication, the archived files should be exactly the same on both clusters,
which makes it easy for us to find inconsistencies. Also, we could skip the
archived time range when running a consistency check once we have confirmed
that all the archived files are the same.
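As an illustration of how such a check might look, the sketch below assumes
each cluster can report one digest per archived window (keyed by window start
in epoch milliseconds). How the digest is computed is left open, and every
name here is hypothetical rather than an existing HBase tool.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not an existing HBase tool or API.
class ArchiveCompareSketch {

    // Windows whose archived file has the same digest on both clusters.
    // A consistency check could skip the time ranges of these windows.
    static Set<Long> verifiedWindows(Map<Long, String> clusterA,
                                     Map<Long, String> clusterB) {
        Set<Long> same = new TreeSet<>();
        for (Map.Entry<Long, String> e : clusterA.entrySet()) {
            String other = clusterB.get(e.getKey());
            if (other != null && other.equals(e.getValue())) {
                same.add(e.getKey());
            }
        }
        return same;
    }

    // Windows that are missing on one side or whose digests differ.
    // These are the ones that still need a cell-level comparison.
    static Set<Long> suspectWindows(Map<Long, String> clusterA,
                                    Map<Long, String> clusterB) {
        Set<Long> suspect = new TreeSet<>(clusterA.keySet());
        suspect.addAll(clusterB.keySet());
        suspect.removeAll(verifiedWindows(clusterA, clusterB));
        return suspect;
    }
}
```

This only works if the archived files really are identical on both sides,
which is why the one-file-per-window rule matters.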
As for whether to exclude the archived files from major compaction or split
size calculation, I have no idea right now. In our deployment we will disable
automatic major compaction and, as said above, trigger it from outside HBase if
needed. For split size calculation, also as said above, maybe we could
introduce a new config? I do not know, because this is not a problem in our
scenario... We pre-split, and it is not a big cost to split manually about
once every half a year...
Thanks.
> Archive store files older than max age
> --------------------------------------
>
> Key: HBASE-15454
> URL: https://issues.apache.org/jira/browse/HBASE-15454
> Project: HBase
> Issue Type: Sub-task
> Components: Compaction
> Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch,
> HBASE-15454.patch
>
>
> Sometimes old data is rarely touched but we cannot remove it. So archive
> it into several big files (by year or something) and use EC to reduce the
> redundancy.