[ 
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240477#comment-15240477
 ] 

Duo Zhang commented on HBASE-15454:
-----------------------------------

Oh, now the archive is independent of the window implementation. The new window 
implementation is only used to split files along calendar boundaries.

Let me explain the 'archive' logic. What we want is exactly one file in a 
given window: all cells with a timestamp in that window are in this file, and 
the file does not contain any cell whose timestamp is outside that window.
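
As a minimal sketch of that invariant (not the code in the patch): the class 
and method names below are hypothetical, each store file is reduced to an 
array of its cell timestamps, and the window is assumed to be half-open, 
[start, end).

```java
import java.util.List;

// Hypothetical illustration only; none of these names are HBase APIs.
final class ArchiveWindowCheck {

    // True if ts falls inside the half-open window [start, end).
    static boolean inWindow(long start, long end, long ts) {
        return ts >= start && ts < end;
    }

    /**
     * Returns true only if exactly one file holds cells in [start, end)
     * and that file holds no cell outside the window.
     * Each long[] is the list of cell timestamps of one store file.
     */
    static boolean isArchived(long start, long end, List<long[]> files) {
        boolean found = false;
        for (long[] timestamps : files) {
            int inside = 0;
            for (long ts : timestamps) {
                if (inWindow(start, end, ts)) {
                    inside++;
                }
            }
            if (inside == 0) {
                continue; // this file does not touch the window
            }
            if (inside != timestamps.length) {
                return false; // file straddles a window boundary
            }
            if (found) {
                return false; // a second file also has cells in the window
            }
            found = true;
        }
        return found;
    }
}
```

A window with no file at all is reported as not archived here; that is a 
choice made for the sketch, not something stated above.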

The most important thing for us is that, for two clusters connected by 
replication, the archived files should be exactly the same on both clusters, 
which makes it easy for us to find inconsistencies. Also, we can skip the 
archived time range when running a consistency check once we have confirmed 
that all the archived files are the same.
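
One way such a comparison could look, purely as a sketch: assume each cluster 
can produce a map from window start time to a digest of the archived file for 
that window (how the digest is computed is left open, and all names here are 
hypothetical). Windows whose digests differ, or that exist on only one side, 
are the ones that still need a full consistency check.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names are HBase APIs.
final class ArchiveDiff {

    /**
     * Returns the window start times whose archived-file digests differ
     * between the two clusters, including windows missing on one side.
     */
    static Set<Long> mismatchedWindows(Map<Long, String> clusterA,
                                       Map<Long, String> clusterB) {
        Set<Long> allWindows = new TreeSet<>(clusterA.keySet());
        allWindows.addAll(clusterB.keySet());

        Set<Long> mismatched = new TreeSet<>();
        for (Long window : allWindows) {
            // A missing entry yields null, which never equals a real digest.
            if (!Objects.equals(clusterA.get(window), clusterB.get(window))) {
                mismatched.add(window);
            }
        }
        return mismatched;
    }
}
```

Every window not in the returned set has identical archived files on both 
sides, so its time range can be skipped.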

As for whether to exclude archived files from major compaction or split size 
calculation, I have no idea right now. In our deployment, we will disable 
automatic major compaction and, as said above, trigger it from outside HBase 
if needed. For split size calculation, also as said above, maybe we could 
introduce a new config? I do not know, because this is not a problem in our 
scenario... We pre-split, and it is not a big cost to split manually about 
once every half a year...

Thanks.



> Archive store files older than max age
> --------------------------------------
>
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
>         Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch, 
> HBASE-15454.patch
>
>
> Sometimes old data is rarely touched but we cannot remove it. So archive 
> it into several big files (by year or something) and use erasure coding (EC) 
> to reduce the redundancy.



