[ 
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-15454:
------------------------------
    Description: 
In date tiered compaction, store files older than max age are never touched 
by minor compactions. Here we introduce a 'freeze window' operation, which does 
the following things:

1. Find all store files that contain cells whose timestamps are in the given 
window.
2. Compact all these files and output one file for each window that these 
files cover.

After the compaction, we will have only one file in the given window, and all 
cells whose timestamps are in the given window are in that file. If you do not 
write new cells with an older timestamp in this window, the file will never be 
changed. This makes it easier to do erasure coding on the frozen file to 
reduce redundancy. It also makes it possible to check consistency between 
the master and peer clusters incrementally.
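
The two steps can be illustrated with a rough sketch. This is not the code in 
the attached patches: it is Python rather than HBase's Java, StoreFile and 
freeze_window are hypothetical names, and it assumes fixed-width windows 
aligned to multiples of window_size, which may differ from how date tiered 
compaction actually lays out its windows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StoreFile:
    """Hypothetical stand-in for a store file: a name plus (timestamp, value) cells."""
    name: str
    cells: list

    def overlaps(self, start, end):
        # True if any cell timestamp falls in the half-open window [start, end).
        return any(start <= ts < end for ts, _ in self.cells)


def freeze_window(files, start, window_size):
    """Freeze [start, start + window_size); start is assumed to be window-aligned.

    Returns (untouched_files, new_files).
    """
    end = start + window_size

    # Step 1: select every file holding at least one cell in the target window.
    selected = [f for f in files if f.overlaps(start, end)]
    untouched = [f for f in files if not f.overlaps(start, end)]

    # Step 2: merge the selected files and split their cells by window,
    # producing one output file per window that the selected files cover.
    by_window = defaultdict(list)
    for f in selected:
        for ts, value in f.cells:
            window_start = (ts // window_size) * window_size
            by_window[window_start].append((ts, value))

    new_files = [
        StoreFile(name=f"window-{ws}", cells=sorted(cells))
        for ws, cells in sorted(by_window.items())
    ]
    return untouched, new_files
```

In this sketch a selected file that also spans neighbouring windows is split 
across several output files, so exactly one resulting file holds cells of the 
frozen window.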

And why use the word 'freeze'?
Because there is already an 'HFileArchiver' class. I want to use a different 
word to prevent confusion.

  was: Sometimes old data is rarely touched but we cannot remove it. So 
archive it into several big files (by year or something) and use EC to reduce 
the redundancy.


> Archive store files older than max age
> --------------------------------------
>
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
>         Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch, 
> HBASE-15454-v3.patch, HBASE-15454-v4.patch, HBASE-15454.patch
>
>
> In date tiered compaction, store files older than max age are never 
> touched by minor compactions. Here we introduce a 'freeze window' operation, 
> which does the following things:
> 1. Find all store files that contain cells whose timestamps are in the given 
> window.
> 2. Compact all these files and output one file for each window that these 
> files cover.
> After the compaction, we will have only one file in the given window, and all 
> cells whose timestamps are in the given window are in that file. If you do 
> not write new cells with an older timestamp in this window, the file will 
> never be changed. This makes it easier to do erasure coding on the frozen 
> file to reduce redundancy. It also makes it possible to check consistency 
> between the master and peer clusters incrementally.
> And why use the word 'freeze'?
> Because there is already an 'HFileArchiver' class. I want to use a different 
> word to prevent confusion.



