[ 
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243952#comment-15243952
 ] 

Duo Zhang commented on HBASE-15454:
-----------------------------------

{quote}
Can we separate the JIRA issues and patches for pluggable window schedules vs 
"archival" compaction?
{quote}
I'm OK with it.

{quote}
I think the archival time boundary should be a separate configuration from the 
exponential window schedule's max tier age.
{quote}
I think the max age config is overloaded in the current implementation. The max 
tier age should be a config for generating the exponential window, and should 
not be used in DateTieredCompaction to filter old store files. We could 
introduce a new config that explicitly defines a boundary before which no minor 
compaction happens.
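
To make the split concrete, here is a rough sketch of the two settings kept apart. It is not the HBase code: every name below (DateTieredSettings, archive_boundary, window_size_for_age, and so on) is made up for illustration, and the window growth rule is a simplification of the exponential schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DateTieredSettings:
    # Hypothetical field names, not real HBase configuration keys.
    base_window: int        # size of the youngest windows
    windows_per_tier: int   # growth factor between tiers
    max_tier_age: int       # only shapes the window schedule
    archive_boundary: int   # only decides what minor compaction may touch


def window_size_for_age(age, s):
    """Simplified exponential schedule: windows grow tier by tier,
    but stop growing once the next tier would exceed max_tier_age."""
    size = s.base_window
    while (age >= size * s.windows_per_tier
           and size * s.windows_per_tier <= s.max_tier_age):
        size *= s.windows_per_tier
    return size


def eligible_for_minor_compaction(file_max_ts, now, s):
    """A store file is skipped by minor compaction once all of its
    data is older than the archive boundary."""
    return file_max_ts >= now - s.archive_boundary


# Example with abstract time units: windows stop growing at 64,
# while the archive boundary is set independently at 500.
settings = DateTieredSettings(base_window=1, windows_per_tier=4,
                              max_tier_age=64, archive_boundary=500)
```

With this shape, changing max_tier_age only changes how large the old windows get, and changing archive_boundary only changes which files are left alone.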

Yes, if you want to use archiving then you should make sure that no old cell is 
written once the window it belongs to has been archived. You can increase the 
max age config to delay archiving. In our scenario, we are going to set this 
value to half a year, which is enough. And we could build some external tools 
to check whether there is data skew and fix it manually.
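
As an illustration of what such an external check might look like, here is a minimal sketch. It assumes the tool can already obtain the timestamps of recently written cells; how it gets them is left out, and the function name and the 182-day approximation of half a year are my own choices.

```python
DAY_MS = 24 * 60 * 60 * 1000
HALF_YEAR_MS = 182 * DAY_MS  # assumed approximation of "half a year"


def writes_into_archived_range(cell_timestamps_ms, now_ms,
                               archive_age_ms=HALF_YEAR_MS):
    """Return the timestamps of cells that fall before the archive cutoff,
    i.e. writes that land in windows which may already be archived."""
    cutoff = now_ms - archive_age_ms
    return [ts for ts in cell_timestamps_ms if ts < cutoff]


# Example: one cell from 10 days ago, one from 200 days ago.
now = 1_000 * DAY_MS
recent = now - 10 * DAY_MS
stale = now - 200 * DAY_MS
offenders = writes_into_archived_range([recent, stale], now)
```

A non-empty result would be the signal to look at the affected windows and fix them manually.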

{quote}
If going in this direction, I wonder if it's better to go all the way, from 
having every minor compaction output perfectly partitioned HFiles to even doing 
so at flush time as well.
{quote}
I'm not sure... Need a benchmark I think. For stripe compaction, there is a 
config that controls whether we should first flush data to L0 without splitting 
it or flush to each stripe directly.

Thanks.

> Archive store files older than max age
> --------------------------------------
>
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
>         Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch, 
> HBASE-15454-v3.patch, HBASE-15454-v4.patch, HBASE-15454.patch
>
>
> Sometimes the old data is rarely touched but we can not remove it. So archive 
> it to several big files(by year or something) and use EC to reduce the 
> redundancy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
