[ https://issues.apache.org/jira/browse/HBASE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173181#comment-15173181 ]
Duo Zhang commented on HBASE-15339:
-----------------------------------

For example, we usually consider hot data to be 'data written in the last 7 days', not 'data written after last Monday', which is why a moving window is more suitable for determining hot data.

As for the archive logic, there is a max age config, and we skip compaction on files older than that age. I think we could archive those files?

Thanks.

> Add archive tiers for date based tiered compaction
> --------------------------------------------------
>
>                 Key: HBASE-15339
>                 URL: https://issues.apache.org/jira/browse/HBASE-15339
>             Project: HBase
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Duo Zhang
>
> For our MiCloud service, the old data is rarely touched but we still need to
> keep it, so we want to put the data on inexpensive devices and reduce
> redundancy using EC to cut down the cost.
> With date based tiered compaction introduced in HBASE-15181, new data and old
> data can be placed in different tiers. But the tier boundary moves as time
> passes, so it is still possible that we compact files in an old tier, which
> breaks our block moving and EC work.
> So here we want to introduce an "archive tier" to better fit our scenario.
> Add a configuration called "archive unit", for example, year. That means, if
> we find that the tier boundary is already in a previous year, then we reset
> the boundaries to the start and end of that year, and if we want to do
> compaction in this tier, we just compact all files into one file. The file
> will never be changed unless we force a major compaction, so it is safe to
> apply EC and other cost-reducing approaches to the file. And we make more
> tiers before this tier, year by year.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
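
To make the two ideas in this thread concrete, here is a minimal sketch of a moving-window "hot data" check and of snapping a tier boundary to calendar-year limits when the archive unit is a year. It is not HBase code: the class `ArchiveTierSketch`, its nested `Window` type, and the methods `isHot` and `archiveWindow` are hypothetical names chosen for illustration. Using UTC for year boundaries and treating the tier boundary as a single timestamp are also assumptions.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in HBase.
final class ArchiveTierSketch {

    /** Half-open time window [startMillis, endMillis) in epoch milliseconds. */
    static final class Window {
        final long startMillis;
        final long endMillis;

        Window(long startMillis, long endMillis) {
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }

        boolean contains(long tsMillis) {
            return tsMillis >= startMillis && tsMillis < endMillis;
        }
    }

    /** Moving window: data is hot if it was written within the last hotWindowMillis. */
    static boolean isHot(long writeTsMillis, long nowMillis, long hotWindowMillis) {
        return writeTsMillis > nowMillis - hotWindowMillis;
    }

    /** Calendar year (assumed UTC) that contains the given timestamp. */
    static Window yearWindow(long tsMillis) {
        int year = Instant.ofEpochMilli(tsMillis).atZone(ZoneOffset.UTC).getYear();
        long start = LocalDate.of(year, 1, 1)
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long end = LocalDate.of(year + 1, 1, 1)
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return new Window(start, end);
    }

    /**
     * If the tier boundary already lies in a previous year, return that year's
     * fixed window (the archive tier). Otherwise return empty, meaning the
     * tier stays under the normal moving-window logic.
     */
    static Optional<Window> archiveWindow(long tierBoundaryMillis, long nowMillis) {
        Window currentYear = yearWindow(nowMillis);
        if (tierBoundaryMillis >= currentYear.startMillis) {
            return Optional.empty();
        }
        return Optional.of(yearWindow(tierBoundaryMillis));
    }
}
```

Under these assumptions, a tier boundary in June 2015 evaluated in March 2016 maps to the fixed window from 2015-01-01 to 2016-01-01 (UTC), so every file in that window could be compacted into one file and then left alone. Earlier boundaries map to earlier years in the same way, giving one archive tier per year.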