[
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243854#comment-15243854
]
Dave Latham commented on HBASE-15454:
-------------------------------------
Thanks, Duo, I think I finally understand the intent here: For old enough
windows you want to compact whatever is necessary to produce exactly one file
for that window containing exactly the cells timestamped in that window. This
sounds reasonable if you can guarantee that zero new cells are being added to
those windows.
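To make sure I have the intent right, here is a minimal sketch of it as I understand it, not the patch's actual logic. The names `window_start`, `archive_partition`, `window_ms` and `archive_before_ms` are hypothetical, windows are assumed to be fixed-size, and a "file" is modeled as a plain list of (timestamp, value) cells:

```python
from collections import defaultdict


def window_start(timestamp_ms, window_ms):
    """Start of the fixed-size window that contains timestamp_ms."""
    return timestamp_ms - (timestamp_ms % window_ms)


def archive_partition(files, window_ms, archive_before_ms):
    """Split cells from several input files into one output per old window.

    files: list of lists of (timestamp_ms, value) cells (hypothetical model).
    Returns (archived, remaining): archived maps each window start to the
    sorted cells of that window, for windows that end at or before
    archive_before_ms; remaining holds the cells of all newer windows.
    """
    archived = defaultdict(list)
    remaining = []
    for cells in files:
        for ts, value in cells:
            start = window_start(ts, window_ms)
            if start + window_ms <= archive_before_ms:
                archived[start].append((ts, value))
            else:
                remaining.append((ts, value))
    for start in archived:
        archived[start].sort()
    return dict(archived), sorted(remaining)


# Two overlapping input files, 10 ms windows, archive everything before t=20.
files = [[(5, "a"), (15, "b")], [(7, "c"), (25, "d")]]
archived, remaining = archive_partition(files, window_ms=10, archive_before_ms=20)
```

Under this model a late "old" cell arriving after the archive step would land in a new file for an already archived window, which is why the guarantee of no new writes matters.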
Now that I understand, a few thoughts:
* Can we separate the JIRA issues and patches for pluggable window schedules vs
"archival" compaction?
* I think the archival time boundary should be a separate configuration from
the exponential window schedule's max tier age.
* I don't have good intuition for how such an archiving mechanism would affect
write amplification in practice, how it performs under edge cases (e.g. once
in a while another "old" cell shows up), or whether it's likely to output
several small HFiles when it runs. Do you have any analysis, simulation,
or arguments about how this will behave and perform? It seems that using this
makes stronger assumptions about the use case and write behavior.
* If going in this direction, I wonder if it's better to go all the way, from
having every minor compaction output perfectly partitioned HFiles to even doing
so at flush time as well. Could certainly be done later.
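As a rough illustration of the second point above, the two thresholds could be modeled as independent settings. This is only a sketch under my own assumptions: `CompactionAges`, `max_tier_age_ms`, `archive_age_ms` and `treatment` are hypothetical names, not existing HBase configuration keys or classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionAges:
    # Hypothetical: age past which the exponential window schedule stops applying.
    max_tier_age_ms: int
    # Hypothetical: age past which a window becomes eligible for archival.
    archive_age_ms: int


def treatment(window_end_ms, now_ms, ages):
    """Classify a window by its age using two independent thresholds."""
    age = now_ms - window_end_ms
    if age >= ages.archive_age_ms:
        return "archive"
    if age >= ages.max_tier_age_ms:
        return "max_tier"
    return "tiered"


# The archival boundary can sit well beyond the max tier age.
ages = CompactionAges(max_tier_age_ms=1_000, archive_age_ms=5_000)
labels = [treatment(end, now_ms=10_000, ages=ages) for end in (9_500, 8_000, 4_000)]
```

With separate settings, an operator could keep the window schedule unchanged and opt in to archival only for data that is known to be closed to new writes.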
Thanks for your patience, Duo.
> Archive store files older than max age
> --------------------------------------
>
> Key: HBASE-15454
> URL: https://issues.apache.org/jira/browse/HBASE-15454
> Project: HBase
> Issue Type: Sub-task
> Components: Compaction
> Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch,
> HBASE-15454-v3.patch, HBASE-15454-v4.patch, HBASE-15454.patch
>
>
> Sometimes the old data is rarely touched but we cannot remove it. So archive
> it to several big files (by year or something) and use EC to reduce the
> redundancy.