[
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235958#comment-15235958
]
Dave Latham commented on HBASE-15454:
-------------------------------------
Clara's out for a few days.
I read over the design doc and the patch and am still trying to understand how
this all fits together, and whether this is the best way to add a new set of logic.
The design doc mentions 3 goals:
1. Use EC (to reduce the redundancy of old data)
2. No overlapping store files (to facilitate easier comparisons of data)
3. Calendar boundaries between windows (presumably for analysis, comparison, or
operations on files to match expectations of people or other systems)
Some questions and thoughts:
* Does EC mean erasure coding? I don't see anything related in the patch, and
am not familiar with any existing support within HBase to take advantage of it
with HDFS. Does it actually relate to this work, or are you just mentioning
your intention to use this feature and EC at the same time? I'd love to hear
your thoughts on how you'd do it if so. I also wonder if it's orthogonal and
should be mostly independent of compaction policies.
* I don’t see how this achieves no overlapping store files. Can you explain
that part?
* Treating the archive windows as a separate concept feels unnecessary to me.
Can we instead allow the windowing algorithm to be pluggable? The current
implementation uses exponentially growing windows across tiers up to some
limit, but others could be entirely calendar based or use fixed size windows.
Then to achieve what’s here you could implement a custom window schedule that
starts with exponential windows and then transitions to calendar-based ones.
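To make the pluggable-windowing suggestion a bit more concrete, here is a rough
sketch of what I have in mind. Every name in it (Window, WindowSchedule,
ExponentialSchedule, CalendarYearSchedule, HybridSchedule) is hypothetical and
does not exist in HBase or in the patch, and the exponential part is a
deliberate simplification, not the tiering logic of the current policy.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical: a half-open time window [startMillis, endMillis). */
final class Window {
    final long startMillis;
    final long endMillis;

    Window(long startMillis, long endMillis) {
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }
}

/** Hypothetical plug-in point: maps a cell timestamp to the window holding it. */
interface WindowSchedule {
    Window windowFor(long timestampMillis, long nowMillis);
}

/** Simplified stand-in for exponential tiers: older data gets larger windows. */
final class ExponentialSchedule implements WindowSchedule {
    private final long baseMillis;
    private final int factor;
    private final long maxMillis;

    ExponentialSchedule(long baseMillis, int factor, long maxMillis) {
        this.baseMillis = baseMillis;
        this.factor = factor;
        this.maxMillis = maxMillis;
    }

    @Override
    public Window windowFor(long timestampMillis, long nowMillis) {
        long size = baseMillis;
        long age = nowMillis - timestampMillis;
        // Grow the window size while the data is older than the next tier's size.
        while (size * factor <= maxMillis && age >= size * factor) {
            size *= factor;
        }
        long start = Math.floorDiv(timestampMillis, size) * size;
        return new Window(start, start + size);
    }
}

/** Calendar-based schedule: one window per UTC calendar year. */
final class CalendarYearSchedule implements WindowSchedule {
    @Override
    public Window windowFor(long timestampMillis, long nowMillis) {
        int year = Instant.ofEpochMilli(timestampMillis).atZone(ZoneOffset.UTC).getYear();
        long start = LocalDate.of(year, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long end = LocalDate.of(year + 1, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return new Window(start, end);
    }
}

/** Uses one schedule for recent data and switches to another past maxAgeMillis. */
final class HybridSchedule implements WindowSchedule {
    private final WindowSchedule recent;
    private final WindowSchedule archive;
    private final long maxAgeMillis;

    HybridSchedule(WindowSchedule recent, WindowSchedule archive, long maxAgeMillis) {
        this.recent = recent;
        this.archive = archive;
        this.maxAgeMillis = maxAgeMillis;
    }

    @Override
    public Window windowFor(long timestampMillis, long nowMillis) {
        boolean old = nowMillis - timestampMillis >= maxAgeMillis;
        return old ? archive.windowFor(timestampMillis, nowMillis)
                   : recent.windowFor(timestampMillis, nowMillis);
    }
}
```

With something like this, the compaction policy would only ask the configured
schedule for a window, and the "archive" behavior becomes one more schedule
implementation instead of a separate concept inside the policy.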
> Archive store files older than max age
> --------------------------------------
>
> Key: HBASE-15454
> URL: https://issues.apache.org/jira/browse/HBASE-15454
> Project: HBase
> Issue Type: Sub-task
> Components: Compaction
> Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15454-v1.patch, HBASE-15454.patch
>
>
> Sometimes the old data is rarely touched but we cannot remove it. So archive
> it to several big files (by year or something) and use EC to reduce the
> redundancy.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)