[ 
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235958#comment-15235958
 ] 

Dave Latham commented on HBASE-15454:
-------------------------------------

Clara's out for a few days.

I read over the design doc and the patch and am still trying to understand how 
this all fits together, and whether this is the best way to add a new set of logic.
The design doc mentions 3 goals:
1. Use EC (to reduce the redundancy of old data)
2. No overlapping store files (to facilitate easier comparisons of data)
3. Calendar boundaries between windows (presumably for analysis, comparison, or 
operations on files to match expectations of people or other systems)

Some questions and thoughts:
* Does EC mean erasure coding?  I don't see anything related in the patch, and 
am not familiar with any existing support within HBase to take advantage of it 
with HDFS.  Does it actually relate to this work, or are you just mentioning 
your intention to use this feature and EC at the same time?  If so, I'd love to 
hear your thoughts on how you'd do it.  I also wonder if it's orthogonal and 
should be mostly independent of compaction policies.
* I don’t see how this achieves no overlapping store files.  Can you explain 
that part?
* Treating the archive windows as a separate concept feels unnecessary to me.   
Can we instead allow the windowing algorithm to be pluggable?  The current 
implementation uses exponentially growing windows across tiers up to some 
limit, but others could be entirely calendar based or use fixed-size windows.  
Then, to achieve what’s here, you could implement one custom window schedule 
that starts with exponential windows and then transitions to calendar-based 
ones.
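
To make the last suggestion concrete, here is a rough sketch of what a 
pluggable window schedule might look like.  Everything in it is hypothetical: 
WindowSchedule, ExponentialSchedule, CalendarYearSchedule and 
ExponentialThenCalendar are names I made up for illustration, not existing 
HBase classes, and the promotion rule is a simplification of tiered windows 
rather than what the current policy does.  It also assumes UTC, calendar-year 
archive windows, and timestamps no newer than "now".

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical plug-in point: maps a timestamp to the window that contains it. */
interface WindowSchedule {
    /** Returns {startInclusive, endExclusive} in epoch millis; assumes ts <= now. */
    long[] windowFor(long ts, long now);
}

/** Windows grow by a factor of windowsPerTier as data ages, up to maxWindowMillis. */
final class ExponentialSchedule implements WindowSchedule {
    private final long baseMillis;
    private final int windowsPerTier;
    private final long maxWindowMillis;

    ExponentialSchedule(long baseMillis, int windowsPerTier, long maxWindowMillis) {
        this.baseMillis = baseMillis;
        this.windowsPerTier = windowsPerTier;
        this.maxWindowMillis = maxWindowMillis;
    }

    @Override
    public long[] windowFor(long ts, long now) {
        long size = baseMillis;
        long start = Math.floorDiv(now, size) * size;
        // Walk backwards from the newest window until one contains ts.
        while (ts < start) {
            long bigger = size * windowsPerTier;
            // Promote to the next tier when the boundary is aligned to it.
            if (bigger <= maxWindowMillis && Math.floorMod(start, bigger) == 0) {
                size = bigger;
            }
            start -= size;
        }
        return new long[] {start, start + size};
    }
}

/** One window per calendar year (UTC is an assumption of this sketch). */
final class CalendarYearSchedule implements WindowSchedule {
    private static long startOfYear(int year) {
        return LocalDate.of(year, 1, 1).atStartOfDay(ZoneOffset.UTC)
                .toInstant().toEpochMilli();
    }

    @Override
    public long[] windowFor(long ts, long now) {
        int year = Instant.ofEpochMilli(ts).atZone(ZoneOffset.UTC).getYear();
        return new long[] {startOfYear(year), startOfYear(year + 1)};
    }
}

/** Exponential windows for recent data, calendar-year windows for old data. */
final class ExponentialThenCalendar implements WindowSchedule {
    private final WindowSchedule recent;
    private final WindowSchedule archive = new CalendarYearSchedule();
    private final long maxAgeMillis;

    ExponentialThenCalendar(WindowSchedule recent, long maxAgeMillis) {
        this.recent = recent;
        this.maxAgeMillis = maxAgeMillis;
    }

    @Override
    public long[] windowFor(long ts, long now) {
        // Align the hand-off point to a year boundary so the two schemes never overlap.
        long cutoff = archive.windowFor(now - maxAgeMillis, now)[0];
        if (ts < cutoff) {
            return archive.windowFor(ts, now);
        }
        long[] w = recent.windowFor(ts, now);
        // Clip a recent window that would otherwise straddle the cutoff.
        return new long[] {Math.max(w[0], cutoff), w[1]};
    }
}
```

With something along these lines, the compaction policy would only ask the 
schedule which window a file's timestamps fall into, and the archive behavior 
becomes one schedule implementation instead of a separate concept.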

> Archive store files older than max age
> --------------------------------------
>
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
>         Attachments: HBASE-15454-v1.patch, HBASE-15454.patch
>
>
> Sometimes the old data is rarely touched but we can not remove it. So archive 
> it to several big files(by year or something) and use EC to reduce the 
> redundancy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
