[ 
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243854#comment-15243854
 ] 

Dave Latham commented on HBASE-15454:
-------------------------------------

Thanks, Duo, I think I finally understand the intent here: For old enough 
windows you want to compact whatever is necessary to produce exactly one file 
for that window containing exactly the cells timestamped in that window.  This 
sounds reasonable if you can guarantee that zero new cells are being added to 
those windows.
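
To make sure we mean the same thing, here is a minimal sketch of that intent as 
I read it. It is not HBase code: the function name, the fixed-width windows, and 
the `archive_boundary_ms` parameter are all assumptions for illustration, and 
cells are reduced to bare timestamps.

```python
from collections import defaultdict


def partition_old_cells(cell_timestamps, window_ms, archive_boundary_ms):
    """Group cells into per-window outputs for windows that are old enough.

    Hypothetical helper: a window [start, start + window_ms) is treated as
    archivable only if it ends at or before archive_boundary_ms. Each
    archivable window gets exactly one output list (standing in for one
    HFile); everything else is left alone.
    """
    archived = defaultdict(list)
    recent = []
    for ts in cell_timestamps:
        window_start = ts - ts % window_ms
        if window_start + window_ms <= archive_boundary_ms:
            archived[window_start].append(ts)
        else:
            recent.append(ts)
    # One sorted output per archived window, keyed by window start.
    return {start: sorted(cells) for start, cells in archived.items()}, recent


archived, recent = partition_old_cells([5, 12, 17, 31, 8], 10, 20)
print(archived)  # {0: [5, 8], 10: [12, 17]}
print(recent)    # [31]
```

The one-file-per-window property only holds as long as no further cells land in 
a window after it has been archived.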

Now that I understand, a few thoughts:
* Can we separate the JIRA issues and patches for pluggable window schedules vs 
"archival" compaction?
* I think the archival time boundary should be a separate configuration from 
the exponential window schedule's max tier age.
* I don't have good intuition for how such an archiving mechanism would affect 
write amplification in practice, how it performs under edge cases (e.g. once 
in a while another "old" cell shows up), or whether it's likely to output 
several small HFiles when it runs.  Do you have any analysis, simulation, 
or arguments about how this will behave and perform?  It seems that using this 
makes stronger assumptions about the use case and write behavior.
* If going in this direction, I wonder if it's better to go all the way, from 
having every minor compaction output perfectly partitioned HFiles to even doing 
so at flush time as well.  Could certainly be done later.
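
As a rough illustration of the late-cell edge case, the sketch below estimates 
the cost of restoring the one-file-per-window property after a few late cells 
arrive. It assumes the whole archived file for that window must be rewritten 
together with the new cells; the function name and the byte-count model are 
hypothetical and ignore compression and block overhead.

```python
def rewrite_cost(archived_file_bytes, late_cell_bytes):
    """Return (bytes_written, amplification) for re-archiving one window.

    Assumption: keeping exactly one file per window means merging the
    existing archived file with the late cells into a single new file.
    Amplification is bytes written per byte of newly arrived data.
    """
    if late_cell_bytes <= 0:
        raise ValueError("late_cell_bytes must be positive")
    bytes_written = archived_file_bytes + late_cell_bytes
    return bytes_written, bytes_written / late_cell_bytes


written, amplification = rewrite_cost(1_000_000, 100)
print(written, amplification)  # 1000100 10001.0
```

Under that assumption the cost grows with the size of the archived window file 
rather than with the amount of late data, which is why the behavior with 
occasional old cells matters.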

Thanks for your patience, Duo.




> Archive store files older than max age
> --------------------------------------
>
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
>         Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch, 
> HBASE-15454-v3.patch, HBASE-15454-v4.patch, HBASE-15454.patch
>
>
> Sometimes old data is rarely touched but we cannot remove it. So archive 
> it into several big files (by year or something) and use erasure coding (EC) 
> to reduce the redundancy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
