[ https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236457#comment-15236457 ]
Duo Zhang commented on HBASE-15454:
-----------------------------------
[~stack] Your comment is another big story :)
In our current design, 'Get' requests on archived old files can be eliminated
by the bloom filter. For 'Scan' requests, however, the user who constructs the
'Scan' object has to set a time range on it.
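A minimal sketch of why the time range matters for 'Scan', assuming each store file records the min/max cell timestamps it holds. StoreFileInfo and files_for_scan are hypothetical names, not HBase classes. A file whose timestamp span does not overlap the scan's range can be skipped without opening it; with no range set, every file overlaps, archived ones included. The range is treated as half-open [min_stamp, max_stamp), matching HBase's TimeRange semantics.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoreFileInfo:
    """Hypothetical per-file metadata: a name plus the cell timestamp span it holds."""
    name: str
    min_ts: int  # oldest cell timestamp in the file (inclusive)
    max_ts: int  # newest cell timestamp in the file (inclusive)


def files_for_scan(files: List[StoreFileInfo], min_stamp: int, max_stamp: int) -> List[StoreFileInfo]:
    """Keep only files whose [min_ts, max_ts] span overlaps the half-open
    scan range [min_stamp, max_stamp); every other file is never opened."""
    return [f for f in files if f.max_ts >= min_stamp and f.min_ts < max_stamp]
```

For example, with two archive files covering older timestamps and one hot file covering recent ones, a scan restricted to recent timestamps returns only the hot file, while a scan with the widest possible range returns all three.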
For 'Get', I think it is possible to do some optimizations that make us touch
the archived files as little as possible, such as reading the hot files first
and opening all files only if the row is not found there. But for 'Scan', I
haven't found an easy way to do this without a user-specified time range, and
unfortunately, our customer uses 'Scan' much more than 'Get' :(
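A rough sketch of the hot-files-first idea for 'Get'. StoreFile, BloomFilter and hot_first_get are hypothetical stand-ins, not HBase code. A Bloom filter can return false positives but never false negatives, so a negative answer lets us skip a file without opening it, and archived files are consulted only after every hot file has missed. The early return is safe only under the assumption that hot files always hold newer versions than archived ones (true when data is archived purely by age and nobody writes cells with old explicit timestamps); a real Get also has to respect deletes and version ordering across all files.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


class BloomFilter:
    """Tiny row-key Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


@dataclass
class StoreFile:
    """Hypothetical store file: an in-memory row map plus its Bloom filter."""
    name: str
    archived: bool
    rows: Dict[bytes, bytes]
    bloom: BloomFilter = field(default_factory=BloomFilter)
    opened: bool = False  # records whether this sketch ever "read" the file

    def __post_init__(self) -> None:
        for key in self.rows:
            self.bloom.add(key)

    def read(self, key: bytes) -> Optional[bytes]:
        self.opened = True
        return self.rows.get(key)


def hot_first_get(files: List[StoreFile], key: bytes) -> Optional[bytes]:
    """Look in hot files first; fall back to archived files only on a miss."""
    hot = [f for f in files if not f.archived]
    cold = [f for f in files if f.archived]
    for group in (hot, cold):
        for f in group:
            if not f.bloom.might_contain(key):
                continue  # key definitely absent: skip without opening the file
            value = f.read(key)
            if value is not None:
                return value
    return None
```

When most Gets hit recent rows, the archived files are never opened; only misses pay the cost of touching them.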
And it is a good idea to exclude archived files when considering splits. I
think it could be generalized as 'do not include cold files when computing the
size for splitting'? Should we file another issue for this?
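A sketch of the proposed split-size rule, assuming each file is tagged as archived or not. StoreFileStats, size_for_split and should_split are hypothetical names. Only hot files count toward the size compared against the split threshold, so a region holding a large archive file plus a little recent data is not split just because of the archive. In HBase the threshold would roughly correspond to hbase.hregion.max.filesize; here it is just a parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoreFileStats:
    """Hypothetical per-file stats used by this sketch."""
    name: str
    size_bytes: int
    archived: bool


def size_for_split(files: List[StoreFileStats]) -> int:
    """Sum only the non-archived (hot) files; archive files are ignored."""
    return sum(f.size_bytes for f in files if not f.archived)


def should_split(files: List[StoreFileStats], max_file_size: int) -> bool:
    # Hypothetical policy: split once the hot data alone exceeds the threshold.
    return size_for_split(files) > max_file_size
```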
Thanks.
> Archive store files older than max age
> --------------------------------------
>
> Key: HBASE-15454
> URL: https://issues.apache.org/jira/browse/HBASE-15454
> Project: HBase
> Issue Type: Sub-task
> Components: Compaction
> Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15454-v1.patch, HBASE-15454.patch
>
>
> Sometimes old data is rarely touched, but we cannot remove it. So archive
> it into several big files (by year or something) and use EC (erasure coding)
> to reduce the redundancy.
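A minimal sketch of the 'archive by year' grouping described above, assuming each file records the timestamp of its newest cell. FileMeta and plan_archive are hypothetical names. Files whose newest cell is older than the max age are bucketed by UTC year, and each bucket would then be compacted into one big file that can be stored with EC. Grouping whole files by their newest cell is a simplification; a real compaction could instead split cells at year boundaries so that each archive file covers exactly one year.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical file metadata: name and newest cell timestamp (epoch ms)."""
    name: str
    max_ts_ms: int


def plan_archive(files: List[FileMeta], now_ms: int, max_age_ms: int) -> Dict[int, List[str]]:
    """Bucket files older than max_age_ms by the UTC year of their newest cell."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for f in files:
        if now_ms - f.max_ts_ms > max_age_ms:  # only files past the max age
            year = datetime.fromtimestamp(f.max_ts_ms / 1000, tz=timezone.utc).year
            groups[year].append(f.name)
    return dict(groups)
```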
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)