[ 
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236439#comment-15236439
 ] 

Duo Zhang commented on HBASE-15454:
-----------------------------------

{quote}
if it's orthogonal and should be mostly independent of compaction policies.
{quote}
Mostly independent, especially in the code. In the current design, EC is done 
in the background by a RaidManager, so it is transparent to HBase. But EC has 
some requirements on the files. First, the files should be as large as 
possible. Second, a file should stay in place as long as possible. So there is 
something to take care of in the compaction policy if you want to support EC.
In our plan, we have 3 stages.
1. Just do EC without any changes to the HDFS architecture (we are still 
working on this now).
2. Deploy an HDFS with different storage types. For example, 12 disks, 3 of 
them SSDs and 9 of them hard disks. Data is written to SSD first and is moved 
to hard disks when it is archived. This is still transparent to HBase.
3. Deploy another HDFS whose machines are designed for storing large files 
only (typically 24 or more disks, low-power CPU, small memory) and move 
archived files to that HDFS. I think this will require some modifications to 
HBase so that it can operate on multiple HDFSes.

{quote}
I don’t see how this achieves no overlapping store files. Can you explain that 
part?
{quote}
I find the first and last files that overlap the current archive window, and 
then compact all files between them. This makes sure that all data belonging 
to this window is contained in the output file.
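
A rough sketch of that selection, not the code in the patch. FileRange and 
select are hypothetical names used only for illustration, and the sketch 
assumes the files are given in store order, each with an inclusive min/max 
timestamp, and that the window is [windowStart, windowEnd).

```java
import java.util.ArrayList;
import java.util.List;

final class ArchiveWindowSelection {
    /** Hypothetical stand-in for a store file's time range; not an HBase class. */
    static final class FileRange {
        final String name;
        final long minTs; // inclusive
        final long maxTs; // inclusive

        FileRange(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /**
     * Returns the contiguous run of files from the first file that overlaps
     * [windowStart, windowEnd) to the last one that does, including any
     * non-overlapping files that sit between them in store order.
     */
    static List<FileRange> select(List<FileRange> files, long windowStart, long windowEnd) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < files.size(); i++) {
            FileRange f = files.get(i);
            boolean overlaps = f.minTs < windowEnd && f.maxTs >= windowStart;
            if (overlaps) {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
        }
        if (first < 0) {
            return new ArrayList<>(); // nothing overlaps this window
        }
        return new ArrayList<>(files.subList(first, last + 1));
    }
}
```

Because every file between the first and last overlapping ones is compacted 
together, no file left outside the selection can still hold data for this 
window.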

{quote}
Can we instead allow the windowing algorithm to be pluggable? 
{quote}
I think this is a bit hard to do. First, the configuration will become more 
complicated. And for the implementation, the window implementations behave a 
bit differently. For example, TieredWindow must be iterated from new to old, 
but an ArchiveWindow can be constructed at any point on the timeline. Do you 
have a more specific design for the pluggable window?
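
To illustrate the difference, here is a simplified sketch. It is not the 
actual TieredWindow or ArchiveWindow code; WindowSketch, archiveWindowFor and 
tieredWindows are hypothetical names, and the tier sizing (each tier is 
windowsPerTier times larger than the previous one) is an assumption made for 
the example.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowSketch {
    /**
     * Archive-style window: a fixed period, so the window containing any
     * timestamp can be computed directly. Returns {start, end}, end exclusive.
     */
    static long[] archiveWindowFor(long ts, long periodMillis) {
        long start = Math.floorDiv(ts, periodMillis) * periodMillis;
        return new long[] {start, start + periodMillis};
    }

    /**
     * Tiered-style windows: the size grows with age, so the boundaries of an
     * older window are only known after walking back from "now" through all
     * newer windows. Returned newest first, each as {start, end}.
     */
    static List<long[]> tieredWindows(long now, long baseMillis, int windowsPerTier, int tiers) {
        List<long[]> windows = new ArrayList<>();
        long size = baseMillis;
        long end = (Math.floorDiv(now, size) + 1) * size;
        for (int t = 0; t < tiers; t++) {
            for (int i = 0; i < windowsPerTier; i++) {
                windows.add(new long[] {end - size, end});
                end -= size;
            }
            size *= windowsPerTier; // older tiers use larger windows
        }
        return windows;
    }
}
```

A pluggable interface would have to cover both styles: random access by 
timestamp for the first, and ordered iteration from new to old for the second.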

Thanks. [~davelatham]

> Archive store files older than max age
> --------------------------------------
>
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
>         Attachments: HBASE-15454-v1.patch, HBASE-15454.patch
>
>
> Sometimes old data is rarely touched but we cannot remove it. So archive 
> it into several big files (by year or something) and use EC to reduce the 
> redundancy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
