[
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236439#comment-15236439
]
Duo Zhang commented on HBASE-15454:
-----------------------------------
{quote}
if it's orthogonal and should be mostly independent of compaction policies.
{quote}
Mostly independent, especially in the code. In the current design, we do EC in
the background with a RaidManager, so it is transparent to HBase. But EC places
some requirements on the files. First, they should be large, as large as
possible. Second, they should stay in place as long as possible. So there is
something to do when implementing compaction policies if you want to support EC.
In our plan, we have 3 stages.
1. Just do EC without any changes to the HDFS architecture (we are still working
on this now).
2. Deploy an HDFS with different storage types. For example, 12 disks, 3 of
them are SSD and 9 of them are hard disks. Data is written to SSD first and
will be moved to hard disks when it is archived. This is still transparent to
HBase.
3. Deploy another HDFS whose machines are designed for storing large files
only (typically 24 or more disks, low-power CPU, small memory) and move archived
files to that HDFS. I think this will require some modifications to HBase to
support operating on multiple HDFSes?
{quote}
I don’t see how this achieves no overlapping store files. Can you explain that
part?
{quote}
I find the first and last files that overlap with the current archive window,
and then compact all files between them. This makes sure that all data belonging
to this window is contained in the output file.
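A minimal sketch of that selection, to make the idea concrete. It is not the
code from the patch: FileRange, overlaps and select are hypothetical names, the
window is assumed to be half-open [windowStart, windowEnd), and the file list is
assumed to already be in the store's order (for example by sequence id).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArchiveWindowSelection {
    /** Hypothetical stand-in for a store file; only its timestamp range matters here. */
    static final class FileRange {
        final String name;
        final long minTs;
        final long maxTs;

        FileRange(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** True if the file's [minTs, maxTs] range intersects the window [windowStart, windowEnd). */
    static boolean overlaps(FileRange f, long windowStart, long windowEnd) {
        return f.minTs < windowEnd && f.maxTs >= windowStart;
    }

    /**
     * Finds the first and last files that overlap the window and returns every
     * file between them (inclusive), even files that do not overlap themselves.
     */
    static List<FileRange> select(List<FileRange> files, long windowStart, long windowEnd) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < files.size(); i++) {
            if (overlaps(files.get(i), windowStart, windowEnd)) {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
        }
        if (first < 0) {
            return new ArrayList<>(); // nothing in this window
        }
        return new ArrayList<>(files.subList(first, last + 1));
    }

    public static void main(String[] args) {
        List<FileRange> files = Arrays.asList(
            new FileRange("A", 0, 15),
            new FileRange("B", 20, 30),
            new FileRange("C", 12, 18),
            new FileRange("D", 25, 40));
        for (FileRange f : select(files, 10, 20)) {
            System.out.println(f.name);
        }
    }
}
```

In the example in main, B does not overlap the window [10, 20) itself, but it
sits between A and C, which do, so it is part of the selection. D comes after
the last overlapping file and is left out.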
{quote}
Can we instead allow the windowing algorithm to be pluggable?
{quote}
I think it is a bit hard to do this. First, the configuration will become more
complicated. And for the implementation, the different window implementations
behave a bit differently. For example, TieredWindow must be iterated from new
to old. But an ArchiveWindow can be constructed at any place on the timeline.
Do you have a more specific design for the pluggable window?
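To illustrate the "any place" point, here is a rough sketch that assumes
fixed-width archive periods. That is only an assumption for the example, and
FixedArchiveWindow, containing and periodMs are hypothetical names, not classes
from the patch. The window holding a timestamp is computed directly from that
timestamp, without walking from the newest window back to older ones.

```java
public class FixedArchiveWindow {
    final long start; // inclusive
    final long end;   // exclusive

    FixedArchiveWindow(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Builds the window that contains ts by aligning ts down to a multiple of
     * periodMs. floorDiv keeps the alignment correct for negative timestamps.
     */
    static FixedArchiveWindow containing(long ts, long periodMs) {
        long start = Math.floorDiv(ts, periodMs) * periodMs;
        return new FixedArchiveWindow(start, start + periodMs);
    }

    public static void main(String[] args) {
        FixedArchiveWindow w = containing(2500L, 1000L);
        System.out.println("[" + w.start + ", " + w.end + ")");
    }
}
```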
Thanks. [~davelatham]
> Archive store files older than max age
> --------------------------------------
>
> Key: HBASE-15454
> URL: https://issues.apache.org/jira/browse/HBASE-15454
> Project: HBase
> Issue Type: Sub-task
> Components: Compaction
> Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15454-v1.patch, HBASE-15454.patch
>
>
> Sometimes old data is rarely touched but we cannot remove it. So archive
> it into several big files (by year or something) and use EC to reduce the
> redundancy.