[ https://issues.apache.org/jira/browse/HBASE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667813#comment-15667813 ]
Anoop Sam John commented on HBASE-16981:
----------------------------------------
Just to make sure my thinking is correct.
Say MOB compaction runs daily and the partition is also daily, as it is now,
and we change the partition to monthly. Compaction still happens every day. On
day one it produces one file per region, so there are many files across many
regions. On day two compaction runs again, and because the partition is
monthly, it picks up yesterday's bigger compacted file along with all of
today's small files. On day three it again rewrites the previous day's bigger
compacted file plus that day's small files, and so on. The IO is much higher
and keeps growing every day until month end. Only at the end of the month is
there one file per region. (?)
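To put a rough number on that growth, here is a minimal sketch under a
deliberately simple model: every region receives the same amount of new MOB
data each day (one "unit"), nothing is deleted, and only data written by
compaction is counted. The function name and the unit are hypothetical and do
not come from the HBase code.

```python
def monthly_partition_units_written(days: int, daily_units: float = 1.0) -> float:
    """Units written per region when a monthly partition is compacted daily.

    Each day the single file holding the month so far is rewritten together
    with that day's new data, so day k writes k * daily_units.
    """
    total_written = 0.0
    merged_size = 0.0  # size of the month-to-date compacted file
    for _ in range(days):
        merged_size += daily_units    # today's small files join the big file
        total_written += merged_size  # the whole merged file is rewritten
    return total_written


days = 30
written = monthly_partition_units_written(days)
print(written)         # 465.0 units written for 30 units of actual data
print(written / days)  # 15.5, the write amplification in this model
```

In this model the total is days * (days + 1) / 2, so the cost grows
quadratically with the length of the partition.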
So if our aim is only to have fewer files, can we think of doing staged
compactions? (I don't know whether that is the correct name.) What I am
thinking is: each day (taking the frequency as daily), compaction merges only
that day's files into a single file, and this continues for one week. At the
end of the week a second stage runs: 7 days of files (the 6 daily compacted
files plus today's files) get compacted into one. In the same way, at the end
of the month, the one file from each previous week plus this week's files are
compacted, as a further stage, into a single file for the month. The same
could perhaps be done at year end too. Just crazy thinking. I have not
analyzed this against the code at all, and I am not sure about the feasibility
or complexity. Just throwing it out here; I only wanted to reduce the IO
amplification. Am I expressing my thinking clearly?
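A rough sketch of what the staged idea would cost, under the same kind of toy
model: one unit of new MOB data per region per day, no deletes, a 30-day month,
7-day weeks, and only compaction writes counted. All names here are
hypothetical; this is arithmetic on the idea, not how HBase implements
anything.

```python
def staged_units_written(days_in_month: int = 30,
                         days_in_week: int = 7,
                         daily_units: float = 1.0) -> float:
    """Units written per region with day -> week -> month staged compaction."""
    total_written = 0.0
    for day in range(1, days_in_month + 1):
        if day == days_in_month:
            # Month end: weekly files, leftover daily files and today's
            # files are all rewritten into one monthly file.
            total_written += day * daily_units
        elif day % days_in_week == 0:
            # Week end: 6 daily compacted files plus today's files
            # are rewritten into one weekly file.
            total_written += days_in_week * daily_units
        else:
            # Ordinary day: only today's files are compacted.
            total_written += daily_units
    return total_written


def single_stage_units_written(days: int, daily_units: float = 1.0) -> float:
    """Baseline: the month-to-date file is rewritten every day."""
    return daily_units * days * (days + 1) / 2


print(staged_units_written())          # 83.0
print(single_stage_units_written(30))  # 465.0
```

So in this model staging writes 83 units instead of 465 for 30 units of data
(roughly 2.8x amplification instead of 15.5x), at the price of keeping up to
a handful of daily and weekly files around until the month closes.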
> Expand Mob Compaction Partition policy from daily to weekly, monthly and
> beyond
> -------------------------------------------------------------------------------
>
> Key: HBASE-16981
> URL: https://issues.apache.org/jira/browse/HBASE-16981
> Project: HBase
> Issue Type: New Feature
> Components: mob
> Affects Versions: 2.0.0
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Attachments: HBASE-16981.master.001.patch,
> HBASE-16981.master.002.patch,
> Supportingweeklyandmonthlymobcompactionpartitionpolicyinhbase.pdf
>
>
> Today the mob region holds all mob files for all regions. With the daily
> partition mob compaction policy, after major mob compaction, there is still
> one file per region per day. Given there are 365 days in a year, that is at
> least 365 files per region. Since HDFS limits the number of files under one
> folder, this is not going to scale if there are lots of regions. To reduce
> the number of mob files, we want to introduce other partition policies, such
> as weekly and monthly, that compact the mob files within one week or month
> into one file. This jira is created to track this effort.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)