[ 
https://issues.apache.org/jira/browse/HBASE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667813#comment-15667813
 ] 

Anoop Sam John commented on HBASE-16981:
----------------------------------------

Just to make sure my thinking is correct.
Say that today the MOB compaction frequency is daily and the partition is also 
daily, and we now change the partition to monthly. MOB compaction still runs 
every day. On day one it produces one file per region, so across many regions 
there are many files. On day two compaction runs again and, because the 
partition is monthly, it picks up yesterday's bigger compacted file together 
with all of today's small files. On day three it again takes yesterday's 
bigger compacted file plus today's small files, and so on. The IO is much 
higher and keeps increasing every day until we reach month end. So at the end 
of the month there is only one file per region. Is that right?
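
A rough sketch of that growth, under simplifying assumptions that are not from 
the patch: every day brings the same amount of MOB data per region (one 
"unit"), nothing is deleted, and each daily run rewrites the whole 
month-to-date file. The function names are hypothetical and this is not HBase 
code, only the arithmetic.

```python
def monthly_partition_write_cost(days: int = 30, daily_units: float = 1.0) -> float:
    """Units written if every daily run rewrites the whole month-to-date file."""
    total_written = 0.0
    accumulated = 0.0  # size of the month-to-date compacted file
    for _ in range(days):
        accumulated += daily_units    # today's small files are merged in
        total_written += accumulated  # the merged file is written out again
    return total_written


def daily_partition_write_cost(days: int = 30, daily_units: float = 1.0) -> float:
    """Units written if each day's files are compacted once, on their own."""
    return days * daily_units


if __name__ == "__main__":
    print(monthly_partition_write_cost())  # 465.0
    print(daily_partition_write_cost())    # 30.0
```

Under these assumptions a 30-day month writes 1 + 2 + ... + 30 = 465 units per 
region, against 30 units with the daily partition.
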
So if our aim is only to have fewer files, can we think of doing staged 
compactions? (I don't know whether that is the correct name.) What I am 
thinking is that each day (taking the frequency as daily) compaction produces 
a single file from that day's files alone, and this continues for one week. 
At the end of the week a second stage runs: seven days of files (the six 
daily compacted files plus today's files) are compacted into one. In the same 
way, at the end of the month, the one file from each previous week plus this 
week's files go through a further stage and are compacted into a single file 
for the month. Maybe the same again at year end. Just crazy thinking. I have 
not done any analysis against the code, and I am not sure about the 
feasibility or complexity. Just throwing it out here. I only wanted to reduce 
the IO amplification. Am I expressing my idea clearly?
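
One way to read the staged idea, as a sketch only: count how much data gets 
written over a month when days, weeks and the month are separate stages. The 
assumptions are mine, not from the issue: one "unit" is one day's MOB data 
for one region, every day is the same size, weeks are fixed 7-day blocks 
counted from day 1, and the yearly stage is left out. `staged_write_cost` is 
a hypothetical name, not an HBase API.

```python
def staged_write_cost(days: int = 30, week_len: int = 7) -> int:
    """Units written over one month with daily, weekly and monthly stages."""
    total_written = 0
    daily_files = []   # sizes of per-day compacted files in the current week
    weekly_files = []  # sizes of per-week compacted files in the current month
    for day in range(1, days + 1):
        if day == days:
            # Month end: weekly files + this week's daily files + today's data.
            month_file = sum(weekly_files) + sum(daily_files) + 1
            total_written += month_file
        elif day % week_len == 0:
            # Week end: this week's daily files + today's data -> one weekly file.
            week_file = sum(daily_files) + 1
            total_written += week_file
            weekly_files.append(week_file)
            daily_files = []
        else:
            # Ordinary day: only today's small files are compacted.
            total_written += 1
            daily_files.append(1)
    return total_written


if __name__ == "__main__":
    print(staged_write_cost())  # 83
```

With these assumptions a 30-day month writes 83 units per region, compared 
with 1 + 2 + ... + 30 = 465 units if the whole month-to-date file is 
rewritten every day. Real numbers would depend on file sizes, deletes and 
how the stage boundaries are chosen.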



> Expand Mob Compaction Partition policy from daily to weekly, monthly and 
> beyond
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-16981
>                 URL: https://issues.apache.org/jira/browse/HBASE-16981
>             Project: HBase
>          Issue Type: New Feature
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>         Attachments: HBASE-16981.master.001.patch, 
> HBASE-16981.master.002.patch, 
> Supportingweeklyandmonthlymobcompactionpartitionpolicyinhbase.pdf
>
>
> Today the mob region holds all mob files for all regions. With the daily 
> partition mob compaction policy, after major mob compaction, there is still 
> one file per region per day. Given there are 365 days in a year, that is at 
> least 365 files per region per year. Since HDFS has a limit on the number of 
> files under one folder, this is not going to scale if there are lots of 
> regions. To reduce the number of mob files, we want to introduce other 
> partition policies, such as weekly and monthly, which compact the mob files 
> within one week or month into one file. 
> This jira is created to track this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
