[jira] [Commented] (HBASE-11861) Native MOB Compaction mechanisms.

Jonathan Hsieh (JIRA) Fri, 05 Dec 2014 10:41:48 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235867#comment-14235867
 ]


Jonathan Hsieh commented on HBASE-11861:
----------------------------------------

Thanks for doing the writeup.  At a high level, I think we need to define some 
invariants before we go into all the rules and procedures.

here are some thoughts and questions:

----

Overview:
0) when we do a mob compaction, are we compacting all mobs or just mobs 
relevant to a particular region?  

1) I don't think mob compaction has to happen after major compactions.  It 
could have its own schedule and could run less frequently than the normal major 
compactions.  Doing them after a major compaction (or after a few) is 
reasonable first cut.

2) cells deleted in minor compaction are ttl related?

3) why should hfile links be rewritten?  I think we can use the same criteria 
to decide whether we do a mob compaction on them.

how to find candidate:

4) I don't think we want to scan all the mob files to do a compaction on a 
single store.  Also, because of splits and merges, there could be other 
relevant del mob files whose start key is earlier or later than the store's 
but that still cover part of the range in that store. I think we'll have to do 
some start key and end key tracking in the delmob files and the mob files to 
reduce the candidate list.
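
A rough sketch of the start/end key idea in 4), to make the overlap test 
concrete.  Everything here is hypothetical: KeyRange, relevantDelFiles, and the 
file names are made up for illustration, keys are plain Strings rather than 
HBase byte[] row keys, and the end key is treated as inclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DelFileCandidates {
    /** Hypothetical per-file metadata: file name plus tracked start/end keys. */
    static final class KeyRange {
        final String name;
        final String startKey;
        final String endKey; // inclusive in this sketch (an assumption)

        KeyRange(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }

        /** Two closed ranges overlap when each one starts before the other ends. */
        boolean overlaps(KeyRange other) {
            return startKey.compareTo(other.endKey) <= 0
                && other.startKey.compareTo(endKey) <= 0;
        }
    }

    /** Returns the names of the del files whose key range overlaps the mob file's. */
    static List<String> relevantDelFiles(KeyRange mobFile, List<KeyRange> delFiles) {
        List<String> names = new ArrayList<>();
        for (KeyRange del : delFiles) {
            if (mobFile.overlaps(del)) {
                names.add(del.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // A mob file written by a store covering rows b..c, a del file written
        // before any split (rows a..d), and an unrelated del file (rows e..f).
        KeyRange mobFile = new KeyRange("mob-bc", "b", "c");
        List<KeyRange> delFiles = Arrays.asList(
            new KeyRange("del-abcd", "a", "d"),
            new KeyRange("del-ef", "e", "f"));
        System.out.println(relevantDelFiles(mobFile, delFiles));
    }
}
```

With ranges tracked this way, the candidate list depends only on key overlap, 
not on which region or store originally wrote the file.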

How to find invalid mob files:

5) why do a mini del file compaction?  why not just use it as is?

6) deletedCellsSizeInOneMobFile -- interesting.  I was thinking just a count of 
mobs associated with each mob file.

How to find the small file?

7) on merge -- shouldn't we try to guarantee time order in a merge so that the 
ttl cleaner is still effective?

how to handle split?

8) I'm not clear about the splits case here.  Also does it manage merges?  Say 
we have a single del file with deletes in rows a b c d.  That region gets split 
into a b and c d, and then again into separate a, b, c, and d regions.  Finally 
someone does a merge for b and c to create a bc region.  Does the grouping on 
hash idea break then?  

I think we need to track both the start and end keys in the del files 
and likely the mob files.  An alternative is something that splits mob files and 
del files, but that potentially causes write amplification we want to avoid.
----

My gut feeling is that we need to deal with all mob files, iterate through 
ranges, and use mob counts.  We'd track start/end keys and counts in each mob 
file and each del file.  We could then iterate on mob files, and select only 
the del files that are relevant based on the start keys and end keys. We might 
want to track a histogram (count or size) of deletions for each particular 
mob file in each del file.
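
To illustrate the counts idea, here is a minimal sketch assuming each del file 
carries a histogram mapping mob file name to the number of deletes that hit 
that mob file.  MobDeleteCounts, totalDeletes, shouldCompact, and the ratio 
threshold are all hypothetical; the threshold in particular is an assumption 
that is not part of the proposal.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MobDeleteCounts {
    /**
     * Sums the per-mob-file delete counts across the histograms of all
     * relevant del files (one map per del file: mob file name -> deletes).
     */
    static Map<String, Long> totalDeletes(List<Map<String, Long>> delFileHistograms) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> histogram : delFileHistograms) {
            for (Map.Entry<String, Long> e : histogram.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    /**
     * Hypothetical policy: compact a mob file once the deleted fraction of
     * its mobs reaches the given threshold.
     */
    static boolean shouldCompact(long mobCount, long deletedCount, double threshold) {
        return mobCount > 0 && (double) deletedCount / mobCount >= threshold;
    }

    public static void main(String[] args) {
        Map<String, Long> del1 = new HashMap<>();
        del1.put("mob-1", 3L);
        del1.put("mob-2", 1L);
        Map<String, Long> del2 = new HashMap<>();
        del2.put("mob-1", 2L);

        Map<String, Long> totals = totalDeletes(Arrays.asList(del1, del2));
        // mob-1 holds 10 mobs in this example; 5 of them are deleted.
        System.out.println(shouldCompact(10, totals.get("mob-1"), 0.5));
    }
}
```

Swapping the counts for sizes would give the size-based variant of the same 
histogram.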

> Native MOB Compaction mechanisms.
> ---------------------------------
>
>                 Key: HBASE-11861
>                 URL: https://issues.apache.org/jira/browse/HBASE-11861
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hsieh
>         Attachments: 141030-mob-compaction.pdf, mob compaction.pdf
>
>
> Currently, the first cut of mob will have external processes to age off old 
> mob data (the ttl cleaner), and to compact away deleted or over written data 
> (the sweep tool).  
> From an operational point of view, having two external tools, especially one 
> that relies on MapReduce is undesirable.  In this issue we'll tackle 
> integrating these into hbase without requiring external processes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
