[jira] [Commented] (HBASE-11861) Native MOB Compaction mechanisms.

Jonathan Hsieh (JIRA) Wed, 10 Dec 2014 09:35:52 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241422#comment-14241422
 ]


Jonathan Hsieh commented on HBASE-11861:
----------------------------------------

This is a good discussion -- I'll spend some time designing out alternatives in 
more detail today and come back with some options to consider (single 
region or global, locations for del mobs and potential race conditions). I 
would like to note that there are some pieces that could be implemented while 
this is going on (e.g. the code for modifying the compaction scan to write del 
mob lists will be the same regardless of where different actions end up 
happening).

I was initially thinking of a global mob compaction that would scan the set of 
del mob files once and only rewrite some regions.  The HM would keep a summary 
of the del mob files in memory and figure out which mobs need to be rewritten, 
and for a first cut the HM would do the IO-heavy portions since that is simpler 
and might be sufficient.  If that isn't the case, then we'd have the HM farm 
compaction work out to make the process distributed using either the procedure 
(my vote since [~mbertozzi] is doing work over there) or distributed log 
splitting infrastructure to coordinate.

{quote}
But it's better to compact the mob files in each region since we have to 
synchronize the major compaction and mob compaction to avoid the race 
condition. The way we do in the sweep tool is to use zookeeper. If we do the 
mob compaction in region, we could do it in locks.
{quote}

With the del mob hfile approach, I'm not sure there is a race condition 
between major compactions and mob compactions.  In the 141030 attachment, 
illustrated on slide #23, we use bulk loading of new mobs and new references, 
and then use hfile links (or something like them) when reading mobs so that we 
point to the original mobs or archived mobs (similar to snapshots).  This 
avoids the need for zk locks and only uses the region locks already hardened 
in the bulk load process.

{quote}
You're right, if the regions are merged, we could not find the related mob 
files at all only by the md5 of the start key.
Currently we have the start key and stop key in the metadata of hfiles. It 
means we could not get them only by the file names, but need to open readers to 
the files.
Do you have ideas on this to track the start and stop key besides reading the 
metadata, to revise the pattern of a mob file name? Please advise. Thanks.
{quote}

I think we'd need to revise the pattern of the del mob file name -- it would 
likely need a tuple of (start key, end key, start key, # unique mobs).  These 
cells would have pointers to the particular files so we could gather counts of 
how many cells are being deleted.  We might be able to get away with not 
changing the format / name of the mob files themselves.

{quote}
Is that possible there are too many delmob files? If not, we could directly 
open scanners to these delmob files.
Jon, do you have comments for the way to map the file names to deleted cells?
{quote}

This I don't really know -- let's do a back-of-the-envelope estimate.

We create a del mob file per region compaction (major, and potentially minor 
due to ttl age offs).  Worst case we delete exactly one mob per compaction.  
Assuming 1MB / mob, we might need 500 del mob files to meet a 50% threshold 
on a 1GB mob file (and this is per region).  That is a lot of files.

So I agree, this sounds like it would be a potential problem.
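
To make the arithmetic concrete, here is a rough sketch in Python.  The 
function name and parameters are made up for illustration, and 1GB is rounded 
to 1000MB to match the estimate above; nothing here is an HBase API.

```python
import math

# Assumed figures from the estimate above (1GB rounded to 1000MB).
MOB_FILE_SIZE_MB = 1000
MOB_SIZE_MB = 1
INVALID_THRESHOLD = 0.5  # rewrite the mob file once 50% of its mobs are deleted


def del_mob_files_needed(file_size_mb, mob_size_mb, threshold,
                         deletes_per_del_mob_file=1):
    """Number of del mob files that accumulate before one mob file
    crosses the rewrite threshold (hypothetical helper)."""
    mobs_in_file = file_size_mb // mob_size_mb
    deletes_needed = math.ceil(mobs_in_file * threshold)
    return math.ceil(deletes_needed / deletes_per_del_mob_file)


# Worst case: every compaction deletes exactly one mob.
print(del_mob_files_needed(MOB_FILE_SIZE_MB, MOB_SIZE_MB, INVALID_THRESHOLD))  # 500
```

If each compaction removed 10 mobs instead of one, the same threshold would be 
reached after 50 del mob files, so the file count depends heavily on how 
sparse the deletes are.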

{quote}
Sorry, I missed the merge case. In order to get the start/stop keys 
information, we have to read the mob files instead of file names in each region 
now.
The region split and merge case will be handled in mob compaction by regions.
For split, if the start key of a mob file is between the start and stop keys of 
a region, this mob file is handled by this region. This mob file might cross 
regions, which we can tell by checking its stop key. If this mob file crosses 
regions, it will create two or more ref files, one for each daughter region. 
Each of the ref files is handled in the mob compaction of its daughter region.
For merges, the files are not across regions, we directly select the mob files 
if they're qualified (small or invalid) owned by the current region.
In the mob compaction of a b, if a mob file file#1 is selected we need to create 
two ref files, one for a b named ref-ab-file#1, the other for c d named 
ref-cd-file#1 (if a mob file is not selected, we don't need to create them at 
all). The ref file ref-ab-file#1 is handled in the mob compaction of a b to 
generate a new mob file file#1ab, and ref-cd-file#1 is handled in the mob 
compaction of c d to generate the mob file file#1cd.
After the region ab is split, if (and only if) the file file#1ab is selected in 
the mob compaction of region a, the new ref files are created and handled by 
region a and region b.
For merge, it's easier than the split, directly select the small or invalid mob 
files whose start/stop keys are between the key range of the current region.
{quote}

I think we can have something simpler if we use a different approach.  We know 
these invariants:
* The del mobs have the names of the mob files.  
* Splits or merges do not affect the mob files at all.  (Doing del mobs should 
decouple major compactions from mob compactions.)

If we do a scan on the del mobs instead of the mob files, we could get counts 
of deleted cells in specific mob files and figure out which mob files to 
rewrite/compact with other mob files.  Using the reference bulk load mentioned 
earlier, we don't even have to worry about splits or merges of the normal 
regions.
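
As a rough sketch of that del mob scan (Python, in-memory only): the entry 
layout of (mob file name, row key), the per-file mob counts, and both function 
names are assumptions for illustration, not the actual del mob format or any 
HBase API.

```python
from collections import Counter


def count_deletes_per_mob_file(del_mob_entries):
    """Tally deleted cells per mob file.

    del_mob_entries: iterable of (mob_file_name, row_key) pairs, standing in
    for the cells read from the del mob files (hypothetical layout).
    """
    return Counter(mob_file for mob_file, _row in del_mob_entries)


def select_mob_files_to_compact(del_mob_entries, mob_cell_counts, threshold=0.5):
    """Return the mob files whose deleted fraction meets the threshold.

    mob_cell_counts: dict of mob_file_name -> total mobs in that file
    (assumed to be known, e.g. from a summary the HM keeps in memory).
    """
    deleted = count_deletes_per_mob_file(del_mob_entries)
    selected = []
    for mob_file, total in mob_cell_counts.items():
        if total > 0 and deleted[mob_file] / total >= threshold:
            selected.append(mob_file)
    return sorted(selected)


entries = [("mobfile-1", "r1"), ("mobfile-1", "r2"), ("mobfile-2", "r9")]
counts = {"mobfile-1": 4, "mobfile-2": 10, "mobfile-3": 5}
print(select_mob_files_to_compact(entries, counts))  # ['mobfile-1']
```

Note that nothing in the selection looks at region boundaries: it only needs 
the mob file names carried in the del mob entries.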

This has me leaning more and more towards a global del mob scan on the 
master to identify mob hfiles to compact, as opposed to a per-region approach.

{quote}
Currently we track the start/stop keys in the metadata of mob files. But it's 
hard to track the counts in each mob file since we have threshold for the mob 
cells.
In this design doc, the mob compaction is handled in each region, it means only 
part of mob files (owned by the current region) could be handled each time.
Instead, we could also do the mob compaction globally (in one single place) for 
all the mob files. But how to avoid the race condition between the major 
compaction and mob compaction for this? Still use the zookeeper?
Since the major compaction and mob compaction are not frequent, and deletion is 
rare in the mob cases, could we ignore the race condition directly? Please 
advise. Thanks.
{quote}

I think the bulk load approach avoids the potential race between mob compaction 
and normal compaction.  There might be the case where a new del mob shows up 
while a mob compaction is happening, but we'd just need to keep the list of del 
mobs we read when we do the del mob scan, so that we don't accidentally remove 
new del mobs that a normal compaction creates while the mob compaction is 
happening.
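
A minimal sketch of that bookkeeping, in Python with an in-memory set standing 
in for the del mob directory.  The function and the callbacks it takes are 
hypothetical names for illustration, not HBase code.

```python
def run_mob_compaction(list_del_mob_files, compact, remove_del_mob_file):
    """Compact using a fixed snapshot of del mob files, then remove only
    the files in that snapshot (hypothetical helper)."""
    # Record exactly which del mob files this run reads.
    snapshot = frozenset(list_del_mob_files())
    compact(snapshot)
    # Del mob files created after the snapshot are left for the next run.
    for name in sorted(snapshot):
        remove_del_mob_file(name)
    return snapshot


# Simulated directory of del mob files.
del_mob_dir = {"delmob-1", "delmob-2"}


def fake_compact(snapshot):
    # A normal compaction finishes mid-run and writes a new del mob file.
    del_mob_dir.add("delmob-3")


processed = run_mob_compaction(
    lambda: set(del_mob_dir), fake_compact, del_mob_dir.discard)
print(sorted(del_mob_dir))  # ['delmob-3']
```

The new del mob file survives because removal is driven by the snapshot taken 
before the scan, not by a fresh directory listing afterwards.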


> Native MOB Compaction mechanisms.
> ---------------------------------
>
>                 Key: HBASE-11861
>                 URL: https://issues.apache.org/jira/browse/HBASE-11861
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hsieh
>         Attachments: 141030-mob-compaction.pdf, mob compaction.pdf
>
>
> Currently, the first cut of mob will have external processes to age off old 
> mob data (the ttl cleaner), and to compact away deleted or overwritten data 
> (the sweep tool).  
> From an operational point of view, having two external tools, especially one 
> that relies on MapReduce, is undesirable.  In this issue we'll tackle 
> integrating these into hbase without requiring external processes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
