[
https://issues.apache.org/jira/browse/HBASE-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237612#comment-14237612
]
Jingcheng Du commented on HBASE-11861:
--------------------------------------
Thanks Jon for the comments.
bq. 0) when we do a mob compaction, are we compacting all mobs or just mobs
relevant to a particular region?
Just relevant to a particular region.
bq. 1) I don't think mob compaction has to happen after major compactions. It
could have its own schedule and could run less frequently than the normal major
compactions. Doing them after a major compaction (or after a few) is reasonable
first cut.
I agree.
But it is better to compact the mob files within each region, since we have to
synchronize major compaction and mob compaction to avoid a race condition. The
sweep tool does this with ZooKeeper. If we do the mob compaction in the region,
we can do it with locks.
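A minimal sketch of the in-region lock idea, assuming one lock per region that
both the major compaction and the mob compaction must take.
RegionCompactionGuard and tryRun are hypothetical names for illustration, not
HBase classes:

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical per-region guard; not part of HBase. */
final class RegionCompactionGuard {
    // One lock per region, shared by major compaction and mob compaction.
    private final ReentrantLock lock = new ReentrantLock();

    /**
     * Runs the compaction only if no other thread is compacting this region.
     * Returns false (and does nothing) when the lock is already held elsewhere.
     */
    boolean tryRun(Runnable compaction) {
        if (!lock.tryLock()) {
            return false; // another compaction is in progress in this region
        }
        try {
            compaction.run();
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

Because both compaction types go through the same guard, whichever arrives
second is skipped and can be retried later, with no ZooKeeper coordination.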
bq. 3) why should hfile links be rewritten? I think we can use the same
criteria to decide on if we do a mob compaction on it.
Agree.
bq. 4) I don't think we want to scan all the mob files to do a compaction on a
single store. Also, because of splits and merges, there could be other del mob
files that are relevant that have a start key earlier or later that cover the
range in a particular store. I think we'll have to do some start key and end
key tracking in the delmob files and the mob files to reduce the candidate list.
I thought it would be enough to list the mob file names from the NameNode. But
from a file name we only get the md5 of the start key, not the exact start key.
You're right: if regions are merged, we cannot find the related mob files at
all using only the md5 of the start key.
Currently the start key and stop key are kept in the metadata of the hfiles.
This means we cannot get them from the file names alone; we need to open
readers on the files.
Do you have ideas on how to track the start and stop keys other than by reading
the metadata, for example by revising the mob file name pattern? Please advise.
Thanks.
bq. 5) why do a mini del file compaction? why not just use it as is?
Is it possible that there are too many delmob files? If not, we could directly
open scanners on these delmob files.
Jon, do you have comments on how to map the file names to deleted cells?
bq. 6) deletedCellsSizeInOneMobFile – interesting. I was thinking just a count
of mobs associated with each mob file.
Count is a good idea. But currently we don't have accurate count information in
the mob and del mob files. Because of the mob threshold, we cannot know how
many of the cells are mob cells and how many are not. That's a problem, right?
bq. 7) on merge – shouldn't we try to guarantee time order in a merge so that
the ttl cleaner is still effective?
Right, we should guarantee time order in a merge. I missed that in the design.
bq. 8) I'm not clear about the splits case here. Also does it manage merges?
(say we have a single del file with deletes in rows a b c d. that region gets
split into a b and c d, and then again into separate a, b, c, and d regions.
finally someone does a merge for b and c to create a bc region. Does the
grouping on hash idea break then?)
Sorry, I missed the merge case. To get the start/stop key information, each
region now has to read the mob files themselves rather than rely on the file
names.
The region split and merge cases will be handled in mob compaction by regions.
For splits, if the start key of a mob file is between the start and stop keys
of a region, the mob file is handled by that region. By checking its stop key
we can tell whether the mob file crosses regions. If it does, two or more ref
files are created, one for each daughter region it covers. Each ref file is
handled in the mob compaction of its daughter region.
For merges, the files do not cross regions, so we directly select the qualified
(small or invalid) mob files owned by the current region.
Taking the example above: in the mob compaction of region ab, if a mob file
file#1 is selected, we need to create two ref files, one for ab named
ref-ab-file#1 and one for cd named ref-cd-file#1 (if a mob file is not
selected, we don't need to create them at all). The ref file ref-ab-file#1 is
handled in the mob compaction of ab to generate a new mob file file#1ab, and
ref-cd-file#1 is handled in the mob compaction of cd to generate the mob file
file#1cd.
After region ab is split, if (and only if) the file file#1ab is selected in the
mob compaction of region a, the new ref files are created and handled by region
a and region b.
Merge is easier than split: we directly select the small or invalid mob files
whose start/stop keys fall within the key range of the current region.
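The split handling described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: keys are plain strings rather
than byte arrays, a region covers [start, end) with an empty end meaning
"unbounded", and MobRefSketch, Region, and refNames are hypothetical names. The
ref-<region>-<file> naming simply follows the example in this comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not HBase code. Keys are strings for readability. */
final class MobRefSketch {
    /** Region key range [start, end); an empty end means no upper bound. */
    static final class Region {
        final String name;
        final String start;
        final String end;

        Region(String name, String start, String end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /** True if the mob file's inclusive [fileStart, fileStop] range touches the region. */
    static boolean overlaps(String fileStart, String fileStop, Region r) {
        boolean startsBeforeRegionEnd = r.end.isEmpty() || fileStart.compareTo(r.end) < 0;
        boolean stopsAtOrAfterRegionStart = fileStop.compareTo(r.start) >= 0;
        return startsBeforeRegionEnd && stopsAtOrAfterRegionStart;
    }

    /** One ref file name per daughter region that the selected mob file covers. */
    static List<String> refNames(String fileName, String fileStart, String fileStop,
                                 List<Region> daughters) {
        List<String> refs = new ArrayList<>();
        for (Region r : daughters) {
            if (overlaps(fileStart, fileStop, r)) {
                refs.add("ref-" + r.name + "-" + fileName);
            }
        }
        return refs;
    }
}
```

With daughters ab = [a, c) and cd = [c, unbounded), a selected file#1 spanning
rows a through d yields ref-ab-file#1 and ref-cd-file#1, while a file spanning
only a through b yields a single ref for ab.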
bq. I think we need to either track both the start and end keys in the del
files and likely the mobfiles. An alternative is something that splits mob
files and del files but that potentially causes write amplification we want to
avoid.
Agree, we should track the start and stop keys. Currently we track them in the
metadata of the mob files. Should we also track them in the file name, using
the hex string of the start/stop key instead of md5(startkey)? Then we could
read the start/stop keys directly from the file names, whereas currently we
have to read the metadata of the mob files. Please advise. Thanks.
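A minimal sketch of what such a file name could look like, assuming a
hypothetical pattern <hexStartKey>_<hexStopKey>_<suffix>. The pattern, the
MobFileNameSketch class, and its methods are illustrative assumptions, not an
existing HBase naming scheme:

```java
/** Hypothetical file-name codec; the name pattern is an assumption. */
final class MobFileNameSketch {
    /** Lower-case hex encoding of a key; an empty key encodes to "". */
    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder(key.length * 2);
        for (byte b : key) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Inverse of toHex. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Builds "<hexStart>_<hexStop>_<suffix>"; '_' never occurs in hex output. */
    static String encode(byte[] startKey, byte[] stopKey, String suffix) {
        return toHex(startKey) + "_" + toHex(stopKey) + "_" + suffix;
    }

    /** Recovers {startKey, stopKey} from a name without opening the file. */
    static byte[][] decodeKeys(String fileName) {
        String[] parts = fileName.split("_", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("unexpected file name: " + fileName);
        }
        return new byte[][] { fromHex(parts[0]), fromHex(parts[1]) };
    }
}
```

Unlike md5(startkey), the hex form is reversible, so key ranges can be recovered
from a directory listing alone. The cost is that hex doubles the key length, so
long row keys produce long file names, which may run into file-system name
length limits.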
bq. My gut feeling is that we need to deal with all mob files, iterate through
ranges, and use mob counts. We'd track start/end keys and counts in each mob
file and each del file. We could then iterate on mob files, and select only
the del files that are relevant based on the start keys and end keys. We might
want to track a histogram (count or size) of mob files deletions for particular
mob file in each del file.
Currently we track the start/stop keys in the metadata of the mob files. But it
is hard to track the counts in each mob file, since we have a threshold for mob
cells.
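As a rough illustration of selecting only the relevant del files by key range,
the sketch below assumes each mob file and del file carries an inclusive
[start, stop] key range. Keys are strings for readability, and
DelFileSelectionSketch, KeyRange, and relevantDelFiles are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of del-file candidate selection; not HBase code. */
final class DelFileSelectionSketch {
    /** A file name with its inclusive [start, stop] key range. */
    static final class KeyRange {
        final String file;
        final String start;
        final String stop;

        KeyRange(String file, String start, String stop) {
            this.file = file;
            this.start = start;
            this.stop = stop;
        }
    }

    /** Two inclusive ranges overlap when each starts no later than the other stops. */
    static boolean overlaps(KeyRange a, KeyRange b) {
        return a.start.compareTo(b.stop) <= 0 && b.start.compareTo(a.stop) <= 0;
    }

    /** Names of the del files whose key range intersects the mob file's range. */
    static List<String> relevantDelFiles(KeyRange mobFile, List<KeyRange> delFiles) {
        List<String> selected = new ArrayList<>();
        for (KeyRange del : delFiles) {
            if (overlaps(mobFile, del)) {
                selected.add(del.file);
            }
        }
        return selected;
    }
}
```

This only narrows the candidate list by range. It says nothing about how many
mob cells a del file actually removes, which is the count problem noted above.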
In this design doc, the mob compaction is handled in each region, which means
that only part of the mob files (those owned by the current region) can be
handled each time.
Alternatively, we could do the mob compaction globally (in one single place)
for all the mob files. But how would we avoid the race condition between the
major compaction and the mob compaction in that case? Still use ZooKeeper?
Since major compaction and mob compaction are not frequent, and deletion is
rare in the mob use cases, could we simply ignore the race condition? Please
advise. Thanks.
> Native MOB Compaction mechanisms.
> ---------------------------------
>
> Key: HBASE-11861
> URL: https://issues.apache.org/jira/browse/HBASE-11861
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver, Scanners
> Affects Versions: 2.0.0
> Reporter: Jonathan Hsieh
> Attachments: 141030-mob-compaction.pdf, mob compaction.pdf
>
>
> Currently, the first cut of mob will have external processes to age off old
> mob data (the ttl cleaner), and to compact away deleted or overwritten data
> (the sweep tool).
> From an operational point of view, having two external tools, especially one
> that relies on MapReduce, is undesirable. In this issue we'll tackle
> integrating these into hbase without requiring external processes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)