An attempt at answering first question is inlined below.
Bryan Duxbury wrote:
My replies to the second question inline. Feel free to ask follow ups.
-Bryan
On Jun 26, 2008, at 5:24 PM, Andra Adams wrote:
Hi,
I've been looking through the HBase code and I was wondering if I
could get some clarification on two points.
1. Why doesn't HRegion's static merge method check that the two
regions specified are adjacent?
Originally, merge would only allow merging of adjacent regions but a
couple of months back, we had bugs that could manufacture regions with
overlapping keys. To fix damaged clusters, the merge tool was amended
to remove the adjacency check and refactored so it could merge overlaps
as well as adjacents (See unit tests for merge tool. IIRC, it includes
tests that merge adjacent and overlapping regions).
Yes, if an operator tries to merge non-adjacents, they'll do damage. We
should add back some smarts that guard against this (If you don't file
an issue, I will).
Thanks for reminding us of this hole,
St.Ack
As far as I can tell, HRegion's merge method is called from the Merge
tool which gets its region names from command line arguments. As far
as I can see, merging non-adjacent regions would break many of the
assertions that HBase depends on, yet all calls to HRegion's merge
method result in a merged region. So how come the caller of the
Merge tool is being trusted to ensure the adjacency of the regions it
is specifying on the command line? ( Although admittedly, the
adjacency check could be quite computationally-expensive since it
would involve a complete scan of all regions in the "parent" META
table (either .META. or -ROOT-) to ensure that there are no regions
in the "daughter" (either a user table or .META.) table that have a
start key between the end key and start key of the regions being
asked to merge).
2. Can I get an overview of the algorithm used to determine the best
candidate key in HStore's getRowKeyAtOrBefore (including Memcache's
internalGetRowKeyAtOrBefore, and HStore's rowAtOrBeforeFromMapFile)?
I'm having trouble figuring out why HStore's getFull method looks
through the mc, snapshot and storefiles in reverse chronological
order (i.e. mc, then snapshot, then store files), while the
getRowKeyAtOrBefore looks through the storefiles, then the mc, then
the snapshot (in apparently no chronological order...?). Why does
getFull create a map of deletes (and older entries check this map
before inserting their values in the results map), while
getRowAtOrBefore opts to remove entries from the results map if a
delete is found at a later time?
Aside from the difference in style between getFull and
getRowAtOrBefore, I'm also wondering why the discovery of a deleted
value sometimes removes that key from the candidateKeys map, and
other times is simply ignored. (It could be that I'm missing some of
the concepts behind the algorithm).
The idea of getRowKeyAtOrBefore is to discover the row that comes
immediately before or right upon the search row. This is used
exclusively when trying to locate which region a key resides in. The
reasoning behind this is a little tricky. Regions in HBase are keyed
on their start row, which is inclusive. The end row is implied by the
presence of the next region. So, when you have an arbitrary key you'd
like to perform some operation on, you need to find the region which
contains it, which you can only know by scanning past it.
getRowKeyAtOrBefore is a specific, internal-only RPC method that does
this operation. In order to actually do the work, at the HStore level,
we have to decide amongst the possible keys that presented by the
memcache (including the snapshot) and all of the store files. The
order here is unimportant, because ultimately, we're going to have to
look at every one of those things unless we encounter a precise match.
Moreover, there could be deletes in any one of them, so we have to
carry the candidates along with us and apply the deletes where they
are required. The reasoning here is that if a row is completely
deleted, that is, all cells are suppressed by deletes, even if it
matches precisely, we don't want to return it as a candidate key.
Deletes are ignored when the don't apply to the data we've already
found, usually because there's a newer piece of data than there is a
delete (this is simply a memory optimization).
Likewise, getFull tries to find a whole row of information about a key
at a time. We need to follow deletes around here for the same reason
that we do it in regular get: we don't want to return deleted data. We
go in reverse chronological order here because that allows the most
recent data to easily take precedence.
Thanks,
Andra
[EMAIL PROTECTED]