Re: Clarification: Merging and getRowKeyAtOrBefore

Bryan Duxbury Thu, 26 Jun 2008 21:27:09 -0700

My replies to the second question inline. Feel free to ask follow ups.


-Bryan

On Jun 26, 2008, at 5:24 PM, Andra Adams wrote:

Hi,
I've been looking through the HBase code and I was wondering if I could get some clarification on two points.
1. Why doesn't HRegion's static merge method check that the two regions specified are adjacent?
As far as I can tell, HRegion's merge method is called from the Merge tool, which gets its region names from command line arguments. As far as I can see, merging non-adjacent regions would break many of the assertions that HBase depends on, yet all calls to HRegion's merge method result in a merged region. So how come the caller of the Merge tool is being trusted to ensure the adjacency of the regions it is specifying on the command line? (Although admittedly, the adjacency check could be quite computationally expensive, since it would involve a complete scan of all regions in the "parent" META table (either .META. or -ROOT-) to ensure that there are no regions in the "daughter" table (either a user table or .META.) that have a start key between the end key and start key of the regions being asked to merge.)
2. Can I get an overview of the algorithm used to determine the best candidate key in HStore's getRowKeyAtOrBefore (including Memcache's internalGetRowKeyAtOrBefore, and HStore's rowAtOrBeforeFromMapFile)?
I'm having trouble figuring out why HStore's getFull method looks through the mc, snapshot and storefiles in reverse chronological order (i.e. mc, then snapshot, then store files), while getRowKeyAtOrBefore looks through the storefiles, then the mc, then the snapshot (in apparently no chronological order...?). Why does getFull create a map of deletes (and older entries check this map before inserting their values in the results map), while getRowAtOrBefore opts to remove entries from the results map if a delete is found at a later time?
Aside from the difference in style between getFull and getRowAtOrBefore, I'm also wondering why the discovery of a deleted value sometimes removes that key from the candidateKeys map, and other times is simply ignored. (It could be that I'm missing some of the concepts behind the algorithm.)

The idea of getRowKeyAtOrBefore is to discover the row that comes immediately before or right upon the search row. This is used exclusively when trying to locate which region a key resides in. The reasoning behind this is a little tricky. Regions in HBase are keyed on their start row, which is inclusive. The end row is implied by the presence of the next region. So, when you have an arbitrary key you'd like to perform some operation on, you need to find the region which contains it, which you can only know by scanning past it.
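
As a rough illustration of the "at or before" lookup described above, here is a minimal Java sketch. It is not HBase code: the class RegionLookupSketch, its methods, and the region names are all made up for the example, and a TreeMap stands in for the catalog table. It only shows why a region keyed on its inclusive start row is found by asking for the greatest start row that is less than or equal to the search key.

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLookupSketch {
    // Hypothetical stand-in for a catalog table: region start row -> region name.
    private final TreeMap<String, String> regionsByStartRow = new TreeMap<>();

    public void addRegion(String startRow, String regionName) {
        regionsByStartRow.put(startRow, regionName);
    }

    /** Returns the region with the greatest start row <= row, or null if there is none. */
    public String locate(String row) {
        Map.Entry<String, String> entry = regionsByStartRow.floorEntry(row);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch catalog = new RegionLookupSketch();
        catalog.addRegion("", "region-A");  // the first region starts at the empty row
        catalog.addRegion("g", "region-B"); // region-A implicitly ends where region-B starts
        catalog.addRegion("p", "region-C");

        System.out.println(catalog.locate("apple")); // region-A
        System.out.println(catalog.locate("g"));     // region-B (start row is inclusive)
        System.out.println(catalog.locate("hbase")); // region-B
        System.out.println(catalog.locate("zebra")); // region-C
    }
}
```

No end rows are stored anywhere in the sketch; each region's end is implied by the next entry in the map.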

getRowKeyAtOrBefore is a specific, internal-only RPC method that does this operation. In order to actually do the work, at the HStore level, we have to decide amongst the possible keys that are presented by the memcache (including the snapshot) and all of the store files. The order here is unimportant, because ultimately, we're going to have to look at every one of those things unless we encounter a precise match. Moreover, there could be deletes in any one of them, so we have to carry the candidates along with us and apply the deletes where they are required. The reasoning here is that if a row is completely deleted, that is, all cells are suppressed by deletes, even if it matches precisely, we don't want to return it as a candidate key. Deletes are ignored when they don't apply to the data we've already found, usually because there's a newer piece of data than there is a delete (this is simply a memory optimization).
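
To make the "carry the candidates along and apply the deletes" idea concrete, here is a simplified Java sketch. It is not the HStore implementation: the Cell class, the rowAtOrBefore method, and the representation of each source (memcache, snapshot, store file) as a plain list are assumptions made for illustration, and it skips the early exit on a precise match. For every row at or before the search row it tracks the newest entry seen per column, and a row only counts as a candidate if at least one column's newest entry is a live value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RowAtOrBeforeSketch {
    /** Hypothetical cell: a value or a delete marker for (row, column) at a timestamp. */
    public static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final boolean delete;

        public Cell(String row, String column, long timestamp, boolean delete) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.delete = delete;
        }
    }

    /** Returns the greatest row <= searchRow that still has a live cell, or null. */
    public static String rowAtOrBefore(List<List<Cell>> sources, String searchRow) {
        // Candidate rows: row -> column -> newest cell seen so far (value or delete).
        TreeMap<String, Map<String, Cell>> candidates = new TreeMap<>();
        for (List<Cell> source : sources) {          // visiting order does not matter
            for (Cell cell : source) {
                if (cell.row.compareTo(searchRow) > 0) {
                    continue;                         // past the search row
                }
                Map<String, Cell> columns =
                        candidates.computeIfAbsent(cell.row, r -> new HashMap<>());
                Cell seen = columns.get(cell.column);
                // A delete only matters if it is at least as new as the data we hold.
                boolean newer = seen == null
                        || cell.timestamp > seen.timestamp
                        || (cell.timestamp == seen.timestamp && cell.delete);
                if (newer) {
                    columns.put(cell.column, cell);
                }
            }
        }
        // Walk candidate rows from closest to farthest; skip fully deleted rows.
        for (String row : candidates.descendingKeySet()) {
            for (Cell cell : candidates.get(row).values()) {
                if (!cell.delete) {
                    return row;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Cell> memcache = Arrays.asList(new Cell("m", "info:a", 5L, true));
        List<Cell> storeFile1 = Arrays.asList(
                new Cell("m", "info:a", 3L, false),
                new Cell("k", "info:a", 2L, false));
        List<Cell> storeFile2 = Arrays.asList(new Cell("d", "info:a", 1L, false));

        // Row "m" matches precisely but is fully deleted, so "k" is returned.
        System.out.println(rowAtOrBefore(
                Arrays.asList(storeFile1, storeFile2, memcache), "m")); // k
        // Same answer when the sources are visited in a different order.
        System.out.println(rowAtOrBefore(
                Arrays.asList(memcache, storeFile2, storeFile1), "m")); // k
    }
}
```

The second call visits the sources in a different order and returns the same row, which is the point made above about ordering. Holding every candidate in memory is a simplification; the real code is described as discarding deletes that cannot apply.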

Likewise, getFull tries to find a whole row of information about a key at a time. We need to follow deletes around here for the same reason that we do it in regular get: we don't want to return deleted data. We go in reverse chronological order here because that allows the most recent data to easily take precedence.
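
The reverse-chronological pass with a map of deletes can be sketched the same way. Again this is an illustration, not the actual getFull: the Cell class, the method signature, and the assumptions that sources arrive newest first (memcache, then snapshot, then store files) and that each source lists its cells newest first are all made up for the example. Because newer sources are read first, the first live value seen for a column wins, and a recorded delete suppresses any older value that shows up later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GetFullSketch {
    /** Hypothetical cell for a single row; a null value marks a delete. */
    public static final class Cell {
        final String column;
        final long timestamp;
        final String value;

        public Cell(String column, long timestamp, String value) {
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }

        boolean isDelete() {
            return value == null;
        }
    }

    /** Builds column -> value for one row from sources ordered newest first. */
    public static Map<String, String> getFull(List<List<Cell>> newestFirstSources) {
        Map<String, Long> deletes = new HashMap<>();   // column -> newest delete timestamp
        Map<String, String> results = new TreeMap<>();
        for (List<Cell> source : newestFirstSources) {
            for (Cell cell : source) {
                if (cell.isDelete()) {
                    deletes.merge(cell.column, cell.timestamp, Long::max);
                } else if (!results.containsKey(cell.column)) {
                    // Older entries consult the delete map before being added.
                    Long deletedAt = deletes.get(cell.column);
                    if (deletedAt == null || deletedAt < cell.timestamp) {
                        results.put(cell.column, cell.value);
                    }
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Cell> memcache = Arrays.asList(
                new Cell("info:a", 9L, "new-a"),
                new Cell("info:b", 8L, null));          // delete marker for info:b
        List<Cell> snapshot = Arrays.asList(
                new Cell("info:b", 6L, "old-b"),
                new Cell("info:c", 5L, "c1"));
        List<Cell> storeFile = Arrays.asList(
                new Cell("info:a", 2L, "old-a"),
                new Cell("info:c", 1L, "c0"));

        // info:b is suppressed by the newer delete; newer values shadow older ones.
        System.out.println(getFull(Arrays.asList(memcache, snapshot, storeFile)));
        // {info:a=new-a, info:c=c1}
    }
}
```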


Thanks,
Andra
