Re: HBase partition integration in trunks ?

Emmanuel Lecharny Tue, 16 Aug 2011 05:58:10 -0700

On 8/16/11 2:22 PM, Kiran Ayyagari wrote:

On Tue, Aug 16, 2011 at 1:23 PM, Emmanuel Lécharny<[email protected]>  wrote:

On 8/15/11 5:59 PM, Stefan Seelmann wrote:

Now I have to update the parts that are a bit special, let me explain:
In HBase partition I didn't use one-level and sub-level indices, but
use the RDN index table instead. I also extended the search engine in
that way that one-level and sub-level cursors get the search filter in
order to perform filtering within the store instead of returning all
candidates and evaluate them.

Some toughts about this one-level/sub-level index.


Using the Rdn index makes perfect sense : we have the Rdn ->  parent relation
plus the parent ->  children relation in this index, so there is no need to
have a one level index (all the children are already listed in the RDN index
for a specific entry). I'm a bit more concerned about the sub-level
processing : we have to recurse on all the children to get all the
candidates. That's fine, we can easily implement that (and you already did),
but what concerns me is that we don't have the count of all the entries, we
will have to compute them. This count is necessary in the search engine to
select the index we will use to walk the entries.

One solution would be to store two more elements in the ParentIdAndRdn data
structure : the number of children directly below the RDN, and the number of
children and descendant. That would probably solve the issue I'm mentioning.
Of course, that also means we wil have to update all the RDN hierarchy from
top to bottom (but affecting only the RDN part of the entry DN) each time we
add/move/delete an entry. Note that we already do that for the oneLevel and
Sublevel index.

Just to make a point:
I think, in the case of achieving SubLevel index evaluation with RDN
index it becomes a costly and complex operation
(recursive scanning and updating) where as with the current sublevel
index it takes O(1) to fetch all the sublevel children of
an entry.
Not sure if HBase has any features to solve in an efficient manner

If you think about what is done when we add an entry in the current codebase :

- add the entry in the MasterTable
- add the RDN/parent into the RdnIndex

- update the one-level index with the newly added entry reference,increasing the number of children for the parent- for each RDN in the parent, update the sub-level index with the newlyadded entry reference, increasing the number of children for the parent


If we compare what we would do if we remove the one-level/sub-level index :
- add the entry in the MasterTable
- add the RDN/parent into the RdnIndex

- for each RDN in the parent, update the rdn index with the newly addedentry reference, increasing the number of children for the parent


This is one less operation, one less index updated.

Also we don't have to recurse to get the number of children for aspecific entry when processing a SUB search, as we have this number inthe RdnAndParentAndCount data structure. (this will be an extendedstructure for what is currently stored in the Rdn index).


At least, this is what I understand we must do if we go this way...

Did I miss something ?


--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: HBase partition integration in trunks ?

Reply via email to