On Fri, Jun 24, 2011 at 11:07 AM, Emmanuel Lécharny <[email protected]> wrote: > On 6/24/11 10:04 AM, Alex Karasulu wrote: >> >> On Fri, Jun 24, 2011 at 11:00 AM, Emmanuel Lécharny >> <[email protected]> wrote: >>> >>> On 6/24/11 9:51 AM, Alex Karasulu wrote: >>>>>> >>>>>> The reverse index has no duplicate keys. The only way to get a >>>>>> duplicate key in the reverse index is if the same entry (i.e. 37) >>>>>> contained the same value ('foo') for the same (sn) attribute. And this >>>>>> we know is not possible. So the lookups against the reverse table will >>>>>> be faster. >>>>> >>>>> I was thinking about something a bit different : as soon as you have >>>>> grabbed >>>>> the list of entry's ID from the first index, looking into the other >>>>> indexes >>>>> will also return a list of Entry's ID. Checking if those IDs are valid >>>>> candidate can then be done in one shot : do the intersection of the two >>>>> sets >>>>> (they are ordered, so it's a O(n) operation) and just get the matching >>>>> entries. >>>>> >>>>> Compared to the current processing (ie, accessing the reverse index for >>>>> *each* candidate), this will be way faster, IMO. >>>> >>>> This is a VERY interesting idea. Maybe we should create a separate >>>> thread for this and drive deeper into it. You got something I think >>>> here. >>>> >>> have a look at >>> >>> https://cwiki.apache.org/confluence/display/DIRxSRVx11/Index+and+IndexEntry, >>> where I added some paragraphs explaining this idea. We can comment on >>> this >>> page. >> >> Nice pictures - what did you use for that? Reading further ... > > YeD >> >> Also if you're doing this in a branch, hence we're not yet committed >> on the approach, can you please do this on a separate page so you >> don't alter the existing documentation? > > Right now, I just updated the existing doco with some extended explanation > on the existing code. I can move suggestions to another page, sure.
Nah you did good - I spoke too soon. Thanks, Alex
