On Fri, Jun 24, 2011 at 10:47 AM, Emmanuel Lécharny <[email protected]> wrote:
> On 6/24/11 9:40 AM, Alex Karasulu wrote:
>>
>> On Thu, Jun 23, 2011 at 7:20 PM, Emmanuel Lecharny <[email protected]> wrote:
>>>
>>> Hi guys,
>>>
>>> As I was reviewing the Table interface and the Index hierarchy, I had
>>> a bit of time to try to understand this part of the server I was not
>>> comfortable with. I saw that each index uses two tables: a forward
>>> table and a reverse table.
>>>
>>> The forward table is obviously mandatory. It is used to retrieve the
>>> list of entry IDs from an AT value. Fine. But what about the reverse
>>> table?
>>>
>>> Alex told me that it's useful when dealing with filters such as
>>> (&(cn=A)(sn=B)).
>>>
>>> The optimization engine picks the index that provides the smallest
>>> set of candidates (say, CN here). We then grab from the CN forward
>>> table all the entry IDs associated with the 'A' value. Let's call it
>>> the CN-IDs set.
>>>
>>> Now, if we were to grab all the entries from the MasterTable to
>>> compare their SN attribute with the 'B' value, it would be quite
>>> costly. Instead, we use the SN's reverse table: for each entry ID, we
>>> check whether the associated value exists in the SN's reverse table.
>>> If it exists, we have a valid candidate; otherwise, we simply discard
>>> the candidate.
>>>
>>> As we can see, we never fetch the entry unless we have a valid
>>> candidate (assuming, of course, that the SN and CN ATs are indexed).
>>>
>>> However, I do think this reverse table is spurious. We can get the
>>> same result by using the SN's forward table: for each entry ID, check
>>> whether the set of values in the SN's forward table (SN-IDs) contains
>>> the ID we pulled from the CN-IDs set.
>>>
>>> We can even do an SN-IDs ^ CN-IDs ('^' <=> intersection) to get the
>>> list of valid candidates. No need to do N fetches in the reverse
>>> table (where N is the number of IDs in CN-IDs).
>>>
>>> I think we can remove those reverse tables. I expect that it can
>>> speed up searches a bit, but mainly that it will greatly speed up any
>>> modification operation (add, delete, modify), and even decrease disk
>>> consumption.
>>
>> I think this is possible. It would be interesting to test these two
>> different systems using different kinds of data. The number of
>> duplicate keys will obviously impact the performance: it will be a
>> trade-off.
>>
>> To test an Evaluator (i.e. sn=foo) on a candidate entry, say id 37, we
>> need to perform a reverse lookup on the sn index. The btree key is 37,
>> and we check whether it contains a tuple with a value of 'foo'. If it
>> does, the Evaluator returns true; if not, it returns false. Now, since
>> we know we're looking for the tuple ('foo', 37) and we only have the
>> forward table, we can look up the 'foo' entries, and the values from
>> duplicate keys would be sorted. We would just need to check whether 37
>> is among these values.
>>
>> The reverse index has no duplicate keys. The only way to get a
>> duplicate key in the reverse index would be if the same entry (i.e.
>> 37) contained the same value ('foo') for the same (sn) attribute, and
>> we know this is not possible. So lookups against the reverse table
>> will be faster.
>
> I was thinking about something a bit different: as soon as you have
> grabbed the list of entry IDs from the first index, looking into the
> other indexes will also return a list of entry IDs. Checking whether
> those IDs are valid candidates can then be done in one shot: do the
> intersection of the two sets (they are ordered, so it's an O(n)
> operation) and just fetch the matching entries.
>
> Compared to the current processing (i.e., accessing the reverse index
> for *each* candidate), this will be way faster, IMO.
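[Editor's note: the ordered-set intersection described above can be
sketched as below. This is a minimal illustration only; the class and
method names are invented for the example and are not the actual
ApacheDS API.]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed one-shot candidate check: instead of one
// reverse-table lookup per candidate, intersect the two sorted ID sets
// returned by the CN and SN forward indexes in a single O(n) pass.
public class SortedIdIntersection {

    // Both inputs must be sorted ascending, as a forward-index scan
    // would return them.
    public static List<Long> intersect(List<Long> cnIds, List<Long> snIds) {
        List<Long> result = new ArrayList<>();
        int i = 0, j = 0;

        while (i < cnIds.size() && j < snIds.size()) {
            long a = cnIds.get(i);
            long b = snIds.get(j);

            if (a == b) {
                // ID present in both sets: a valid candidate.
                result.add(a);
                i++;
                j++;
            } else if (a < b) {
                i++;
            } else {
                j++;
            }
        }

        return result;
    }

    public static void main(String[] args) {
        List<Long> cnIds = List.of(3L, 7L, 12L, 21L, 37L);
        List<Long> snIds = List.of(7L, 9L, 21L, 37L, 50L);
        System.out.println(intersect(cnIds, snIds)); // [7, 21, 37]
    }
}
```

Because each list is consumed once, the cost is linear in the combined
size of the two candidate sets, with no per-candidate btree access.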
This is a VERY interesting idea. Maybe we should create a separate
thread for this and dive deeper into it. You've got something here, I
think.

> Note that this is true for the AND connector, but even for the OR
> connector, we don't have this kind of issue.

Sorry, I don't understand "connector"? You probably mean the cursors?

Regards,
Alex
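[Editor's note: the forward-table Evaluator test Alex describes earlier
in the thread (checking whether candidate id 37 appears among the
sorted duplicate-key values stored under 'foo') can be sketched as
below. Again, the names are illustrative assumptions, not real server
code.]

```java
import java.util.Collections;
import java.util.List;

// Sketch of an Evaluator check using only the forward table: the entry
// IDs stored under the key 'foo' are sorted, so testing whether a
// candidate ID matches is a binary search over those duplicates rather
// than a lookup in a separate reverse table.
public class ForwardEvaluatorSketch {

    // sortedIdsForValue: the sorted entry IDs stored under one key
    // (e.g. 'foo') in the sn forward index.
    public static boolean evaluate(List<Long> sortedIdsForValue, long candidateId) {
        // binarySearch returns a non-negative index when the key is found.
        return Collections.binarySearch(sortedIdsForValue, candidateId) >= 0;
    }

    public static void main(String[] args) {
        // IDs of entries whose sn attribute holds 'foo' (example data).
        List<Long> fooIds = List.of(5L, 12L, 37L, 90L);
        System.out.println(evaluate(fooIds, 37L)); // true
        System.out.println(evaluate(fooIds, 40L)); // false
    }
}
```

This per-candidate check is O(log d) in the number d of duplicates for
the key, which is the cost being weighed against the single reverse-table
lookup in the current design.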
