Yeah, map-index is complex but that's due to problems with cursors on secondary btrees in Elephant. I'd like to rewrite it but didn't find a good strategy before. See my notes below.

I have one idea I'll look into for the standard case of walking a simple range.

map-index is generic, you are welcome to override it with a postmodern specific version rather than stressing about making the cursor APIs work such that they make map-index work.


On Nov 29, 2007, at 1:40 PM, Alex Mizrahi wrote:

hello

i'm now trying to improve implementation of cursor in postmodern backend.
it would be nice to optimize it for usage patterns of map-index, but
map-index implementation frightens me: is there a reason for such
complexity?

for example, just to enumarate index with two values it does calls like
this:

(cursor-pfirst c) => 1
(cursor-pnext-dup c) => nil
(cursor-pset c 1)
(cursor-pnext-nodup c) => 2
(cursor-pnext-dup c) => nil
(cursor-pset c 2)
(cursor-pnext-nodup c) => nil

while in my opinion it would be enough to do this:

(cursor-pfirst c) => 1
(cursor-pnext c) => 2
(cursor-pnext c) => nil

What if there are more than two values = 2? This will only get the first one. In this case, you need to get all duplicates for each value.

The pset is used because pnext-dup = nil causes the cursor to have an invalid location (Elephant design problem) so we use pset to get back into the current duplicate set so we can do a pnext-no-dup on a valid cursor.

If your pnext-dup can return nil and still point at the last element of the duplicate set, then you could remove that call in your version of map-index.

even if cursor is aware of such behaviour and caches cursor-pset, it's still
a waste of CPU cycles to do 3 times more calls and demand cursor to do
dup/nodup checks.

Althoughthese calls are probably not a big deal because the pages are cached in memory and page fetch time swamps a few function calls so it's in the noise for overall performance. For really large DBs there may be issues keeping all the higher pages in memory to support pset (since pnext only fetches adjacent pages). I'll think about this a bit...

I think this is basically what happens since you don't need to do a pset when pnext-dup returns nil.

next, if we have query by value, i think optimal sequence would be:

(cursor-pset c value) => 1
(cursor-pnext-dup c) => 2
(cursor-pnext-dup c) => 3
(cursor-pnext-dup c) => nil
...

that is, if cursor has enough logic to enumerate duplicates (all values for
given key), why not use it?
however when cursor-pnext-dup returns NIL, that means there's no more values with such key, elephant does once more (cursor-pset c value), and tries (cursor-pnext-nodup c), and then uses lisp-compare<= on returned value.

this was actually a reason for error Henrik have reported.
db-postmodern's ordering of keys differs from lisp-compare ordering (and in some cases it's just random). normally that should break only range queries. but with aforementioned (overly-general?) map-index logic it breaks even
simple get-instances-by-value calls, which is not good.

so, am i missing something, or really it would be better to refactor
map-index?

if it would be calling only what's needed, but won't do cursor-pset "for generality" i can skip cursor-pset optimization -- i don't think anybody in real code working with cursors will repeatadly call cursor-pset without a
reason.

with best regards, Alex 'killer_storm' Mizrahi.



_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

Reply via email to