Re: [elephant-devel] map-index

Ian Eslick Fri, 30 Nov 2007 07:54:56 -0800

Yeah, map-index is complex but that's due to problems with cursors onsecondary btrees in Elephant. I'd like to rewrite it but didn't finda good strategy before. See my notes below.

I have one idea I'll look into for the standard case of walking asimple range.

map-index is generic, you are welcome to override it with a postmodernspecific version rather than stressing about making the cursor APIswork such that they make map-index work.



On Nov 29, 2007, at 1:40 PM, Alex Mizrahi wrote:

hello

i'm now trying to improve implementation of cursor in postmodernbackend.

it would be nice to optimize it for usage patterns of map-index, but
map-index implementation frightens me: is there a reason for such
complexity?

for example, just to enumarate index with two values it does callslike

this:

(cursor-pfirst c) => 1
(cursor-pnext-dup c) => nil
(cursor-pset c 1)
(cursor-pnext-nodup c) => 2
(cursor-pnext-dup c) => nil
(cursor-pset c 2)
(cursor-pnext-nodup c) => nil

while in my opinion it would be enough to do this:

(cursor-pfirst c) => 1
(cursor-pnext c) => 2
(cursor-pnext c) => nil

What if there are more than two values = 2? This will only get thefirst one. In this case, you need to get all duplicates for each value.

The pset is used because pnext-dup = nil causes the cursor to have aninvalid location (Elephant design problem) so we use pset to get backinto the current duplicate set so we can do a pnext-no-dup on a validcursor.

If your pnext-dup can return nil and still point at the last elementof the duplicate set, then you could remove that call in your versionof map-index.

even if cursor is aware of such behaviour and caches cursor-pset,it's still
a waste of CPU cycles to do 3 times more calls and demand cursor to do
dup/nodup checks.

Althoughthese calls are probably not a big deal because the pages arecached in memory and page fetch time swamps a few function calls soit's in the noise for overall performance. For really large DBs theremay be issues keeping all the higher pages in memory to support pset(since pnext only fetches adjacent pages). I'll think about this abit...

I think this is basically what happens since you don't need to do apset when pnext-dup returns nil.

next, if we have query by value, i think optimal sequence would be:

(cursor-pset c value) => 1
(cursor-pnext-dup c) => 2
(cursor-pnext-dup c) => 3
(cursor-pnext-dup c) => nil
...
that is, if cursor has enough logic to enumerate duplicates (allvalues for
given key), why not use it?
however when cursor-pnext-dup returns NIL, that means there's nomore valueswith such key, elephant does once more (cursor-pset c value), andtries(cursor-pnext-nodup c), and then uses lisp-compare<= on returnedvalue.
this was actually a reason for error Henrik have reported.
db-postmodern's ordering of keys differs from lisp-compare ordering(and insome cases it's just random). normally that should break only rangequeries.but with aforementioned (overly-general?) map-index logic it breakseven
simple get-instances-by-value calls, which is not good.

so, am i missing something, or really it would be better to refactor
map-index?
if it would be calling only what's needed, but won't do cursor-pset"forgenerality" i can skip cursor-pset optimization -- i don't thinkanybody inreal code working with cursors will repeatadly call cursor-psetwithout a
reason.

with best regards, Alex 'killer_storm' Mizrahi.



_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel


_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

Re: [elephant-devel] map-index

Reply via email to