Yeah, map-index is complex but that's due to problems with cursors on
secondary btrees in Elephant. I'd like to rewrite it but didn't find
a good strategy before. See my notes below.
I have one idea I'll look into for the standard case of walking a
simple range.
map-index is generic, you are welcome to override it with a postmodern
specific version rather than stressing about making the cursor APIs
work such that they make map-index work.
On Nov 29, 2007, at 1:40 PM, Alex Mizrahi wrote:
hello
i'm now trying to improve implementation of cursor in postmodern
backend.
it would be nice to optimize it for usage patterns of map-index, but
map-index implementation frightens me: is there a reason for such
complexity?
for example, just to enumarate index with two values it does calls
like
this:
(cursor-pfirst c) => 1
(cursor-pnext-dup c) => nil
(cursor-pset c 1)
(cursor-pnext-nodup c) => 2
(cursor-pnext-dup c) => nil
(cursor-pset c 2)
(cursor-pnext-nodup c) => nil
while in my opinion it would be enough to do this:
(cursor-pfirst c) => 1
(cursor-pnext c) => 2
(cursor-pnext c) => nil
What if there are more than two values = 2? This will only get the
first one. In this case, you need to get all duplicates for each value.
The pset is used because pnext-dup = nil causes the cursor to have an
invalid location (Elephant design problem) so we use pset to get back
into the current duplicate set so we can do a pnext-no-dup on a valid
cursor.
If your pnext-dup can return nil and still point at the last element
of the duplicate set, then you could remove that call in your version
of map-index.
even if cursor is aware of such behaviour and caches cursor-pset,
it's still
a waste of CPU cycles to do 3 times more calls and demand cursor to do
dup/nodup checks.
Althoughthese calls are probably not a big deal because the pages are
cached in memory and page fetch time swamps a few function calls so
it's in the noise for overall performance. For really large DBs there
may be issues keeping all the higher pages in memory to support pset
(since pnext only fetches adjacent pages). I'll think about this a
bit...
I think this is basically what happens since you don't need to do a
pset when pnext-dup returns nil.
next, if we have query by value, i think optimal sequence would be:
(cursor-pset c value) => 1
(cursor-pnext-dup c) => 2
(cursor-pnext-dup c) => 3
(cursor-pnext-dup c) => nil
...
that is, if cursor has enough logic to enumerate duplicates (all
values for
given key), why not use it?
however when cursor-pnext-dup returns NIL, that means there's no
more values
with such key, elephant does once more (cursor-pset c value), and
tries
(cursor-pnext-nodup c), and then uses lisp-compare<= on returned
value.
this was actually a reason for error Henrik have reported.
db-postmodern's ordering of keys differs from lisp-compare ordering
(and in
some cases it's just random). normally that should break only range
queries.
but with aforementioned (overly-general?) map-index logic it breaks
even
simple get-instances-by-value calls, which is not good.
so, am i missing something, or really it would be better to refactor
map-index?
if it would be calling only what's needed, but won't do cursor-pset
"for
generality" i can skip cursor-pset optimization -- i don't think
anybody in
real code working with cursors will repeatadly call cursor-pset
without a
reason.
with best regards, Alex 'killer_storm' Mizrahi.
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel