First of all, BDB does not sort on the serialized values except for
values for which no lisp ordering exists. BDB is given a C function
which decodes the serialized format on the fly, without talking to
Lisp, but properly orders all of lisp's orderable objects.
Orderable lisp types:
1. Numeric types: all numeric types are sorted identically to lisp (=
< > <= >=, etc)
Numeric types include fixnum, integer, float, double-float, rational
2. String types: all string types are sorted identically to lisp
(string< string<= string=, etc)
Unorderable lisp types:
1. Symbols: I do sort symbols by symbol-name as a convenience.
Lisp does not provide an ordering fn for symbols so this is additive
2. Objects, structs, etc: lisp does not provide an ordering function
for objects
3. Arrays, hash tables, etc: lisp does not provide an ordering
function for these types
Some of these types can be grouped by eq, eql, equal or equalp. More
on this below.
Because a BTree requires some order for all objects, we need to
expand on the lisp specification of ordering. First we sort on type
so that all unorderable objects of a given type are stored as a
group. Then we choose an arbitrary order (binary rep). Now most of
the hairy issues arise when we mix types in an index and want to
traverse a range in the index or do cursor ops on an index. The
specification is this:
1) Elephant makes no guarantee about the order of unorderable types.
Portable implementations should not depend on an order being
consistent. Unorderable types should be treated as sets. Anything
that is equalp in lisp will be grouped together in the data store
(arrays, structs, symbols)
2) Elephant makes no guarantee about the relative order of different
types in an index.
*** There is an open issue: how to specify a range by object type.
You can't do :start and :end because you don't know what object will
be at the start and end due to #1 and #2 above. I think this should
be dealt with as a special case and a less efficient procedure use
I can't agree with either of your proposals due to performance issues:
1) The performance impact of (position, value) is extreme. I turn
all of the BTree O(log N) operations into O(N). I now have to read
all values out of the BTree until I identify the insertion point,
then renumber all the following elements. This defeats the whole
purpose of having a BTree in the first place.
2) I hate to say it but the CL-SQL approach to cursor operations is
not just terribly inefficient, it also defeats the purpose of having
indicies on disk by requiring that all values be pulled into memory
to support cursor or map operations. I see the CL-SQL backend as a
solution to licensing problems for small databases. For complex
queries over large databases one is better off with an object
relational solution like that implemented by CL-SQL's native metaclass.
The reason that lisp needs to implement a more complex sorting
function to support cursor operations over BTrees is that BTrees
benefit from ordering types relative to each other (we can't
intersperse numeric and non-numeric types without violating lisp
ordering) and the cursor fn's need to mirror this. This doesn't
violate lisp ordering, it is additive. The clarity of the required
constraints could be improved, though.
> There is a hidden danger in relying upon an order based on the
serialized value---namely,
> you can now not swap out serializers without drastic side
effects. Since one of the main
> ways that we can improve performance is by writing better
serializers (and, in particular,
> serializers that are specific to particular data types), this
seems like a bad idea.
If designers adhere to the two restrictions above, then a new
serializer that supports lisp ordering and introduces a new ordering
of types or within types that are unordered in lisp will have no
impact on application code.
Sorry if any point is unclear, I'm in a hurry. Please ask clarifying
questions if necessary.
Ian
PS - This leaves me with a question. Is it possible in any
relational DB to register a sorting fn that you can sort by when
doing a sorted query? Typically rows are sorted by their unique ID
field (i.e. the key used in the underlying BTree).
On Aug 2, 2007, at 5:55 PM, Robert L. Read wrote:
Yes, I think I understand this. However, a costly alternative does
exist: just never let
BDB use its own order. Always impose one that we can compute in
lisp. Then in
BDB you store a (position,value) pair instead of a value, and
either ensure that BDB
sorts on the first part of the binary representation of the
position the way you want it to,
or you add a lot of logic into the "cursor-next" operation. This
is how it is done on the
CLSQL side.
It is almost certainly a bit slower, and it is certainly a bit
harder to code.
It seems to me that the root of the problem is that BDB does indeed
order based on a
serialized value. That is what we should remove. Certainly, if
someone were to write
a Pure-LISP backend, which I hope will occur eventually, it would
seem silly for them to
have to respect an artifact inherited from BDB, when part of the
purpose of such a project
is to escape dependence on BDB.
Forgive me if I'm confused but I assert that we should reverse your
argument: we should
force BDB to be isomorphic to a lisp sorter, not build a lisp
sorted isomorphic to BDB.
On Tue, 2007-07-31 at 15:24 -0400, Ian Eslick wrote:
The practical problem that led to the current design of index
sorting is that we cannot use lisp code to define the sorting
function for serialized values inside BDB Btrees (same problem I
imagine that Henrik had with postmodern). Instead, there is a
hairy custom C procedure that is registered with BDB that parses
the serialized format so that sorting is done first by type
(symbol, string, object, pobject, etc) followed by ordering within
numeric types, strings and symbols. Everything else is ordered
based on the byte ordering of its serialized representation. To
map across indices correctly, we need to know up front whether the
start value is less than the end value. And so we need a standard
lisp function that is isomorphic to the BDB sorting function.
Ideally postmodern would have a similar sorting function that
properly interprets the serialized format just like the BDB
function does. I think it's best to have a single standard
ordering that is as close to lisp's notion of ordering as possible
so we don't have to maintain different orderings. Ian PS - It
might be possible to have a lisp ordering function implement BDB's
notion of sorting by registering it as a callback, however it
would have to deserialize the BDB values each time. So the
problems with this are both stability concerns for foreign
callbacks and the performance impact of serialization/
deserialization for internal BDB operations. On the cleanliness/
performance axis, I think the current approach is the right
tradeoff (it's the original one Ben made, FYI). On Jul 31, 2007,
at 12:50 PM, Robert L. Read wrote: > Personally, I think the only
sensible way to handle this problem is > to require the user to >
specify an ordering function. We can of course provide a default,
> which will be error-prone > but tend to work most of the time. >
> The function called "my-generic-less-than" which is in the
source > tree now could be > a starting point for a generic
ordering. > > > On Tue, 2007-07-24 at 09:48 -0400, Ian Eslick
wrote: >> Robert and I have had some extended discussions on
ordering in >> indices. I think that all we really need to agree
on is _some_ >> canonical ordering. If we have mixed types in an
index, how should >> they be ordered relative to each other? In
BDB we have a C >> function which implements the ordering based on
the type tag and >> then based on the type within it. Are you
relying on a pure binary >> sort in postmodern? Robert or I will
get to submitting that patch >> shortly. I have recently sent in a
patch to lisp-compare<= so >> we'll see if we had to make parallel
changes. Thanks, Ian On Jul >> 24, 2007, at 3:50 AM, Henrik Hjelte
wrote: > I sent this message >> yesterday but I guess it got stuck
in the mailing > list filter. >> Perhaps the attachment was too
big. Since my > common-lisp.net >> user hhjelte does not have
write access to elephant I > have >> placed the patches from here
instead: > darcs get http://common- >> lisp.net/project/grand-prix/
darcs/elephant > > ---------- >> Forwarded message ---------- >
From: Henrik Hjelte >> <[EMAIL PROTECTED]> > Date: Jul 23, 2007
11:28 PM > Subject: >> some patches > To: [EMAIL PROTECTED]
lisp.net > > > Here are >> some darcs patches that might be of
interest. I had some > >> problems with map-index on db-postmodern
that made me almost rip >> my > hair of, but finally I made it to
work again. The problem is >> that > map-index for a string value
rely on the ordering in the >> btree > (continue-p makes use of
less than for strings). The >> postmodern > backend relies on how
the database backend orders >> things, which is not > always the
same thing. Is it a necessary >> feature that b-trees of > string
and objects are required to be >> ordered by lisp-compare<=? > >
In the process of solving the bug I >> have upgraded the test
framework > to use FiveAM instead of RT, It >> has in my opinion a
very nice syntax > and some useful features to >> track
dependencies between tests. I hope > you agree that it >> improves
on things. > > /Henrik Hjelte > >>
_______________________________________________ > elephant-devel
>> site list > elephant-devel@common-lisp.net > http://common- >>
lisp.net/mailman/listinfo/elephant-devel >>
_______________________________________________ elephant-devel >>
site list elephant-devel@common-lisp.net http://common-lisp.net/
>> mailman/listinfo/elephant-devel >
_______________________________________________ > elephant-devel
site list > elephant-devel@common-lisp.net > http://common-
lisp.net/mailman/listinfo/elephant-devel
_______________________________________________ elephant-devel
site list elephant-devel@common-lisp.net http://common-lisp.net/
mailman/listinfo/elephant-devel
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel