First of all, BDB does not sort on the serialized values except for values for which no lisp ordering exists. BDB is given a C function which decodes the serialized format on the fly, without talking to Lisp, but properly orders all of lisp's orderable objects.

Orderable lisp types:
1. Numeric types: all numeric types are sorted identically to lisp (= < > <= >=, etc)
   Numeric types include fixnum, integer, float, double-float, rational
2. String types: all string types are sorted identically to lisp (string< string<= string=, etc)

Unorderable lisp types:
1. Symbols: I do sort symbols by symbol-name as a convenience.
   Lisp does not provide an ordering fn for symbols so this is additive
2. Objects, structs, etc: lisp does not provide an ordering function for objects 3. Arrays, hash tables, etc: lisp does not provide an ordering function for these types

Some of these types can be grouped by eq, eql, equal or equalp. More on this below.

Because a BTree requires some order for all objects, we need to expand on the lisp specification of ordering. First we sort on type so that all unorderable objects of a given type are stored as a group. Then we choose an arbitrary order (binary rep). Now most of the hairy issues arise when we mix types in an index and want to traverse a range in the index or do cursor ops on an index. The specification is this:

1) Elephant makes no guarantee about the order of unorderable types. Portable implementations should not depend on an order being consistent. Unorderable types should be treated as sets. Anything that is equalp in lisp will be grouped together in the data store (arrays, structs, symbols)

2) Elephant makes no guarantee about the relative order of different types in an index.

*** There is an open issue: how to specify a range by object type. You can't do :start and :end because you don't know what object will be at the start and end due to #1 and #2 above. I think this should be dealt with as a special case and a less efficient procedure use

I can't agree with either of your proposals due to performance issues:

1) The performance impact of (position, value) is extreme. I turn all of the BTree O(log N) operations into O(N). I now have to read all values out of the BTree until I identify the insertion point, then renumber all the following elements. This defeats the whole purpose of having a BTree in the first place.

2) I hate to say it but the CL-SQL approach to cursor operations is not just terribly inefficient, it also defeats the purpose of having indicies on disk by requiring that all values be pulled into memory to support cursor or map operations. I see the CL-SQL backend as a solution to licensing problems for small databases. For complex queries over large databases one is better off with an object relational solution like that implemented by CL-SQL's native metaclass.

The reason that lisp needs to implement a more complex sorting function to support cursor operations over BTrees is that BTrees benefit from ordering types relative to each other (we can't intersperse numeric and non-numeric types without violating lisp ordering) and the cursor fn's need to mirror this. This doesn't violate lisp ordering, it is additive. The clarity of the required constraints could be improved, though.

> There is a hidden danger in relying upon an order based on the serialized value---namely, > you can now not swap out serializers without drastic side effects. Since one of the main > ways that we can improve performance is by writing better serializers (and, in particular, > serializers that are specific to particular data types), this seems like a bad idea.

If designers adhere to the two restrictions above, then a new serializer that supports lisp ordering and introduces a new ordering of types or within types that are unordered in lisp will have no impact on application code.

Sorry if any point is unclear, I'm in a hurry. Please ask clarifying questions if necessary.

Ian

PS - This leaves me with a question. Is it possible in any relational DB to register a sorting fn that you can sort by when doing a sorted query? Typically rows are sorted by their unique ID field (i.e. the key used in the underlying BTree).



On Aug 2, 2007, at 5:55 PM, Robert L. Read wrote:

Yes, I think I understand this. However, a costly alternative does exist: just never let BDB use its own order. Always impose one that we can compute in lisp. Then in BDB you store a (position,value) pair instead of a value, and either ensure that BDB sorts on the first part of the binary representation of the position the way you want it to, or you add a lot of logic into the "cursor-next" operation. This is how it is done on the
CLSQL side.

It is almost certainly a bit slower, and it is certainly a bit harder to code.

It seems to me that the root of the problem is that BDB does indeed order based on a serialized value. That is what we should remove. Certainly, if someone were to write a Pure-LISP backend, which I hope will occur eventually, it would seem silly for them to have to respect an artifact inherited from BDB, when part of the purpose of such a project
is to escape dependence on BDB.

Forgive me if I'm confused but I assert that we should reverse your argument: we should force BDB to be isomorphic to a lisp sorter, not build a lisp sorted isomorphic to BDB.


On Tue, 2007-07-31 at 15:24 -0400, Ian Eslick wrote:
The practical problem that led to the current design of index sorting is that we cannot use lisp code to define the sorting function for serialized values inside BDB Btrees (same problem I imagine that Henrik had with postmodern). Instead, there is a hairy custom C procedure that is registered with BDB that parses the serialized format so that sorting is done first by type (symbol, string, object, pobject, etc) followed by ordering within numeric types, strings and symbols. Everything else is ordered based on the byte ordering of its serialized representation. To map across indices correctly, we need to know up front whether the start value is less than the end value. And so we need a standard lisp function that is isomorphic to the BDB sorting function. Ideally postmodern would have a similar sorting function that properly interprets the serialized format just like the BDB function does. I think it's best to have a single standard ordering that is as close to lisp's notion of ordering as possible so we don't have to maintain different orderings. Ian PS - It might be possible to have a lisp ordering function implement BDB's notion of sorting by registering it as a callback, however it would have to deserialize the BDB values each time. So the problems with this are both stability concerns for foreign callbacks and the performance impact of serialization/ deserialization for internal BDB operations. On the cleanliness/ performance axis, I think the current approach is the right tradeoff (it's the original one Ben made, FYI). On Jul 31, 2007, at 12:50 PM, Robert L. Read wrote: > Personally, I think the only sensible way to handle this problem is > to require the user to > specify an ordering function. We can of course provide a default, > which will be error-prone > but tend to work most of the time. > > The function called "my-generic-less-than" which is in the source > tree now could be > a starting point for a generic ordering. > > > On Tue, 2007-07-24 at 09:48 -0400, Ian Eslick wrote: >> Robert and I have had some extended discussions on ordering in >> indices. I think that all we really need to agree on is _some_ >> canonical ordering. If we have mixed types in an index, how should >> they be ordered relative to each other? In BDB we have a C >> function which implements the ordering based on the type tag and >> then based on the type within it. Are you relying on a pure binary >> sort in postmodern? Robert or I will get to submitting that patch >> shortly. I have recently sent in a patch to lisp-compare<= so >> we'll see if we had to make parallel changes. Thanks, Ian On Jul >> 24, 2007, at 3:50 AM, Henrik Hjelte wrote: > I sent this message >> yesterday but I guess it got stuck in the mailing > list filter. >> Perhaps the attachment was too big. Since my > common-lisp.net >> user hhjelte does not have write access to elephant I > have >> placed the patches from here instead: > darcs get http://common- >> lisp.net/project/grand-prix/ darcs/elephant > > ---------- >> Forwarded message ---------- > From: Henrik Hjelte >> <[EMAIL PROTECTED]> > Date: Jul 23, 2007 11:28 PM > Subject: >> some patches > To: [EMAIL PROTECTED] lisp.net > > > Here are >> some darcs patches that might be of interest. I had some > >> problems with map-index on db-postmodern that made me almost rip >> my > hair of, but finally I made it to work again. The problem is >> that > map-index for a string value rely on the ordering in the >> btree > (continue-p makes use of less than for strings). The >> postmodern > backend relies on how the database backend orders >> things, which is not > always the same thing. Is it a necessary >> feature that b-trees of > string and objects are required to be >> ordered by lisp-compare<=? > > In the process of solving the bug I >> have upgraded the test framework > to use FiveAM instead of RT, It >> has in my opinion a very nice syntax > and some useful features to >> track dependencies between tests. I hope > you agree that it >> improves on things. > > /Henrik Hjelte > >> _______________________________________________ > elephant-devel >> site list > elephant-devel@common-lisp.net > http://common- >> lisp.net/mailman/listinfo/elephant-devel >> _______________________________________________ elephant-devel >> site list elephant-devel@common-lisp.net http://common-lisp.net/ >> mailman/listinfo/elephant-devel > _______________________________________________ > elephant-devel site list > elephant-devel@common-lisp.net > http://common- lisp.net/mailman/listinfo/elephant-devel _______________________________________________ elephant-devel site list elephant-devel@common-lisp.net http://common-lisp.net/ mailman/listinfo/elephant-devel
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

Reply via email to