Well here are the open decisions:

- Characters as an independent type or a numeric type?
- Case sensitive vs. insensitive string/symbol/pathname sorting?
  (insensitive was the policy implemented by the original developers)
- Settle on a standard type ordering and find some way to do a range query over a type, or develop an alternative data structure model

Actually the btree vs. dictionary argument isn't quite right. The unsortability of certain ranges of the current btree abstraction is a design tradeoff stemming from a conflict between dynamic typing and the the lisp definition of sortable. Not all serializable lisp objects are sortable, so we have two classes of objects.

[BTrees are an implemenation, not an abstraction, so perhaps should be an optional, low-level interface only experts use? The arbitrary sort order used in the BTree 'implementation' is still deterministic and thus enables log(N) insert/delete and linear time traversal.]

There are several ways we can deal with the two-class problem:

1) BTrees can only store a single, statically declared type (perhaps we allow all 'number' subclasses as a type)

2) We allow BTrees to store anything, as outlined in the original writeup below.

3) We allow BTrees to store anything, but we bend over backwards to document how each unsortable type is actually sorted and further wack away at all the data stores so they support the defined sort order for unsortable types. This strikes me as a huge headache (Robert, remember the headache in duplicate sorting differences between BDB and SQL? Multiply that by N data stores...:)

4) BTrees can only store sortable objects and we define the ordering of different type ranges. We add a dictionary type to store both sortable and unsortable objects. There are some implications, though.

This means that indexed slot values must be sortable and you get a type violation error if you try to write an unsorted value. This is a problem if you add an index to an existing class with slots that have values that are not sortable...

Now lisp has no notion of ordering between types and it is hard to index into the first or last member of any given type (given the current abstractions in all data store implementations). Do we disallow numbers and strings to reside in the same tree? If not, is 42 < "deep thought" or is "deep thought" < 42? :) The logical outcome of a cleaner BTree abstraction is that BTrees and slot indices become statically typed (just like CLSQL classes) and the dictionary type is more like the current BTree, but doesn't have cursors, succ or pred. I think lisp users can deal with not relying on the enumeration ordering for a dictionary since they deal with hash-table enumeration just fine today.

In my mind only options 2 and 4 are viable. The difference is whether you want the flexibility of writing anything to btrees and indexed slots verses cleaner data structure abstractions and range queries.

We could also keep the current btree model and clean up the abstractions by building the structures in #2 on top for users desiring less power and more predictability. This would mean the current BTree with a complex definition of order, a 'tree' data structure with pred/succ that only takes a single type and a 'dictionary' data structure without it. This last option is pretty easy to build on top of btrees, actually.

Finally, class indexing could be extended to enforce types so you could choose a BTree index or a cleaner 'tree' index (what is the canonical name for a totally ordered set with pred/succ on it?)

(defpclass my-class ()
  ((slot1 :accessor slot1 :initarg :slot1 :index t :index-type number)
   (slot2 :accessor slot2 :initarg :slot2 :index t)))

(make-instance 'my-class :slot1 1 :slot2 'foo)
(make-instance 'my-class :slot1 2.0 :slot2 "test")
(make-instance 'my-class :slot1 (the bignum 2000000000000000) :slot2 'foo)

(get-instances-by-value 'my-class 'slot2 "test")
=> (#<MY-CLASS s1: 2.0 s2: "test">)
(get-instances-by-range 'my-class 'slot1 0 10)
=> (#<MY-CLASS s1: 1 s2: 'foo> #<MY-CLASS s1: 2.0 s2: "test">)
(get-instances-by-value 'my-class 'slot2 'foo)
=> (#<MY-CLASS s1: 1 s2: 'foo> #<MY-CLASS s1: 200000000000000 s2: 'foo>)

(make-instance 'my-class :slot1 'foo :slot2 'foo) => (error "Invalid slot-type for typed index slot")

If we keep the current BTree abstraction in any form, we should definitely provide a way to map over a type range within the btree as opposed to only the entire tree. At least for now I can stub in a linear version of this vs. the ideal log(N) + M.

Whew...

Ian



On Oct 22, 2007, at 3:55 PM, Robert L. Read wrote:

I think this is the best policy that we can generate.

I personally don't like it, as I've mentioned before.  I don't think
we should have an arbitrary order. If you don't have a specified order, then you don't really have a btree in meaningful sense. You can't do a
range query over something that is arbitrary, and if you don't have a
range query, why call it a btree? You should just call it a dictionary,
(which happens to be implemented with btrees) equip it with an
enumeration, and have something just as efficient and simpler than a
btree, which furthermore decreases the chances of someone accidentally
depending on the order, since you never provide a way to utilize it
(although of course humans, being human, will accidentally depend on the
ordering of objects returned by the enumeration, so the problem never
goes away from a software engineering point of view.)

However, I don't have time to reorganize things to present a
"dictionary" in addition to a "btree" abstraction; and even if someone
volunteered to do this work, it would be worthy of a new release, rather
than a discussion of the current release.  And since Ian has done much
more work than I have in the last year, I will defer to his judgment.

Is there a reason that string sorting is case insensitive?  This seems
like asking for trouble --- that's something that people are likely to
depend on and be surprised when they switch repositories and find it is
different.


On Mon, 2007-10-22 at 13:18 -0400, Ian Eslick wrote:
There have been a number of discussions about sorting issues and
guarantees among the various data store implementations.  This is
difficult due to different backend serialized representations.  This
is my understanding of the current guarantees.  Let's commit to this
as part of locking down 0.9.1.

Any ordered data structure, such as btree or indexed-btrees, has to
make some guarantees about the function that defines the ordering.
Currently, the BDB data store and Elephant core implements the
following policy (I believe this is also true for CL-SQL stores as
well):

General policies:

In a BTree with multiple types, each type is sorted as an independent
subsequence.  Types are guaranteed to be separated, but the order of
the type sub-sequences is not guaranteed.

In an Index with multiple values, there is no guarantee about the
ordering of duplicate values, the the index is treated as a sorted
sequence of unordered equivalence classes.  Tests and users should
not depend on this.

Specific types:

Numeric types (fixnum, bignum, float, double, rational and complex):
Sorted properly over the continuous real numbers, smallest to biggest

Characters:
Currently considered and sorted in with all the above numeric types
(this is probably wrong)

Strings:
Sorted alphabetically, case-INSENSITIVE

Symbols/Pathnames:
Sorted as strings above, but symbols and paths are sorted
independently as type classes.

Persistent objects are sorted according to OID, but the OID ordering
may change at any time, so long as object equivalency is maintained
(for example, if we implement GC).  Users should not depend on the
OID for anything other than uniqueness unless they are implemented a
new low-level persistent-collection data structure like (N:1 and N:N)
associations that is GC aware.  Otherwise the order can only be
guaranteed within a given transaction.

All other types have no user-visible ordering guarantees.  They have
an arbitrary ordering within the data store, but there is no lisp
ordering that makes sense for hash tables, standard objects, etc.
(You can create derived indices of an indexed-btree that compute a
number, string, etc that can serve as an ordering for arbitrary
objects).

Open issues:
- Characters as independent class, or as numeric codes?
- If/how to map over a type subset in an index or btree


Anything I missed?

Ian


_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

Reply via email to