On 29/12/2008, at 3:26 PM, Chris Anderson wrote:

I almost suggested giving an option for inclusive and exclusive
interval ends, basically, < / > vs <= / >= control from the client.
But then thinking about Maximillian's proposal (of defaulting to an
exclusive right end) I began to wonder if offering *only* the
interval-style he suggests, would satisfy both precision maths, and
newbie expectations.

My concern right now is prefix searching, e.g. paging through

startkey='rs' endkey='rs\uFFF8'

It would be good to have a prefix-test mode that would be applicable to the 'final' string component of a key, à la SQL's "LIKE 'rs%'". This would eliminate the need for the 'rs\uFFF8' hack.

Something like endkey_succ=<key>, which would be equivalent to a non-inclusive endkey=succ(<key>), where succ(x) is the first key value, wrt the view collation algorithm, that wouldn't satisfy x <= <key>. The essential characteristic is that succ(x) doesn't need to be calculated by the client.

I'm not suggesting endkey_succ as the syntactic mechanism.
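
To illustrate the difference, here is a minimal Python sketch. It assumes plain code point ordering as a stand-in for the view collation (CouchDB's ICU collation orders keys differently), and the function names hack_range, prefix_successor and succ_range are hypothetical, not anything CouchDB provides. It shows the 'rs\uFFF8' hack missing a key that a computed successor bound would catch:

```python
from bisect import bisect_left, bisect_right

def hack_range(sorted_keys, prefix):
    """Emulate startkey=prefix, endkey=prefix + '\\uFFF8' (inclusive end)."""
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_right(sorted_keys, prefix + "\uFFF8")
    return sorted_keys[lo:hi]

def prefix_successor(prefix):
    """Smallest string, in code point order, that sorts after every
    string starting with prefix. Returns None if no such bound exists."""
    chars = list(prefix)
    # A trailing maximum code point cannot be incremented, so drop it.
    while chars and chars[-1] == "\U0010FFFF":
        chars.pop()
    if not chars:
        return None
    chars[-1] = chr(ord(chars[-1]) + 1)
    return "".join(chars)

def succ_range(sorted_keys, prefix):
    """Emulate startkey=prefix with an exclusive endkey=succ(prefix)."""
    lo = bisect_left(sorted_keys, prefix)
    upper = prefix_successor(prefix)
    hi = len(sorted_keys) if upper is None else bisect_left(sorted_keys, upper)
    return sorted_keys[lo:hi]

# Code point order is Python's default string ordering.
keys = sorted(["rr", "rs", "rsa", "rs\uFFFD", "rt"])
hack_result = hack_range(keys, "rs")   # misses 'rs\uFFFD'
succ_result = succ_range(keys, "rs")   # includes every key starting with 'rs'
```

The point of doing this on the server is that the client would not need to know the collation rules in order to compute the bound.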

In my opinion the ICU collation
driver is configured sanely, and I feel comfortable delegating to ICU.
It's a good library for our cause. I would absolutely love to see test
cases that indicated where CouchDB can improve on this front.

I'd like to be able to turn on normalization for all sorting. I could normalise all documents, and all key values, but given that CouchDB has ICU, this would be a lot more convenient and reliable if it were a server-provided feature.
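
For comparison, the client-side workaround might look something like the following Python sketch. It assumes NFC as the normalization form and a JSON-like key made of strings, lists and dicts; normalize_key is a hypothetical helper, not part of any CouchDB client library:

```python
import unicodedata

def normalize_key(value, form="NFC"):
    """Recursively normalize every string in a JSON-like key value."""
    if isinstance(value, str):
        return unicodedata.normalize(form, value)
    if isinstance(value, list):
        return [normalize_key(v, form) for v in value]
    if isinstance(value, dict):
        return {normalize_key(k, form): normalize_key(v, form)
                for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged

precomposed = "caf\u00e9"   # 'e with acute' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent
normalized = normalize_key([precomposed, decomposed])
```

Every client would have to apply this consistently to both documents and query keys, which is why I'd prefer the server to do it.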

I imagine some might like to enable correct ordering of French accents: http://unicode.org/reports/tr10/#French_Accents, which is a specific instance of a linguistic tailoring as described here: http://unicode.org/reports/tr10/#Linguistic_Features . I suggest that a Couch instance and/or an individual db might want to specify a Unicode locale from e.g. http://unicode.org/cldr/

There's been a suggestion of raw Unicode code point ordering as a
collation configuration parameter, specifiable in design docs.

That's not valid Unicode. I think it's a bad idea.

Maybe
the next logical step is a configuration member, for design docs,
which could optionally specify the ICU configuration.

Specified in a hierarchic manner: system / db. I hesitate to include 'view' because there are a number of view-like things that don't have configuration (_all_docs), and for completeness you would then want to deal with propagating a particular configuration through all of the design-doc-driven facilities. IMO, just the system & db would be enough.
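
As a rough sketch of what I mean by hierarchic, in Python: a db-level setting overrides the system-level one, which overrides a built-in default. The setting names ("locale", "normalization") and the function effective_collation are purely hypothetical and not an existing CouchDB configuration format:

```python
# Hypothetical built-in defaults, used when neither level sets a value.
DEFAULT_COLLATION = {"locale": "root", "normalization": False}

def effective_collation(system_config, db_config):
    """Resolve collation settings: db overrides system overrides default."""
    merged = dict(DEFAULT_COLLATION)
    merged.update(system_config)
    merged.update(db_config)
    return merged

system_level = {"normalization": True}
db_level = {"locale": "fr"}
resolved = effective_collation(system_level, db_level)
```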

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A Man may make a Remark –
In itself – a quiet thing
That may furnish the Fuse unto a Spark
In dormant nature – lain –

Let us divide – with skill –
Let us discourse – with care –
Powder exists in Charcoal –
Before it exists in Fire –

  -- Emily Dickinson 913 (1865)

