User-configurable collation settings (case sensitivity, accent sensitivity,
locale, etc.) should be an option for views, if anyone wants to take that on.
On Apr 9, 2009, at 12:49 PM, Paul Davis wrote:
Oddly enough, this is expected behavior:
values.push("a");
values.push("A");
values.push("aa");
values.push("b");
values.push("B");
values.push("ba");
values.push("bb");
Even fiddling with the ICU collation options I couldn't get it to sort
any differently.
Did you recreate the indexes from scratch? Otherwise they'll still be
sorted with the old collation.
-Damien
I'm not sure if there's an explanation that I'm
missing or what but it sure seems like "aa" should come before "A" for
case sensitive sorting. Unless of course it's doing something dumb like
sorting right to left in which case "a" > null.
No idea.
Paul Davis
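One way to see why "A" can land before "aa" even in a case-sensitive sort: a multi-level collation compares the whole strings with case ignored first, and only looks at case to break ties between strings that are otherwise equal. The sketch below is a simplified model of that idea, not ICU's actual algorithm. The helper names (primaryKey, compareCodeUnits, twoLevelCompare) are made up for illustration, and lowercasing is only an assumed stand-in for "ignore case".

```javascript
// Simplified two-level comparison. This is a model, NOT ICU's real algorithm.

// Assumption: lowercasing stands in for "ignore case differences".
function primaryKey(s) {
  return s.toLowerCase();
}

// Plain code-unit comparison, the ordering "a" < "A" uses in JavaScript.
function compareCodeUnits(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

function twoLevelCompare(a, b) {
  // Level 1: compare the whole strings with case ignored.
  const primary = compareCodeUnits(primaryKey(a), primaryKey(b));
  if (primary !== 0) return primary;
  // Level 2: the strings differ only by case, so lowercase sorts first.
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) {
      return a[i] === a[i].toLowerCase() ? -1 : 1;
    }
  }
  return 0;
}

const values = ["bb", "B", "aa", "A", "ba", "b", "a"];
const collated = [...values].sort(twoLevelCompare);
const byCodeUnit = [...values].sort(compareCodeUnits);

console.log(collated);   // [ 'a', 'A', 'aa', 'b', 'B', 'ba', 'bb' ]
console.log(byCodeUnit); // [ 'A', 'B', 'a', 'aa', 'b', 'ba', 'bb' ]
```

Under this model "A" and "aa" already differ at the first level ("a" is a prefix of "aa"), so case never gets a say between them. Case only decides "a" versus "A".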
Index: src/couchdb/couch_erl_driver.c
===================================================================
--- src/couchdb/couch_erl_driver.c (revision 762581)
+++ src/couchdb/couch_erl_driver.c (working copy)
@@ -22,6 +22,8 @@
#define U_DISABLE_RENAMING 1
#endif
+#include <stdio.h>
+
#include "erl_driver.h"
#include "unicode/ucol.h"
#include "unicode/ucasemap.h"
@@ -63,13 +65,25 @@
return ERL_DRV_ERROR_GENERAL;
}
+ ucol_setAttribute(pData->coll, UCOL_CASE_FIRST, UCOL_LOWER_FIRST, &status);
+ if(U_FAILURE(status)) {
+ couch_drv_stop((ErlDrvData)pData);
+ return ERL_DRV_ERROR_GENERAL;
+ }
+
+ ucol_setAttribute(pData->coll, UCOL_CASE_LEVEL, UCOL_ON, &status);
+ if(U_FAILURE(status)) {
+ couch_drv_stop((ErlDrvData)pData);
+ return ERL_DRV_ERROR_GENERAL;
+ }
+
pData->collNoCase = ucol_open("", &status);
if (U_FAILURE(status)) {
couch_drv_stop((ErlDrvData)pData);
return ERL_DRV_ERROR_GENERAL;
}
On Thu, Apr 9, 2009 at 6:53 AM, Brian Candler wrote:
I was very surprised to find that view keys seem to be case-insensitive when
using startkey and endkey:
$ curl -X POST -d '{"map":"function(doc) { emit(doc.foo, null); }"}' 'http://127.0.0.1:5984/test_suite_db/_temp_view?startkey="a"&endkey="az"'
{"total_rows":26,"offset":7,"rows":[
{"id":"7","key":"a","value":null},
{"id":"8","key":"A","value":null}, <<<< huh?!
{"id":"9","key":"aa","value":null}
]}
But not when fetching them individually:
$ curl -X POST -d '{"map":"function(doc) { emit(doc.foo, null); }"}' 'http://127.0.0.1:5984/test_suite_db/_temp_view?key="a"'
{"total_rows":26,"offset":7,"rows":[
{"id":"7","key":"a","value":null}
]}
$ curl -X POST -d '{"map":"function(doc) { emit(doc.foo, null); }"}' 'http://127.0.0.1:5984/test_suite_db/_temp_view?key="A"'
{"total_rows":26,"offset":8,"rows":[
{"id":"8","key":"A","value":null}
]}
(Ditto for startkey="a"&endkey="a", or startkey="A"&endkey="A")
At http://wiki.apache.org/couchdb/View_collation it says that view keys are
case-sensitive, which normally means that "A" does not appear in the range
"a" to "aa". And with normal ASCII ordering I would expect "A" to sort
before "a", as is the case with JavaScript:
js> "a" < "A"
false
Could someone please explain to me what's going on? This may also explain my
recent report COUCHDB-324 where tilde does not collate where I'd expect.
I am running a recent SVN build:
{"couchdb":"Welcome","version":"0.9.0a762247"}
Thanks,
Brian.
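For anyone reading along, the two results above are consistent if a startkey/endkey range means "every key k with startkey <= k <= endkey under the view collation" and key= means an exact match under that same collation. Here is a rough sketch of that reading. It assumes JavaScript's Intl.Collator for "en" behaves like the ICU collator CouchDB opens in its C driver, which may not hold for every ICU version or locale, and keysInRange is a hypothetical helper, not anything in CouchDB.

```javascript
// Assumption: Intl.Collator("en") stands in for the ICU collator CouchDB opens in C.
const collator = new Intl.Collator("en");

// Hypothetical helper: keep keys k with startkey <= k <= endkey under the collator,
// returned in collation order.
function keysInRange(keys, startkey, endkey) {
  return keys
    .filter((k) => collator.compare(startkey, k) <= 0 && collator.compare(k, endkey) <= 0)
    .sort(collator.compare);
}

const keys = ["a", "A", "aa", "b", "B", "ba", "bb"];

const ranged = keysInRange(keys, "a", "az");    // like startkey="a"&endkey="az"
const exactLower = keysInRange(keys, "a", "a"); // like key="a"
const exactUpper = keysInRange(keys, "A", "A"); // like key="A"

console.log(ranged, exactLower, exactUpper);
```

With that collator the range comes back as "a", "A", "aa", matching the rows in the report, while each exact lookup returns only its own key. So the keys are still case-sensitive (the collator never treats "a" and "A" as equal); "A" simply sorts between "a" and "aa" rather than before "a" as it would in ASCII order.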