Jens Alfke created COUCHDB-2327:
-----------------------------------
Summary: Add string/array prefix match option, for view queries
Key: COUCHDB-2327
URL: https://issues.apache.org/jira/browse/COUCHDB-2327
Project: CouchDB
Issue Type: Improvement
Security Level: public (Regular issues)
Components: HTTP Interface
Reporter: Jens Alfke
View querying provides no clean way to match a string prefix. The only advice
I've seen is to set startkey to the prefix, and endkey to the prefix with "some
really high Unicode character" appended, which is a total kludge*.
There's a similar issue with matching an array prefix, e.g. "all keys that
start with [2014, ...]". Here the solution is less kludgy (append a "{}" to the
endkey) but it's still very unintuitive to people learning CouchDB. I've had to
explain it to newbies many times.
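
For reference, the two workarounds described above can be sketched as follows. This is only an illustration: the helper names and the view path are hypothetical, and `\ufff0` is an assumed stand-in for "some really high Unicode character".

```python
import json
from urllib.parse import urlencode

# Assumption: one commonly suggested "really high" character; not a guarantee.
HIGH_CHAR = "\ufff0"


def _encode(key):
    # Compact JSON so no spaces end up in the query string.
    return json.dumps(key, separators=(",", ":"))


def string_prefix_params(prefix):
    """startkey/endkey for string keys that start with `prefix`."""
    return {"startkey": _encode(prefix), "endkey": _encode(prefix + HIGH_CHAR)}


def array_prefix_params(prefix):
    """startkey/endkey for array keys whose first items equal `prefix`."""
    return {"startkey": _encode(prefix), "endkey": _encode(prefix + [{}])}


# Hypothetical view path, for illustration only.
view = "/db/_design/sales/_view/by_date"
print(view + "?" + urlencode(array_prefix_params([2014])))
```

Neither call makes it obvious to a reader that a prefix match is intended, which is the point of this ticket.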
I suggest adding an explicit query option to enable prefix matching. This
doesn't need to mess with the actual query engine — all it has to do is modify
the endkey by appending an appropriate Unicode character (in the string case)
or an empty object (in the array case). If no `endkey` is given, it is derived
from the `startkey`.
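
A minimal sketch of that rewrite, in Python rather than Erlang and not tied to CouchDB's actual request handling. The option name `prefix_match` is borrowed from the Couchbase Lite feature; `effective_endkey` and `HIGH_CHAR` are hypothetical names, and leaving non-string, non-array keys untouched is my assumption.

```python
# Assumption: stand-in for "an appropriate Unicode character".
HIGH_CHAR = "\uffff"


def effective_endkey(query):
    """Return the endkey to use for `query`, a dict of decoded view options.

    With prefix matching off, the endkey is passed through unchanged.
    With it on, the endkey (or the startkey, if no endkey was given)
    is extended so the range covers everything sharing that prefix.
    """
    if not query.get("prefix_match"):
        return query.get("endkey")
    base = query["endkey"] if "endkey" in query else query["startkey"]
    if isinstance(base, str):
        return base + HIGH_CHAR
    if isinstance(base, list):
        return base + [{}]
    return base  # assumption: other key types are left as they are
```

With this, a request carrying only `startkey="f"` and the prefix option would be treated like the startkey/endkey pair people write by hand today.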
I've already implemented a comparable feature for Couchbase Lite:
https://github.com/couchbase/couchbase-lite-ios/wiki/Query-Enhancements#prefix-matching
Note that I made the `prefix_match` parameter an integer, not a boolean. This
is to support cases where you want to match a prefix of a _nested component_ of
the key, for example "all keys in 2014 whose product name starts with 'f'",
where the startkey would be [2014, "f"] and the prefix_match would be 2 to
indicate that it's the nested string that should be prefix-matched, not the
array. But in the common case you'd just set the value to 1 to indicate that
the top level key should be prefix-matched.
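
To make the integer semantics concrete, here is a rough sketch of how the level could be applied. It reflects my description above rather than Couchbase Lite's actual code; `prefix_endkey` and its `high_char` default are hypothetical.

```python
def prefix_endkey(key, level, high_char="\uffff"):
    """Extend `key` so it bounds a prefix match at the given nesting level.

    level 0: no prefix matching, key returned unchanged.
    level 1: prefix-match the key itself (string or array).
    level 2+: descend into the last element of an array key.
    """
    if level <= 0:
        return key
    if level == 1:
        if isinstance(key, str):
            return key + high_char
        if isinstance(key, list):
            return key + [{}]
        return key  # assumption: other key types are left unchanged
    if isinstance(key, list) and key:
        return key[:-1] + [prefix_endkey(key[-1], level - 1, high_char)]
    return key


# "All keys in 2014 whose product name starts with 'f'":
endkey = prefix_endkey([2014, "f"], 2)
```

Here `prefix_endkey([2014, "f"], 1)` would instead append an empty object to the array, i.e. match every key that begins with `[2014, "f", ...]`.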
* Why is adding "some high Unicode character" a kludge? Because Unicode is so
complicated and so inconsistently implemented. Doing this immediately opens the
possibility of weird Unicode issues in your development language's string type,
in its HTTP client library, and in Erlang's equivalents on the server side. Not
to mention the swamp that is the Unicode specification itself — for instance,
I've seen advice to use a character like \uFFFE, which was correct until
Unicode grew beyond 16 bits, and tended to work all right for a while after that,
but will now fail with emoji characters (which are both very commonly used and
well outside the 16-bit range). Actually, whether it fails depends on whether your
string implementation operates on UTF-16 (very common) or true Unicode code
points. Like I said, it's a kludge.
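
The UTF-16 versus code-point difference can be shown with a small sketch. It compares strings two ways on the client side only and does not model the server's actual collation; the helper names are hypothetical.

```python
def utf16_units(s):
    """Return the UTF-16 code units of `s` as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def in_range_by_code_point(key, start, end):
    # Python compares str values code point by code point.
    return start <= key <= end


def in_range_by_utf16(key, start, end):
    # Compare as sequences of 16-bit code units instead.
    return utf16_units(start) <= utf16_units(key) <= utf16_units(end)


start = "f"
end = "f\ufffe"            # prefix plus the "high" character
emoji_key = "f\U0001F600"  # "f" followed by an emoji outside the 16-bit range

by_code_point = in_range_by_code_point(emoji_key, start, end)
by_utf16 = in_range_by_utf16(emoji_key, start, end)
```

By code point the emoji key sorts after the endkey and is missed; by UTF-16 code unit its leading surrogate (0xD83D) sorts below 0xFFFE, so the key is included. The same query can therefore behave differently depending on which comparison is in play.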