On 13/12/2008, at 7:40 AM, Damien Katz wrote:
The decision to limit db and design doc names is a pragmatic
one, it simplifies things greatly. CouchDB is full of things that
could be better. Patches welcome.
OK, this code now works for me in a client:
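require 'rubygems'
require 'json'
require 'couchrest'
require 'cgi'
# Two database names that differ only in the case of the first letter.
# CGI.escape form-encodes each name: ' ' -> '+', '/' -> '%2F', and every
# non-ASCII character -> its UTF-8 bytes as %XX escapes.
# CouchRest.database! creates the database if it doesn't already exist.
db_name = CGI.escape("Ser vices/new - ∞शछغش갥걸ペボ")
db = CouchRest.database!("http://localhost:5984/" + db_name)
db_name = CGI.escape("ser vices/new - ∞शछغش갥걸ペボ")
db = CouchRest.database!("http://localhost:5984/" + db_name)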
And this URL in Safari:
http://127.0.0.1:5984/Ser+vices%2Fnew+-+∞शछغش갥걸ペボ
returns this:
{"db_name":"Ser vices/new - \u221e\u0936\u091b\u063a\u0634\uac25\uac78\u30da\u30dc","doc_count":1,"doc_del_count":0,"update_seq":1,"purge_seq":0,"compact_running":false,"disk_size":14365}
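The response encodes the non-ASCII characters of db_name as JSON \u escapes, so it round-trips to exactly the name that was created. A minimal offline sketch of that round trip, assuming the same server at 127.0.0.1:5984 as above; db_info_url is a hypothetical helper, and the body is simply the response quoted above rather than a live request:

```ruby
require 'cgi'
require 'json'

# Hypothetical helper: build the database-info URL for a raw (unescaped) name.
# CGI.escape turns ' ' into '+', '/' into '%2F' and non-ASCII bytes into %XX.
def db_info_url(name, host = "http://127.0.0.1:5984/")
  host + CGI.escape(name)
end

name = "Ser vices/new - ∞शछغش갥걸ペボ"
url  = db_info_url(name)

# The body returned for that URL, copied from the response above.
# Single quotes keep the \uXXXX sequences literal, as they are on the wire.
body = '{"db_name":"Ser vices/new - \u221e\u0936\u091b\u063a\u0634\uac25\uac78\u30da\u30dc","doc_count":1,"doc_del_count":0,"update_seq":1,"purge_seq":0,"compact_running":false,"disk_size":14365}'

# JSON.parse decodes the \uXXXX escapes back into UTF-8 characters.
info = JSON.parse(body)
```

In a live client the body would come from something like Net::HTTP.get(URI(url)) instead of the pasted string.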
The filesystem looks like this:
Ser+vices%2Fnew+-+%E2%88%9E%E0%A4%B6%E0%A4%9B%D8%BA%D8%B4%EA%B0%A5%EA%B1%B8%E3%83%9A%E3%83%9C-lhxj+E81IP9xm+0ssUSsQ==.couch
ser+vices%2Fnew+-+%E2%88%9E%E0%A4%B6%E0%A4%9B%D8%BA%D8%B4%EA%B0%A5%EA%B1%B8%E3%83%9A%E3%83%9CN2JWdnNzkyqvutQ1OZeKUw==.couch
The Base64 encoding (filename-safe variant) of the name's MD5 is
appended so that names differing only in case, like the two above,
still map to distinct files on case-insensitive filesystems. I haven't
investigated platform-specific code, which would allow filenames to
contain Unicode characters directly where the OS supports them, making
them much shorter, presuming that database files aren't meant to be
portable between systems. Note that filenames for ASCII names look far
less ugly, not that I consider the ugliness a problem.
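A rough sketch of that naming scheme, for illustration only. It assumes the MD5 is taken over the raw UTF-8 name and that the "filename variant" of Base64 swaps '+' and '/' for '-' and '_'; the variant in the actual patch may differ, so the suffixes this produces are not guaranteed to match the listing above. db_filename is a hypothetical helper:

```ruby
require 'cgi'
require 'digest/md5'

# Hypothetical helper: derive an on-disk filename for a database name.
# Assumption: MD5 over the UTF-8 bytes of the name, Base64 with '-'/'_'.
def db_filename(name)
  escaped = CGI.escape(name)                          # ' ' -> '+', '/' -> '%2F', UTF-8 bytes -> %XX
  digest  = Digest::MD5.digest(name)                  # 16 raw bytes
  suffix  = [digest].pack('m0').tr('+/', '-_')        # 24 chars, ending in '==' padding
  "#{escaped}#{suffix}.couch"
end

a = db_filename("Ser vices/new - ∞शछغش갥걸ペボ")
b = db_filename("ser vices/new - ∞शछغش갥걸ペボ")
```

The escaped prefixes of a and b differ only in case, so on a case-insensitive filesystem it is the hash suffix alone that keeps the two files apart.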
View filenames can be handled the same way.
However, I think a better solution is something like this:
N2JWdnNzkyqvutQ1OZeKUw==.couchdb/
    name  # a UTF-8 file containing the name of the database
    data  # what was previously in the .couch file
    temp  # what was in the .*_temp file
lhxj+E81IP9xm+0ssUSsQ==.viewgroup/
    name  # a UTF-8 file containing the name of the view
    data  # what was previously in the .view file
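A sketch of creating the database half of that layout, under the same assumptions as before (MD5 over the UTF-8 name, Base64 with '-'/'_' so the result is a legal filename). db_dir_name and create_db_dir are hypothetical helpers, and the data file is only an empty placeholder here:

```ruby
require 'digest/md5'
require 'fileutils'
require 'tmpdir'

# Hypothetical: directory name "<filename-safe Base64 MD5>.couchdb".
# It can be computed from the name alone, with no lookup table.
def db_dir_name(name)
  [Digest::MD5.digest(name)].pack('m0').tr('+/', '-_') + ".couchdb"
end

# Create the proposed layout: one directory per database, holding a
# 'name' file (UTF-8) and the 'data' file.
def create_db_dir(root, name)
  dir = File.join(root, db_dir_name(name))
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, "name"), name)
  FileUtils.touch(File.join(dir, "data")) # placeholder for the old .couch contents
  dir
end
```

Deleting or moving a database then means deleting or moving one directory.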
I suggest using the MD5 because it can be computed from the name
alone. Alternatively, the directory names could be simple integers,
which IMO would be a slightly better solution but a more pervasive
change, because most of the functions currently take names. Integers
would also avoid even the vanishingly small chance of a hash collision.
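To put a number on "vanishingly small": the usual birthday-bound approximation for n distinct names hashed to 128 bits is p ≈ n(n - 1) / 2 / 2^128. This treats MD5 as a uniformly random function, which is reasonable for accidental (non-adversarial) collisions; md5_collision_probability is just an illustrative helper:

```ruby
# Birthday-bound approximation: chance that any two of n distinct names
# share the same 128-bit MD5, assuming MD5 behaves like a random function.
def md5_collision_probability(n)
  n * (n - 1) / 2.0 / 2**128
end

p_million = md5_collision_probability(1_000_000) # roughly 1.5e-27
```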
I know the database name is in the data file, but all of the code
requires the name before reading the file, and changing that would be
a major patch. Furthermore, keeping the name in a plain file ensures
that sysadmin tasks are still easy (and scriptable). I think this is a
better system than the current one because it uses filesystem
containment rather than filename composition, e.g. a database is
entirely contained in one directory.
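As an example of the scriptability point, a sysadmin could map directories back to database names by reading only the name files, without opening any data file. A sketch against the proposed layout; list_databases is a hypothetical script function, and directories without a name file are skipped:

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical sysadmin helper: return { database name => directory name }
# for every "*.couchdb" directory under root that has a 'name' file.
def list_databases(root)
  Dir.glob(File.join(root, "*.couchdb")).sort.each_with_object({}) do |dir, acc|
    name_file = File.join(dir, "name")
    next unless File.file?(name_file)
    acc[File.read(name_file, encoding: "UTF-8")] = File.basename(dir)
  end
end
```

For instance, for the N2JWdnNzkyqvutQ1OZeKUw==.couchdb directory in the layout above, the result would map the name stored in its name file to that directory.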
Apart from the 'name' files, this is a largely mechanical change.
Opinions?
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787