On 13/12/2008, at 7:40 AM, Damien Katz wrote:

> The decision to limit db and design doc names is a pragmatic one; it simplifies things greatly. CouchDB is full of things that could be better. Patches welcome.

OK, this code now works for me in a client:

  require 'rubygems'
  require 'json'
  require 'couchrest'
  require 'cgi'

  # Two database names that differ only in case
  db_name = CGI.escape("Ser vices/new - ∞शछغش갥걸ペボ")
  db = CouchRest.database!("http://localhost:5984/" + db_name)

  db_name = CGI.escape("ser vices/new - ∞शछغش갥걸ペボ")
  db = CouchRest.database!("http://localhost:5984/" + db_name)

And this URL in Safari:

  http://127.0.0.1:5984/Ser+vices%2Fnew+-+∞शछغش갥걸ペボ

returns this:

  {"db_name":"Ser vices/new - \u221e\u0936\u091b\u063a\u0634\uac25\uac78\u30da\u30dc","doc_count":1,"doc_del_count":0,"update_seq":1,"purge_seq":0,"compact_running":false,"disk_size":14365}

The filesystem looks like this:

  Ser+vices%2Fnew+-+%E2%88%9E%E0%A4%B6%E0%A4%9B%D8%BA%D8%B4%EA%B0%A5%EA%B1%B8%E3%83%9A%E3%83%9C-lhxj+E81IP9xm+0ssUSsQ==.couch
  ser+vices%2Fnew+-+%E2%88%9E%E0%A4%B6%E0%A4%9B%D8%BA%D8%B4%EA%B0%A5%EA%B1%B8%E3%83%9A%E3%83%9CN2JWdnNzkyqvutQ1OZeKUw==.couch

The Base64 (filename variant) of the MD5 of the name is appended so that names differing only in case still map to distinct files on case-insensitive filesystems. I haven't investigated using platform-specific code, which would allow filenames to contain Unicode characters where the OS supports them, and would therefore make them much shorter. That presumes the files aren't intended to be portable between systems. Note that filenames for ASCII names don't look nearly as ugly, not that I consider that to be a problem.
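
For illustration, the filename scheme above can be approximated in Ruby. This is a minimal sketch, not the patch itself: `db_filename` is a hypothetical helper, and it assumes the MD5 is taken over the UTF-8 bytes of the name and that the Base64 "filename variant" simply replaces `/` with `-`. The hashes in the listing contain both `+` and `-`, which is consistent with that, but the exact alphabet isn't stated.

```ruby
require 'cgi'
require 'digest/md5'

# Hypothetical helper approximating the filename scheme described above:
#   CGI-escaped name + Base64 of MD5(name) + ".couch"
# Assumptions (not confirmed by the patch): MD5 is computed over the
# UTF-8 bytes of the name, and the Base64 "filename variant" replaces
# '/' with '-' while leaving '+' and the '=' padding alone.
def db_filename(name)
  digest = Digest::MD5.digest(name.encode('UTF-8'))  # 16 raw bytes
  hash   = [digest].pack('m0').tr('/', '-')          # 24 chars, no newline
  CGI.escape(name) + hash + '.couch'
end

puts db_filename("Ser vices/new")
puts db_filename("ser vices/new")
```

Because the hash is computed over the original, case-preserving name, the two calls above end in different hashes, so the files don't clash on a case-insensitive filesystem even though their escaped prefixes compare equal there.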

Dealing with view filenames can be done similarly.

However, I think a better solution is something like this:

  N2JWdnNzkyqvutQ1OZeKUw==.couchdb/
    name     # a UTF-8 file containing the name of the database
    data     # what was previously in the .couchdb file
    temp     # what was in the .*_temp file
    lhxj+E81IP9xm+0ssUSsQ==.viewgroup/
      name     # a UTF-8 file containing the name of the view
      data     # what was previously in the .view file
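
A minimal Ruby sketch of that layout follows, showing that both a database directory and a view-group directory can be located from names alone. The helper names (`hashed`, `db_dir`, `view_group_dir`, `create_db`) are hypothetical, and the sketch assumes directory names are the Base64 of the MD5 of the UTF-8 name, with `/` swapped for `-` so the result is a legal filename.

```ruby
require 'digest/md5'
require 'fileutils'

# Hypothetical: Base64 of MD5 over the UTF-8 name, '/' swapped for '-'
# (assumption) so the result is usable as a directory name.
def hashed(name)
  [Digest::MD5.digest(name.encode('UTF-8'))].pack('m0').tr('/', '-')
end

# <root>/<hash>.couchdb/ holds everything belonging to one database.
def db_dir(root, db_name)
  File.join(root, hashed(db_name) + '.couchdb')
end

# <root>/<hash>.couchdb/<hash>.viewgroup/ holds one view group.
def view_group_dir(root, db_name, group_name)
  File.join(db_dir(root, db_name), hashed(group_name) + '.viewgroup')
end

# Create the directories and their UTF-8 'name' files. The 'data' and
# 'temp' files would be written by the storage code and are not touched.
def create_db(root, db_name, group_names = [])
  dir = db_dir(root, db_name)
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, 'name'), db_name.encode('UTF-8'))
  group_names.each do |group|
    gdir = view_group_dir(root, db_name, group)
    FileUtils.mkdir_p(gdir)
    File.write(File.join(gdir, 'name'), group.encode('UTF-8'))
  end
  dir
end
```

Only the `name` files are created here; lookups never need to read them, since the path is computed directly from the name.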

I suggest using the MD5 because it can be computed from the names. Alternatively, the directory names could be simple integers, which IMO would be a slightly better solution but a more pervasive change, because most of the functions currently take names. Using integers would also avoid even the vanishingly small chance of an MD5 collision.
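
To put "vanishingly small" in numbers: treating MD5 as a random 128-bit function, the union bound gives a collision probability of at most n(n-1)/2^129 for n distinct names, which for a million databases is about 1.5 x 10^-27. The hypothetical helper below just evaluates that bound; it assumes names aren't chosen adversarially, since MD5 collisions can be constructed deliberately.

```ruby
# Union (birthday) bound on the chance that any two of n distinct names
# share a hash, treating a `bits`-bit hash (MD5: 128) as uniformly random:
#   P(collision) <= C(n, 2) / 2**bits = n(n-1) / 2**(bits + 1)
def collision_probability(n, bits = 128)
  n * (n - 1) / 2.0 / 2.0**bits
end

puts collision_probability(1_000_000)   # roughly 1.5e-27
```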

I know the database name is in the data file, but all of the code requires the name before reading the file, and changing that would be a major patch. Furthermore, keeping the name in a plain file ensures that sysadmin tasks remain easy (and scriptable). I think this is a better system than the current one because it uses filesystem containment rather than filename composition: a database, for example, is entirely contained in one directory.
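
As an example of the scriptability argument, a report of every database and its on-disk size needs only the `name` files and a directory walk, because each database lives entirely under its own directory. This is a hypothetical sketch against the proposed layout (`database_report` and the root path are illustrative assumptions), not existing CouchDB tooling.

```ruby
# Hypothetical sysadmin helper for the proposed layout: for each
# <hash>.couchdb directory under root, read the database name from its
# 'name' file and total the sizes of every file beneath it (data, temp,
# view groups), since the directory fully contains the database.
def database_report(root)
  Dir.glob('*.couchdb', base: root).sort.map do |rel|
    dir   = File.join(root, rel)
    name  = File.read(File.join(dir, 'name'), encoding: 'UTF-8')
    files = Dir.glob('**/*', base: dir).map { |f| File.join(dir, f) }
    bytes = files.select { |f| File.file?(f) }.sum { |f| File.size(f) }
    [name, bytes]
  end
end

# Usage (path is illustrative):
#   database_report('/var/lib/couchdb').each { |n, b| puts "#{n}\t#{b}" }
```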

Apart from the 'name' files, this is a largely mechanical change.

Opinions?

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The intuitive mind is a sacred gift and the rational mind is a faithful servant. We have created a society that honours the servant and has forgotten the gift.
  -- Albert Einstein

