Re: slash escaping (was 0.9.0 Release)

Antony Blakey Fri, 12 Dec 2008 20:45:12 -0800


On 13/12/2008, at 7:40 AM, Damien Katz wrote:

The decision to limit db and design doc names is a pragmatic one, it simplifies things greatly. CouchDB is full of things that could be better. Patches welcome.


OK, this code now works for me in a client:

  require 'rubygems'
  require 'json'
  require 'couchrest'
  require 'cgi'

  # Two names that differ only in case, to exercise case-insensitive filesystems
  db_name = CGI.escape("Ser vices/new - ∞शछغش갥걸ペボ")
  db = CouchRest.database!("http://localhost:5984/" + db_name)

  db_name = CGI.escape("ser vices/new - ∞शछغش갥걸ペボ")
  db = CouchRest.database!("http://localhost:5984/" + db_name)

And this URL in Safari:

  http://127.0.0.1:5984/Ser+vices%2Fnew+-+∞शछغش갥걸ペボ

returns this:

{"db_name":"Ser vices/new - \u221e\u0936\u091b\u063a\u0634\uac25\uac78\u30da\u30dc","doc_count":1,"doc_del_count":0,"update_seq":1,"purge_seq":0,"compact_running":false,"disk_size":14365}


The filesystem looks like this:

Ser+vices%2Fnew+-+%E2%88%9E%E0%A4%B6%E0%A4%9B%D8%BA%D8%B4%EA%B0%A5%EA%B1%B8%E3%83%9A%E3%83%9C-lhxj+E81IP9xm+0ssUSsQ==.couch
ser+vices%2Fnew+-+%E2%88%9E%E0%A4%B6%E0%A4%9B%D8%BA%D8%B4%EA%B0%A5%EA%B1%B8%E3%83%9A%E3%83%9CN2JWdnNzkyqvutQ1OZeKUw==.couch

The Base64 (filename variant) of the MD5 is appended to deal with case sensitivity. I haven't investigated using platform-attribute-specific code, which would allow filenames to include Unicode characters if the OS supports that and therefore be much shorter, presuming that files aren't intended to be portable between systems. Note that filenames for ASCII names don't look nearly as ugly, not that I consider that to be a problem.
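
For anyone who wants to play with the naming scheme outside of CouchDB, here is a minimal Ruby sketch of it. It is not the actual patch: the helper name couch_filename is made up, it assumes the MD5 is taken over the raw UTF-8 name rather than the escaped one, and it assumes "filename variant" means standard Base64 with "/" replaced by "-". None of those details is spelled out above, so the suffixes it produces may not match the ones shown.

```ruby
require 'cgi'
require 'digest/md5'
require 'base64'

# Hypothetical helper: build an on-disk filename from a database name.
# Assumption: the MD5 is computed over the raw (unescaped) UTF-8 name.
def couch_filename(db_name)
  escaped = CGI.escape(db_name)              # same escaping as used in the URL
  digest  = Digest::MD5.digest(db_name)      # 16 raw bytes
  # Assumption: "filename variant" = standard Base64 with "/" swapped for "-"
  suffix  = Base64.strict_encode64(digest).tr('/', '-')
  "#{escaped}#{suffix}.couch"
end

puts couch_filename("Ser vices/new")
puts couch_filename("ser vices/new")
```

The escaped prefixes of the two names are identical apart from case, so on a case-insensitive filesystem it is the appended hash that keeps the two files apart.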


Dealing with view filenames can be done similarly.

However, I think a better solution is something like this:

  N2JWdnNzkyqvutQ1OZeKUw==.couchdb/
    name     # a UTF-8 file containing the name of the database
    data     # what was previously in the .couchdb file
    temp     # what was in the .*_temp file
    lhxj+E81IP9xm+0ssUSsQ==.viewgroup/
      name     # a UTF-8 file containing the name of the view
      data     # what was previously in the .view file

I suggest using the MD5 because it can be computed from the names. Alternatively they could be simple integers, which IMO would be a slightly better solution, but a more pervasive change because most of the functions currently take names. Using integers would avoid even the vanishingly small chance of collision.

I know the database name is in the data file, but all of the code requires the name before reading the file, and changing that would be a major patch. Furthermore, having the name accessible ensures that sysadmin tasks are still easy (and scriptable). I think this is a better system than the current one because filesystem containment is used rather than filename composition, e.g. a database is entirely contained in a directory.
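
To illustrate the "still easy (and scriptable)" point, here is a rough Ruby sketch of the directory layout and of a script that lists database names by reading the 'name' files. The helper names (db_dir_name, create_db_dir, list_db_names) are hypothetical, the 'data' file is just an empty placeholder, and the hash details (MD5 of the raw UTF-8 name, "/" replaced by "-") are the same assumptions as before, not something CouchDB does today.

```ruby
require 'digest/md5'
require 'base64'
require 'fileutils'
require 'tmpdir'

# Hypothetical: directory name derived from the MD5 of the database name.
def db_dir_name(db_name)
  Base64.strict_encode64(Digest::MD5.digest(db_name)).tr('/', '-') + '.couchdb'
end

# Create the proposed layout: a directory holding 'name' and 'data'.
def create_db_dir(root, db_name)
  dir = File.join(root, db_dir_name(db_name))
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, 'name'), db_name)   # the database name as UTF-8 text
  FileUtils.touch(File.join(dir, 'data'))       # placeholder for the real data file
  dir
end

# Sysadmin-style task: recover the database names from the 'name' files.
def list_db_names(root)
  Dir.children(root)
     .select { |entry| entry.end_with?('.couchdb') }
     .map { |entry| File.read(File.join(root, entry, 'name'), encoding: 'UTF-8') }
     .sort
end

Dir.mktmpdir do |root|
  create_db_dir(root, "Ser vices/new")
  create_db_dir(root, "ser vices/new")
  puts list_db_names(root)
end
```

Everything belonging to one database lives under one directory, so backing up or deleting a database is a single directory operation, and the names never have to be decoded from the filenames.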


Apart from the 'name' files, this is a largely mechanistic change.

Opinions?

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The intuitive mind is a sacred gift and the rational mind is a faithful servant. We have created a society that honours the servant and has forgotten the gift.

  -- Albert Einstein
