I have to say I always kind of assumed that most filesystems only allowed Latin-based characters in filenames. I got curious, so I asked the guys in the IRC channel about non-Latin characters in filenames, and someone actually just created a file on ext3 with Japanese characters in its name and everything worked fine.
Someone pasted this link: http://en.wikipedia.org/wiki/Comparison_of_file_systems

Reading the table, it appears that the biggest concern about filenames is including a NUL byte. Perhaps we're overthinking this whole thing? Maybe we can just write filenames with weird characters and let the sysadmins deal with whatever happens when they have a design doc with weird characters?

Paul

On Sun, Dec 14, 2008 at 12:07 AM, Antony Blakey <[email protected]> wrote:
>
> On 14/12/2008, at 2:47 PM, Chris Anderson wrote:
>
>> Perhaps your filename scheme could be appended to a slug (based on the
>> safe-chars) so that sysadmins could still use meaningful file globs to
>> eg batch rsync .couch files and view directories.
>
> The filename encoder can use any scheme, so yes that is trivial. It would
> only be (theoretical) a prefix of the readable chars because of length
> constraints. Note that there is no guarantee that slugs would be unique. I
> considered punycode, but given that it needs to deal with case-insensitive
> FS, slashes, limited length, it was simplest to cut to the chase and just
> use the digest.
>
> Regarding your request however, a better way to determine safe-chars
> according to the underlying filesystem is required IMO to avoid the overt
> roman script-only design. If you think it's essential that *you* can read
> the filenames in a terminal, then surely it's essential that a
> chinese/russian/greek/swedish/thai etc developer has the same facility.
> Otherwise it's not a *design requirement* per se, but rather a preference.
>
> I'm a pure english speaker myself, but I am about to deploy a couch system
> to an asian (government) environment with many millions of users (with, BTW,
> a link to CouchDB on every page). In the future I will have to sell this
> technology and do technology transfer to local developers - and that is made
> very much more difficult with the current vigorously asserted english-only
> design decisions because it's a significant political liability.
>
>> Readability / globbableness is also nice when you're trying to figure
>> out which views use the most space on the filesystem, a common task.
>
> That's why the actual name is in the 'name' file.
>
> Antony Blakey
> -------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> There are two ways of constructing a software design: One way is to make it
> so simple that there are obviously no deficiencies, and the other way is to
> make it so complicated that there are no obvious deficiencies.
>  -- C. A. R. Hoare
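
For what it's worth, the slug-plus-digest idea from the quoted thread could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the function name `encode_db_filename`, the safe-character set, SHA-1 as the digest, and the 32-character slug limit are hypothetical and not what CouchDB or Antony's encoder actually does.

```python
import hashlib
import re

# Hypothetical "safe" set: lowercase ASCII letters, digits, and a few symbols.
# Anything else (including "/", uppercase, and non-Latin text) is replaced.
_UNSAFE = re.compile(r"[^a-z0-9_$()+-]+")


def encode_db_filename(db_name: str, slug_len: int = 32) -> str:
    """Return '<slug>.<digest>.couch', or '<digest>.couch' if no slug survives.

    The digest carries uniqueness; the slug is only a readable, globbable
    prefix and may collide between different names.
    """
    # Hash the exact original name, so "Foo" and "foo" get different digests
    # even on a case-insensitive filesystem (hex digits are all lowercase).
    digest = hashlib.sha1(db_name.encode("utf-8")).hexdigest()

    # Build a lossy, length-limited slug from the safe characters only.
    slug = _UNSAFE.sub("_", db_name.lower()).strip("_")[:slug_len]

    return f"{slug}.{digest}.couch" if slug else f"{digest}.couch"


if __name__ == "__main__":
    for name in ["Foo", "foo", "blog/posts", "日本語"]:
        print(repr(name), "->", encode_db_filename(name))
```

With this shape a sysadmin can still glob on something like `blog_*`, while a name with no safe characters at all simply falls back to the bare digest. That fallback is exactly the case where the readable prefix helps nobody, which is the point Antony raises about the roman-script-only bias.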
