> Date: Wed, 15 Feb 2017 21:20:56 +0100
> From: [email protected]
> Cc: [email protected]
>
> > > Most notably, the whole path might cross several mount points, thus
> > > the whole path can well have fragments coming from several file systems.
> >
> > A possible solution would be to decode each mount point's part as it
> > is being resolved.
>
> ...which can only be based on guesswork: there's no reliable info on
> the encoding used for that file system (if it's consistent at all).

You could maintain a database of encodings per file system, perhaps
user-defined, or derived by some other means.  E.g., for volumes that
physically reside on Windows or macOS the encoding is pretty much
known in advance.
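For concreteness, here is a minimal sketch (in C) of what such a
database could look like.  The table contents, the names, and the
longest-prefix lookup are invented for illustration; nothing like this
exists in Emacs today:

#include <stdio.h>
#include <string.h>

struct fs_encoding
{
  const char *mount_point;   /* where the file system is mounted */
  const char *encoding;      /* encoding of file names on it */
};

/* Hypothetical user-defined table.  E.g., a volume that physically
   resides on Windows is known in advance to use its codepage.  */
static const struct fs_encoding fs_table[] = {
  { "/mnt/windows", "CP1252" },
  { "/mnt/macos",   "UTF-8"  },  /* HFS+/APFS names are UTF-8 */
  { "/",            "UTF-8"  },  /* fallback for everything else */
};

/* Return the encoding for FILE: find the longest mount-point prefix
   that matches.  Each fragment of a path crossing several mount
   points would be decoded with the encoding found here as the path
   is resolved.  */
static const char *
encoding_for_file (const char *file)
{
  const char *best = "UTF-8";
  size_t best_len = 0;
  for (size_t i = 0; i < sizeof fs_table / sizeof fs_table[0]; i++)
    {
      size_t len = strlen (fs_table[i].mount_point);
      if (len > best_len
          && strncmp (file, fs_table[i].mount_point, len) == 0)
        {
          best = fs_table[i].encoding;
          best_len = len;
        }
    }
  return best;
}

int
main (void)
{
  printf ("%s\n", encoding_for_file ("/mnt/windows/docs/report.txt"));
  printf ("%s\n", encoding_for_file ("/home/user/file.txt"));
  return 0;
}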
> > > I think the only sane way to see a Linux file system path is the way
> > > Linux sees it: as a byte string.
> >
> > This would lose a lot in 99% of use cases.  You are, in effect,
> > suggesting a "reverse optimization", whereby the majority of use cases
> > is punished in favor of a small minority, based on theoretical
> > intractability.
>
> I feel queasy doing some voodoo without the application having
> a say in it.  In the Emacs context it's a bit easier, because in
> the "normal" case things are pretty quickly deferred to the user
> (usually).

Not really: there are a lot of internal operations that access files
and directories, and they would wreak major havoc if they didn't
succeed, silently, in the absolute majority of uses.

> > > NT has done that too.
> >
> > Windows can do that because it also transparently translates file
> > names to the locale's encoding when files are accessed with ANSI APIs.
> > Without such translation, this kind of decision is unwise, IMO.
>
> I guess (I don't *know*) Windows stores information about the encoding
> at the file system level (and keeps that consistent).

No.  At the file system level (on NTFS volumes, at least), Windows
file names are always UTF-16 encoded, and Windows just "knows" that.
Windows converts that to the locale's codepage when you access files
via an API that communicates file names encoded in that codepage.
(If the conversion fails, you get question marks instead of the
characters that couldn't be converted.  See the sketch at the end of
this message.)

> Linux doesn't have that; it just keeps out of it.  It doesn't even
> have a place to state the encoding used.

Exactly.  Which is why forcing a single file-name encoding on
Linux/Unix filesystems is IMO a bad idea.
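To illustrate the conversion described above, here is a minimal C
sketch that does by hand what the ANSI file APIs do internally.  The
file name is invented; WideCharToMultiByte is the real Windows
conversion API, and passing NULL for the default character makes
unconvertible characters come out as the codepage's default character,
normally a question mark:

#include <windows.h>
#include <stdio.h>

int
main (void)
{
  /* A file name as stored on the NTFS volume: UTF-16, here with a
     character (GREEK SMALL LETTER ALPHA) that has no representation
     in most Western codepages.  */
  const wchar_t *utf16_name = L"report-\u03B1.txt";

  char ansi_name[MAX_PATH];
  BOOL used_default = FALSE;

  /* Convert to the system's ANSI codepage (CP_ACP), substituting the
     codepage's default character for anything unconvertible.  This
     is the equivalent of what happens when an ANSI API hands the
     file name back to the application.  */
  WideCharToMultiByte (CP_ACP, 0, utf16_name, -1,
                       ansi_name, sizeof ansi_name,
                       NULL, &used_default);

  printf ("ANSI view of the name: %s\n", ansi_name);
  if (used_default)
    printf ("some characters could not be converted\n");
  return 0;
}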
