On Thursday 13 of October 2011 13:15:57 slash wrote:
> I have some files on an external ext2 drive that have whitespace and
> umlauts (ä, ö) in them. trfs took care of the whitespace. But ext2srv
> presents umlauts as a question mark symbol (�) and won't let me access
> the file (error: file does not exist).

i believe -- but i am not sure! -- that linux stores and reads names on ext2/3/4 without any conversion between the filesystem and I/O syscalls like open(). if you have an iso8859-1 or similar single-byte locale on linux, your ext2 contains iso8859-1 encoded filenames.

by contrast, for those filesystems that always store file names in UTF-16 or similar (NTFS, FAT32 with LFN, the Joliet extension of ISO9660 etc.), there's the `iocharset' mount option that converts between on-disk UTF-16 and I/O syscalls like open(). normally you set it to match your locale settings.

but for ext2/3/4, literally anything goes. you'd need to convert the pathnames, either one-time on disk or upon every r/o access (yuck!).

it may be sensible to use only a UTF-8 locale on linux, like LANG=en_US.utf8, but that will not update the names already stored in an ext2/3/4 filesystem automagically. the locale only changes how the stored bytes are interpreted.

again, that's what i believe, but i dunno how to verify that. any ideas?

-- 
dexen deVries

[[[↓][→]]]

http://xkcd.com/732/
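
one possible way to check the above on the linux side, as a rough sketch only: ask for directory entries as raw bytes and see whether the non-ASCII names are valid UTF-8. the helper names (classify_name, latin1_to_utf8, report) are made up for illustration, and the conversion assumes the names really are iso8859-1, which the script cannot prove; it can only tell that a name is not valid UTF-8.

```python
import os


def classify_name(raw: bytes) -> str:
    """Classify a raw filename as 'ascii', 'utf-8' or 'not-utf-8'."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Could be iso8859-1 or any other single-byte encoding.
        return "not-utf-8"


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode a name, ASSUMING it was stored as iso8859-1."""
    return raw.decode("latin-1").encode("utf-8")


def report(directory: bytes = b".") -> None:
    """Print each entry's classification and its raw bytes."""
    # Passing bytes to os.listdir() returns the names as bytes,
    # so no locale-dependent decoding happens on the way.
    for raw in os.listdir(directory):
        print(classify_name(raw), repr(raw))


if __name__ == "__main__":
    report()
```

if a name with an umlaut shows up as 'not-utf-8' with a single byte such as \xe4 in it, that would fit the theory that the bytes were stored unconverted under a single-byte locale. renaming with something like os.rename(raw, latin1_to_utf8(raw)) would then be the one-time on-disk conversion, best tried on a copy first.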
