On Thursday 13 of October 2011 13:15:57 slash wrote:
> I have some files on an external ext2 drive that have whitespace and
> umlauts (ä, ö) in them. trfs took care of the whitespace. But ext2srv
> presents umlauts as a question mark symbol (�) and won't let me access
> the file (error: file does not exist).

i believe -- but i am not sure! -- that linux stores and reads names on
ext2/3/4 as plain bytes, without any conversion between the filesystem and I/O
syscalls like open(). if you have an iso8859-1 or similar single-byte locale on
linux, your ext2 contains iso8859-1 encoded filenames.
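a small python sketch of what that would mean in practice (the filename is made up for illustration, and this only shows the encoding side, not what ext2srv actually does): the same name becomes different bytes under iso8859-1 and UTF-8, and the iso8859-1 bytes are not valid UTF-8, which would explain the question mark symbol.

```python
# hypothetical filename containing an umlaut (U+00E4 is "a" with diaeresis)
name = "K\u00e4se.txt"

# the bytes a program would hand to open() under each locale
latin1_bytes = name.encode("latin-1")  # latin-1 is python's name for iso8859-1
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'K\xe4se.txt'
print(utf8_bytes)    # b'K\xc3\xa4se.txt'

# a reader that assumes UTF-8 cannot decode the lone 0xe4 byte and
# substitutes U+FFFD, the replacement character
shown = latin1_bytes.decode("utf-8", errors="replace")
print(ascii(shown))  # 'K\ufffdse.txt'
```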

by contrast, for those filesystems that always store file names in UTF-16 or
similar (NTFS, FAT32 with LFN, the Joliet extension of ISO9660 etc.), there's
the `iocharset' mount option that converts between on-disk UTF-16 and I/O
syscalls like open(). normally you set it to match your locale settings. but
for ext2/3/4, literally anything goes.

you'd need to convert the pathnames, either one-time on disk or upon every r/o 
access (yuck!).
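a minimal sketch of the one-time conversion, assuming the names on disk really are iso8859-1 and that UTF-8 is what you want. `convert_names` and its parameters are made up for this example; it defaults to a dry run, and you'd want a backup before doing it for real.

```python
import os

def convert_names(root, src="latin-1", dst="utf-8", dry_run=True):
    """Rename entries under root whose names are not valid UTF-8.

    Assumes such names are encoded in `src`. Returns (old, new) path pairs.
    """
    changed = []
    # a bytes path makes os.walk yield raw bytes names, with no decoding;
    # topdown=False visits children before their parent directories, so
    # renaming a directory never invalidates paths still to be visited
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root), topdown=False):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
                continue  # already valid UTF-8 (or plain ASCII), leave it
            except UnicodeDecodeError:
                pass
            new_name = name.decode(src).encode(dst)
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if not dry_run:
                os.rename(old_path, new_path)
            changed.append((old_path, new_path))
    return changed
```

call it as `convert_names("/mnt/ext2disk")` first (the mount point is a placeholder) to see what would be renamed, then again with `dry_run=False`. note the check is a heuristic: a name that happens to be valid UTF-8 is skipped even if it was written under a different encoding.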

it may be sensible to use only a UTF-8 locale on linux, like LANG=en_US.utf8,
but that won't update names already stored in an ext2/3/4 filesystem
automagically. the locale only affects how the stored bytes are interpreted.

again, that's what i believe, but i dunno how to verify that. any ideas?
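one way to check, sketched in python under the assumption that the drive is mounted on a linux box: asking for a directory listing with a bytes path returns the names as raw bytes, so you can see whether they are valid UTF-8 or something else. `classify` and `report` are names made up for this example.

```python
import os

def classify(name):
    """Classify a raw bytes filename as 'ascii', 'utf-8' or 'not utf-8'."""
    if name.isascii():
        return "ascii"
    try:
        name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"

def report(path):
    """Print the raw bytes of every name in path with its classification."""
    # os.listdir returns bytes names when given a bytes path
    for name in sorted(os.listdir(os.fsencode(path))):
        print(repr(name), classify(name))
```

if `report()` on the ext2 mount shows names such as b'K\xe4se.txt' marked "not utf-8", the single-byte-encoding theory holds for that disk.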

-- 
dexen deVries

[[[↓][→]]]

http://xkcd.com/732/
