Re: filename and normalization (was gcc identifiers)

Edmund GRIMLEY EVANS Thu, 05 Dec 2002 02:16:48 -0800

I don't think normalisation helps at all.

An ideal UTF-8 terminal should remember the actual octets that were
printed, so you can accurately copy and paste even random binary data
that is displayed as reverse-field question marks.


The ls program should have an option to display file names in a form
in which they can be used as shell arguments and with difficult octet
sequences replaced by numerical escapes.[*]
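
As a rough illustration of what such an option might print, here is a minimal Python sketch. It assumes a shell that supports ANSI-C style $'...' quoting (bash, ksh and zsh do; the plain Bourne shell does not), and the function name shell_quote is made up for this example. For simplicity it escapes every octet outside printable ASCII, even when the octets form valid UTF-8.

```python
# Octets that need no quoting at all in a shell word.
SAFE = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._/+,-")


def shell_quote(name: bytes) -> str:
    """Render raw file-name octets as one shell word (hypothetical helper).

    Assumes the target shell understands $'...' quoting with \\xHH escapes.
    """
    if name and all(b in SAFE for b in name):
        return name.decode("ascii")
    out = []
    for b in name:
        if b in (0x27, 0x5C):        # single quote and backslash
            out.append("\\" + chr(b))
        elif 0x20 <= b < 0x7F:       # other printable ASCII is kept as-is
            out.append(chr(b))
        else:                        # control and non-ASCII octets
            out.append("\\x%02x" % b)
    return "$'" + "".join(out) + "'"


if __name__ == "__main__":
    for raw in (b"foo", b"a b", b"\x07", b"caf\xc3\xa9"):
        print(shell_quote(raw))
```

Always emitting exactly two hex digits keeps the escape unambiguous when the next character of the name happens to be a hex digit.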

Those two measures together should make it fairly easy to copy and
paste file names. However, if you add normalisation, it will stop
working.
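
To make the problem concrete, here is a small Python sketch of what goes wrong. It assumes a file system that treats names as plain octet strings (as Linux file systems generally do); the helper name nfc_bytes and the sample file name are invented for the example.

```python
import unicodedata


def nfc_bytes(name: bytes) -> bytes:
    """Return the NFC-normalised UTF-8 octets of a UTF-8 file name."""
    return unicodedata.normalize("NFC", name.decode("utf-8")).encode("utf-8")


# 'e' followed by U+0301 COMBINING ACUTE ACCENT, as it might be stored on disk.
on_disk = "e\u0301.txt".encode("utf-8")

# What you would get back if something normalised the name before you pasted it.
pasted = nfc_bytes(on_disk)

print(on_disk)            # the octets of the real file name
print(pasted)             # different octets, so a different file name
print(on_disk == pasted)  # False: the pasted name no longer matches
```

Both names render identically on screen, but to the kernel they are unrelated names, so opening the pasted one would fail or hit a different file.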

It might be useful to have a program that looks for a file path on the
system that is similar to a given file path. This program could use
normalisation internally, but it would be better to use a fuzzy
comparison. For example, "guesspath foo" would return "Foo" if the
only files in the current directory are "Foo" and "Bar", but it would
return "foo" if there is a file called "foo", and I don't know what it
would do if there are files called "foo " and "Foo".
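
A minimal Python sketch of the idea follows. The name guesspath comes from the paragraph above, but everything else is an assumption: the comparison key (NFKC plus case folding plus trimming white space) is only one crude choice of "fuzzy", and the function sidesteps the ambiguous case by returning every candidate and leaving the decision to the caller.

```python
import unicodedata


def fold(name: str) -> str:
    """Crude comparison key: NFKC, case-folded, surrounding spaces dropped."""
    return unicodedata.normalize("NFKC", name).casefold().strip()


def guesspath(wanted: str, candidates: list[str]) -> list[str]:
    """Return the names in candidates that plausibly match wanted.

    An exact match always wins; otherwise all names with the same
    folded key are returned, which may be zero, one or several.
    """
    if wanted in candidates:
        return [wanted]
    key = fold(wanted)
    return [c for c in candidates if fold(c) == key]


if __name__ == "__main__":
    print(guesspath("foo", ["Foo", "Bar"]))
    print(guesspath("foo", ["foo", "Foo"]))
    print(guesspath("foo", ["foo ", "Foo"]))
```

In real use the candidates would come from something like os.listdir("."). Normalisation is used here only inside the comparison; the names returned are the unmodified ones found on disk.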

Edmund

[*] Unfortunately, the Bourne shell doesn't have numerical escapes,
which rather spoils this plan. A file whose name is the single octet
\007 could be displayed as "$(printf "\x07")", while a file whose name
is literally $(printf "\x07") would be displayed as
'$(printf "\x07")', etc.
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
