I don't think normalisation helps at all. An ideal UTF-8 terminal should remember the actual octets that were printed, so you can accurately copy and paste even random binary data that is displayed as reverse-field question marks.
The ls program should have an option to display file names in a form in which they can be used as shell arguments and with difficult octet sequences replaced by numerical escapes.[*] Those two measures together should make it fairly easy to copy and paste file names. However, if you add normalisation, it will stop working.

It might be useful to have a program that looks for a file path on the system that is similar to a given file path. This program could use normalisation internally, but it would be better to use a fuzzy comparison. For example, "guesspath foo" would return "Foo" if the only files in the current directory are "Foo" and "Bar", but it would return "foo" if there is a file called "foo", and I don't know what it would do if there are files called "foo " and "Foo".

Edmund

[*] Unfortunately, the Bourne shell doesn't have numerical escapes, which rather spoils this plan. You could have a file called "\007" displayed as "$(printf "\x07")", while a file called "$(printf \"\\x07\")" is displayed as '$(printf "\x07")', etc.

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
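To make the ls display idea concrete, here is a rough Python sketch of the kind of quoting I have in mind. It is only an illustration: the function name shell_quote_bytes is made up, and it emits the $'...' form with octal escapes, which is an extension found in shells such as bash and ksh, not something the plain Bourne shell understands. It works on raw octets, so no normalisation or decoding is involved.

```python
import re

# Octets that need no quoting at all when used as a shell word.
_SAFE = re.compile(rb"[A-Za-z0-9_@%+=:,./-]+")


def shell_quote_bytes(name):
    """Return a printable ASCII shell word that expands to exactly `name`.

    Hypothetical helper, not part of ls or any library. The $'...'
    syntax used here is a bash/ksh extension (an assumption of this sketch).
    """
    if _SAFE.fullmatch(name):
        return name.decode("ascii")
    out = []
    for b in name:
        ch = chr(b)
        if ch in "'\\":
            out.append("\\" + ch)      # escape quote and backslash
        elif 0x20 <= b < 0x7F:
            out.append(ch)             # printable ASCII is kept as-is
        else:
            out.append("\\%03o" % b)   # everything else: 3-digit octal escape
    return "$'" + "".join(out) + "'"


if __name__ == "__main__":
    for raw in (b"foo", b"\x07", rb'$(printf "\x07")', "é".encode("utf-8")):
        print(shell_quote_bytes(raw))
```

Because every difficult octet becomes its own escape, a file called BEL and a file whose name is literally the text of a command substitution come out as different, unambiguous strings.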

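A minimal sketch of what "guesspath" might do, again in Python and again purely hypothetical (guesspath and fold are invented names). This version uses the simpler normalisation approach internally: an exact match wins outright, otherwise every name with the same folded key is returned. A real fuzzy comparison would replace fold with some similarity measure. Note that normalisation is only used for comparing; the names returned are always the untouched originals.

```python
import os
import unicodedata


def fold(name):
    """Comparison key: NFKC-normalise, case-fold, strip surrounding whitespace."""
    return unicodedata.normalize("NFKC", name).casefold().strip()


def guesspath(wanted, names):
    """Return candidate names for `wanted` from `names`.

    An exact match is returned alone. Otherwise all names whose folded
    key equals that of `wanted` are returned, in their original order,
    so the caller can see when the guess is ambiguous.
    """
    if wanted in names:
        return [wanted]
    key = fold(wanted)
    return [n for n in names if fold(n) == key]


if __name__ == "__main__":
    # Try the guess against the current directory.
    print(guesspath("foo", os.listdir(".")))
```

For the awkward case of "foo " and "Foo", this sketch simply returns both and leaves the choice to the user, which is one possible answer to the question of what the program should do there.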