On Thu, 2011-03-31 at 15:58 +0200, Bill Allombert wrote:
> Dear Developers,
>
> there are a small number of packages that ship files with non-7bit
> characters in filenames.
> $ apt-file search -l -x '[\x80-\xff]'
>
> aspell-ca
> aspell-es
> aspell-is
> canorus
> console-tools
> dvb-apps
> ggz-python-games
> inorwegian
> jpilot
> lletters-media
> otrs2
> wnorwegian
>
> So this raises two issues:
> 1) should non-7bit characters in filenames be allowed
> 2) if yes, should we require the filename to be in a correct UTF-8 encoding?
>
> I raise the question because I was trying to filter out popcon reports that
> include non-7bit characters since it usually implies corruption of data, but
> this might not be the case.
>
> Also, it seems there is a tool out there that generates .deb packages with
> names like
> designkit.702840f10216893fc3494b731e825b33666733d6.1
> and filenames that are all non-7bit (probably in Japanese).

I think we should definitely *not* forbid this, and we should (at the
very least) be working towards supporting the practice.

It may be that we can't properly support this until we can guarantee a
C.UTF-8 locale as a minimum available standard, but that sounds to me
like another justification for such a locale.

I think we should encourage the filename to be in a UTF-8 encoding, and
even if upstream does use 8-bit filenames with a non-UTF-8 encoding I
think that a Debian packager should be encouraged to patch that.

I would also be OK with mandating that filenames should only be in
either UTF-8 or the ASCII subset thereof, and that ISO-8859-* and other
such restricted encodings are not welcome on our filesystems.

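As a rough illustration of what such a rule would mean in practice, a
checker could sort each filename into "plain ASCII", "valid UTF-8" or
"neither". The following is only a sketch in Python; classify_filename
is a hypothetical helper name, not an existing Debian or lintian tool,
and it assumes the filename is available as raw bytes:

```python
def classify_filename(raw: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'invalid' for raw filename bytes.

    Hypothetical helper for illustration only.
    """
    # Pure 7-bit names are always acceptable under either policy.
    if all(byte < 0x80 for byte in raw):
        return "ascii"
    try:
        # Strict decoding rejects ISO-8859-* and other legacy byte sequences
        # that do not happen to form well-formed UTF-8.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "invalid"
    return "utf-8"
```

A name such as "català" encoded as UTF-8 would be accepted, while the
same name encoded as ISO-8859-1 would be reported as invalid. Note that
a check like this cannot prove that a name was *intended* as UTF-8, only
that its bytes are well-formed UTF-8.
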
Regards,
Andrew McMillan.
--
------------------------------------------------------------------------
andrew (AT) morphoss (DOT) com +64(272)DEBIAN
If wishes were horses, then beggars would be thieves.
------------------------------------------------------------------------

