2016-11-30 18:37:05 -0800, Paul Eggert:
> On 11/30/2016 03:30 AM, Stephane Chazelas wrote:
> >That can also be seen as a POSIX conformance bug
>
> Not really, as POSIX does not require support for UTF-8 (except in
> the pax utility, which is not part of coreutils).
[...]

POSIX does not require support for any charset. It specifies only one locale (C/POSIX) and doesn't specify the charset in that locale, other than that it should be a single-byte charset that covers the portable character set. Examples of such charsets are ASCII, iso8859-x or EBCDIC. In practice, that tends to be ASCII (except for some rare EBCDIC-based IBM systems) as tha[...]

But it does support a localisation API and allows systems to support other locales with other charsets. That API does support multi-byte encodings, including stateful ones (though how they are /defined/ is implementation-defined for locking-shift ones, and in practice those are unworkable, so I'd expect they will eventually be removed from the standard).

It doesn't require compliant systems to have locales with multi-byte character sets, but if they have them (if they show up in the output of locale -a), then they have to be supported throughout (as specified, for all the utilities for instance).

Basically, on systems that have locales with multi-byte encodings, UTF-8 or other (most Unix-like ones, including GNU systems like Debian), GNU pr (and many other GNU utilities) is not POSIX compliant.

See http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/basedefs/V1_chap06.html for details.

-- 
Stephane