On 03/21/2018 06:21 PM, Ingo Schwarze wrote:
Hi Karl,

Following your observations, I just rewrote the setlocale(3)
manual page and committed the new version.

Karl Williamson wrote on Mon, Mar 19, 2018 at 11:51:31AM -0600:

But your man page doesn't describe any of this.  It doesn't say that
UTF-8 is a legal locale, for example.

Fixed.

It does say that LC_CTYPE is the
only category that can be other than C or POSIX, but it doesn't say the
only other possible one is UTF-8.

Fixed.

I think it should.  If your replies
to me were slightly repackaged and placed into the man page, that would
help a lot.

I still believe that in my program the setlocale() returning C for
LC_ALL is a bug.

I agree, that is a bug in my code.  I don't have a patch yet,
but I will write one; it cannot be difficult.  I doubt that it
will go in before release, though.  The fact that the bug went
undetected for many months shows that it is not release-critical,
and we are now in a phase where we want to weed out critical bugs
rather than risk introducing new ones.

We are also fast approaching that stage in our release cycle, so I totally understand.


I don't know what would happen if one were to call setlocale(LC_ALL,
"ro_RO.UTF-8");

As expected, it sets the whole locale to "ro_RO.UTF-8"
and returns "ro_RO.UTF-8".
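
A minimal sketch of how a caller might verify this behaviour. The helper name locale_applied_exactly() is hypothetical and not part of any libc: it passes the requested name to setlocale(3) for LC_ALL and compares the returned string with it. Whether a particular name such as "ro_RO.UTF-8" is accepted depends on the system.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if setlocale() accepted `name` for
 * LC_ALL and reported exactly that name back, 0 otherwise.
 * Note that it changes the locale of the calling process.
 */
int locale_applied_exactly(const char *name)
{
    const char *result;

    /* A NULL name would only query the locale, not set it. */
    assert(name != NULL);

    result = setlocale(LC_ALL, name);
    if (result == NULL)
        return 0;               /* the name was rejected */
    return strcmp(result, name) == 0;
}
```

With behaviour as described above, locale_applied_exactly("ro_RO.UTF-8") would be expected to return 1; it returns 0 where setlocale() rejects the name or reports a different string, such as "C".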

BTW, there is some variance actually in real UTF-8 locales, which you
may not have considered.  Unicode, contrary to their claims, is not
completely locale-independent in LC_CTYPE.  Some Turkish locales that
are UTF-8 use alternate casing rules for the dotless and dotted i
characters.

I'm aware of that, but we will not support it.  Making the character
properties language-dependent is excessive complexity.  It is safer
and results in more predictable program behaviour if every character
has a well-defined, constant set of properties.  KISS and the
principle of least surprise are key in this respect.
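
To make the difference concrete, here is a small sketch of the two casing rules for the four i characters. The enum and function names are hypothetical and this is not how any libc implements case mapping; it only tabulates the default Unicode mappings next to the Turkic ones. On a system that keeps character properties constant, only the CASE_DEFAULT branch would ever apply.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical selector for the two sets of casing rules. */
enum case_rule { CASE_DEFAULT, CASE_TURKIC };

#define CAP_I_DOTTED    ((wchar_t)0x0130)  /* LATIN CAPITAL LETTER I WITH DOT ABOVE */
#define SMALL_I_DOTLESS ((wchar_t)0x0131)  /* LATIN SMALL LETTER DOTLESS I */

/* Uppercase mapping for the i characters; other input is returned unchanged. */
wchar_t upper_i(wchar_t c, enum case_rule rule)
{
    assert(rule == CASE_DEFAULT || rule == CASE_TURKIC);
    if (c == L'i')
        return rule == CASE_TURKIC ? CAP_I_DOTTED : L'I';
    if (c == SMALL_I_DOTLESS)
        return L'I';            /* same result under both rules */
    return c;
}

/* Lowercase mapping for the i characters; other input is returned unchanged. */
wchar_t lower_i(wchar_t c, enum case_rule rule)
{
    assert(rule == CASE_DEFAULT || rule == CASE_TURKIC);
    if (c == L'I')
        return rule == CASE_TURKIC ? SMALL_I_DOTLESS : L'i';
    if (c == CAP_I_DOTTED)
        return L'i';            /* simple mapping, same under both rules */
    return c;
}
```

The only inputs whose result depends on the rule are plain ASCII 'i' and 'I', which is exactly why Turkic casing conflicts with keeping ASCII behaviour fixed.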

And some, especially earlier, UTF-8 locales consider various ASCII
characters that are mandated by POSIX to be ispunct() to not be
punctuation.

Not gonna happen on OpenBSD.  Over my dead body.  We won't change
ASCII, and we *will* make sure that Unicode is treated as a strict
superset of ASCII.  What ASCII (or more precisely, the C locale)
defines, Unicode is not free to change.

As I said, it was mostly older locale definitions that did this, and newer ones have changed to not do so. So others have learned that this is the wrong approach.
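
One way to check a locale for the problem discussed above is sketched below. The helper name first_missing_ascii_punct() is hypothetical; it tests the 32 ASCII characters that the POSIX locale classifies as punctuation against ispunct(3) in whatever locale is currently active.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* The 32 ASCII characters that the POSIX locale places in class punct. */
static const char ascii_punct[] = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

/*
 * Hypothetical helper: returns the first ASCII punctuation character
 * that the current locale does not classify as punctuation, or 0 if
 * all of them pass.
 */
int first_missing_ascii_punct(void)
{
    size_t i;

    /* Sanity check on the table itself: 32 characters plus the NUL. */
    assert(sizeof(ascii_punct) - 1 == 32);

    for (i = 0; ascii_punct[i] != '\0'; i++) {
        if (!ispunct((unsigned char)ascii_punct[i]))
            return ascii_punct[i];
    }
    return 0;
}
```

A caller would first select the locale under test with setlocale(LC_CTYPE, ...) and then expect a return value of 0; any other value names an offending character.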

Yours,
   Ingo


Thanks for your attention to this matter.

Karl Williamson
