On 26.11.2013, at 22:53, Arthur O'Dwyer <[email protected]> wrote:
> I propose that it would be a very good idea for the next standard to
> provide format specifiers for char16_t and char32_t. I would nominate
> "%hc" (char16_t) and "%Lc" (char32_t), and the matching "%hs" (array
> of char16_t) and "%Ls" (array of char32_t).
This alone would not solve the problem. The issue with C11 is that Unicode is
only recommended for char16_t and char32_t, not required, so implementors are
free to use another encoding.
So, to really fix this, C1y would need to require Unicode, like C++11 did (no
idea why C++11 got it right and C11 screwed it up after copying char{16,32}_t
over).
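For reference, C11 only exposes the encoding via feature-test macros, so
roughly the only portable way to know whether a given implementation actually
uses Unicode is a check like this:

    /* __STDC_UTF_16__ / __STDC_UTF_32__ are predefined only when char16_t /
     * char32_t values are UTF-16 / UTF-32 code units; otherwise the encoding
     * is implementation-defined. */
    #include <uchar.h>

    #if !defined(__STDC_UTF_16__) || !defined(__STDC_UTF_32__)
    # error "char16_t / char32_t are not guaranteed to be Unicode here"
    #endif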
The idea is that, in the meantime, I do the same thing Apple does: a format
string that is an object needs special handling anyway, so I want to introduce
the new format string type __OFString__, which takes an OFString object as the
format string. I need that regardless of the outcome of this discussion.
Since I need my own format string type anyway, I don't see a reason not to do
the same as Apple: interpret %C and %S differently if the format string is an
OFString. Apple does *exactly* the same; they special-case them to unichar /
const unichar*, I special-case them to of_unichar_t / const of_unichar_t*.
This does not hurt anybody, as it does not modify any existing behaviour, but
instead introduces a new format string type with new behaviour. This is
completely independent of the shortcomings of the standard, and I'd *really*
like to get this in. I need __OFString__ as a format string type anyway, so
while I'm at it, I don't see any problem with doing the same special-casing
Apple does.
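To make the intent concrete, here is a minimal sketch of how that would look
from the caller's side (function and variable names are just illustrative, and
I'm assuming +[OFString stringWithFormat:] gets the __OFString__ format
attribute):

    #import <ObjFW/ObjFW.h>

    /* With an OFString as the format string, %C takes an of_unichar_t and
     * %S a const of_unichar_t *, mirroring Apple's unichar special case. */
    static OFString *
    makeString(void)
    {
        of_unichar_t snowman = 0x2603;                   /* U+2603 SNOWMAN */
        const of_unichar_t arrows[] = { 0x2190, 0x2192, 0 };

        return [OFString stringWithFormat: @"%C %S", snowman, arrows];
    }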
While I do map of_unichar_t to C(++)'s char32_t, that does not mean it is the
same as char32_t: char32_t is not required to be Unicode - of_unichar_t is. So
even if C1y introduces a length modifier for char32_t, it would still not be
the same thing: if the system does not use Unicode for char32_t, printf would
convert from that non-Unicode encoding to whatever multibyte encoding the
current locale uses. So if you put a Unicode character into a char32_t on such
a system, the output would be wrong.
With of_unichar_t, OTOH, I *require* it to be Unicode, so I can always assume
a Unicode code point and convert it to the right multibyte encoding.
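To illustrate why that guarantee matters, here is a minimal sketch (not
ObjFW's actual code, with uint32_t standing in for of_unichar_t) of the kind
of conversion that is always well-defined once the value is known to be a
Unicode code point - here targeting UTF-8, i.e. the case where the locale's
multibyte encoding is UTF-8:

    #include <stddef.h>
    #include <stdint.h>

    /* Encode one Unicode code point as UTF-8. Returns the number of bytes
     * written, or 0 if the value is not a valid code point (surrogate or
     * above U+10FFFF). */
    static size_t
    to_utf8(uint32_t c, char buf[4])
    {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return 0;
        if (c < 0x80) {
            buf[0] = (char)c;
            return 1;
        }
        if (c < 0x800) {
            buf[0] = (char)(0xC0 | (c >> 6));
            buf[1] = (char)(0x80 | (c & 0x3F));
            return 2;
        }
        if (c < 0x10000) {
            buf[0] = (char)(0xE0 | (c >> 12));
            buf[1] = (char)(0x80 | ((c >> 6) & 0x3F));
            buf[2] = (char)(0x80 | (c & 0x3F));
            return 3;
        }
        buf[0] = (char)(0xF0 | (c >> 18));
        buf[1] = (char)(0x80 | ((c >> 12) & 0x3F));
        buf[2] = (char)(0x80 | ((c >> 6) & 0x3F));
        buf[3] = (char)(0x80 | (c & 0x3F));
        return 4;
    }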
So, IMHO, if you really want to fix this in the standard and without any
extensions (which could take years - so please, even if you favour a standard
fix, consider my patch nonetheless), the following would be needed:
* Require char16_t and char32_t to be Unicode (like C++11 does)
** Not needed by me, but needed to do it right: require that an array of
char16_t may contain UTF-16 (i.e. surrogate pairs), so that it is correctly
converted to the required multibyte encoding
* Add length modifiers for char16_t / char16_t array / char32_t / char32_t
array (see the sketch after this list for the detour a missing modifier forces
today)
** The length modifier for a char16_t array should accept UTF-16
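For comparison, this is roughly the detour that printing a single char32_t
requires today, without such a length modifier (a sketch with minimal error
handling); a modifier like the nominated %Lc would collapse all of this into
one conversion specification:

    #include <limits.h>
    #include <locale.h>
    #include <stdio.h>
    #include <string.h>
    #include <uchar.h>

    int
    main(void)
    {
        char32_t c = 0x00E4;    /* U+00E4 - only meaningful as Unicode if
                                   __STDC_UTF_32__ is defined */
        char buf[MB_LEN_MAX];
        mbstate_t state;
        size_t len;

        setlocale(LC_ALL, "");
        memset(&state, 0, sizeof(state));

        /* Convert to the locale's multibyte encoding by hand ... */
        len = c32rtomb(buf, c, &state);
        if (len == (size_t)-1)
            return 1;

        /* ... because printf has no length modifier for char32_t. */
        printf("%.*s\n", (int)len, buf);
        return 0;
    }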
And ideally, C1y should also add char{16,32}_t counterparts for the remaining
wchar_t functions - I never understood why they were omitted.
But, again, all this will take years. So please, let me just do the same thing
for my framework that Apple does for theirs. It has worked well for them for
years, and it works well for me too. It will not hurt anybody, will not
interfere with anything else, and will make me and the users of my framework
happy ;).
Thanks.
--
Jonathan
