Am 26.11.2013 um 22:53 schrieb Arthur O'Dwyer <[email protected]>:

> I propose that it would be a very good idea for the next standard to
> provide format specifiers for char16_t and char32_t. I would nominate
> "%hc" (char16_t) and "%Lc" (char32_t), and the matching "%hs" (array
> of char16_t) and "%Ls" (array of char32_t).

This alone would not solve the problem. C11 only recommends that char16_t and 
char32_t use Unicode, but does not require it, so implementors are free to use 
another encoding.
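
For reference, C11 only signals a Unicode encoding through its optional 
environment macros, so portable code has to check for them:

    #if defined(__STDC_UTF_16__) && defined(__STDC_UTF_32__)
    /* char16_t values are UTF-16, char32_t values are UTF-32 */
    #else
    /* some other, implementation-defined encoding */
    #endif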

So, to really fix this, C1y would need to require Unicode, like C++11 did (no 
idea why C++11 got it right and C11 screwed it up after copying char{16,32}_t 
over).

The idea is that, in the meantime, I do the same thing Apple does: In order to 
have a format string that is an object, it needs special handling anyway. So I 
want to introduce the new format string type __OFString__, which takes an 
OFString object as the format string. I need that anyway, no matter what the 
outcome of this discussion is.

Now that I need my own format string type anyway, I don't see a reason not to 
do the same as Apple: Interpret %C and %S differently if the format string is 
an OFString. Apple does *exactly* the same: they special-case them to unichar / 
const unichar *, I special-case them to of_unichar_t / const of_unichar_t *.
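
To illustrate what the attribute would then check (a minimal sketch, assuming 
ObjFW's OFString and of_unichar_t; the exact method name is not the point):

    #import <ObjFW/ObjFW.h>

    /* With an OFString format string, %C consumes an of_unichar_t and %S a
     * const of_unichar_t *, mirroring Apple's unichar / const unichar *
     * special casing. */
    static void
    example(void)
    {
            of_unichar_t snowman = 0x2603;
            const of_unichar_t word[] = { 'O', 'F', 0x2603, 0 };
            OFString *s;

            s = [OFString stringWithFormat: @"%C %S", snowman, word];
            (void)s;
    }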

This does not hurt anybody, as it does not modify any existing behaviour, but 
instead introduces a new format string type with new behaviour. It is 
completely independent of the shortcomings of the standard, and I'd *really* 
like to get this in. I need __OFString__ as a format string type anyway, so 
while I'm at it, I don't see any problem with doing the same special casing 
Apple does.

While I do map of_unichar_t to C(++)'s char32_t, that does not mean it is the 
same as char32_t. char32_t is not required to be Unicode - of_unichar_t is. So 
even if C1y introduces a length modifier for char32_t, it would still not be 
the same: if the system does not use Unicode for char32_t, printf would convert 
from this non-Unicode encoding to whatever multibyte encoding is used for the 
current locale. So if you put a Unicode character in a char32_t on such a 
system, the conversion will go wrong.
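
A minimal C11 sketch of that failure mode (it just converts U+2603 with 
c32rtomb(), which interprets its argument in whatever encoding the 
implementation uses for char32_t):

    #include <limits.h>
    #include <locale.h>
    #include <stdio.h>
    #include <uchar.h>

    int
    main(void)
    {
            char buf[MB_LEN_MAX];
            mbstate_t state = { 0 };
            size_t len;

            setlocale(LC_ALL, "");

            /* If char32_t is not UTF-32, 0x2603 is not U+2603 SNOWMAN here,
             * and the bytes written to buf are wrong (or the call fails). */
            len = c32rtomb(buf, 0x2603, &state);
            if (len == (size_t)-1)
                    perror("c32rtomb");
            else
                    fwrite(buf, 1, len, stdout);

            return 0;
    }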

With of_unichar_t OTOH, I *require* it to be Unicode. Thus I can always assume 
it is Unicode and convert it to the right multibyte encoding.

So, IMHO, if you really want to fix the standard and do it without any 
extensions (which could take years, so even if you are for a standard fix, 
please consider my patch in the meantime), the following would be needed:

* Require char16_t and char32_t to be Unicode (like C++11 does)
** Not required by me, but required to do it right: Require that an array of 
char16_t may contain UTF-16, so that it can be correctly converted to the 
locale's multibyte encoding
* Add a length modifier for char16_t / char16_t arrays / char32_t / char32_t 
arrays
** The length modifier for char16_t arrays should accept UTF-16 (see the 
sketch after this list)
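
Purely as an illustration, using the "%hs" specifier nominated in the quoted 
proposal (which does not exist in any C standard), such a conversion could 
then look like this:

    #include <locale.h>
    #include <stdio.h>
    #include <uchar.h>

    int
    main(void)
    {
            /* The string is UTF-16 and contains a surrogate pair (U+1F600),
             * so the hypothetical "%hs" has to treat it as UTF-16 rather than
             * as isolated code units when converting to the locale's
             * multibyte encoding. */
            char16_t msg[] = u"snowman \u2603, emoji \U0001F600";

            setlocale(LC_ALL, "");
            printf("%hs\n", msg);   /* hypothetical specifier */
            return 0;
    }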

And ideally, it should also add char{16,32}_t counterparts for the other 
wchar_t functions - I never understood why those were omitted.

But, again, all of this will take years. So please, let me just do the same 
thing for my framework that Apple does for theirs. This has worked well for 
them for years, and it works well for me too. It will not hurt anybody, will 
not interfere with anything else and will make me and the users of my 
framework happy ;).

Thanks.

--
Jonathan
