A NOTE has been added to this issue.
======================================================================
http://austingroupbugs.net/view.php?id=1122
======================================================================
Reported By:                joerg
Assigned To:
======================================================================
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1122
Category:                   System Interfaces
Type:                       Enhancement Request
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       Jörg Schilling
Organization:
User Reference:
Section:                    3 + 4
Page Number:                1102...and others
Line Number:                somewhere in section 3 and 4
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2017-02-28 16:51 UTC
Last Modified:              2017-03-03 04:42 UTC
======================================================================
Summary:                    POSIX should include gettext() and friends
======================================================================
----------------------------------------------------------------------
(0003588) shware_systems (reporter) - 2017-03-03 04:42
http://austingroupbugs.net/view.php?id=1122#c3588
----------------------------------------------------------------------
Re: 3585

I see that it supports UTF8 text for C programs when the locale specifies UTF8 as the charset. I do not see that it supports u8"" or u"" strings, which are supposed to be UTF8 or wide chars whatever the locale. Without a separate way to identify these in the .po and .mo files, gettext() will treat them on read as being in the .mo file's locale. As a separate issue, I don't see any handling of \u or \U escapes in regular strings either, for conversion to UTF8 or to the locale's charset. The examples assume any UTF or UCS values will be specified as octal escapes instead.

For wide strings in a .po file, if one is generated manually, reading it as chars using the equivalent of getline() or fscanf("%s") will terminate at the first code whose high byte is 0, or whose value is a multiple of 256 (low byte 0). The gettext utilities will not put wide chars in a .mo file anyway (see 10.3), even though the format uses counted strings.

For atomic R2L strings the .po format is fine, where all the chars have R2L or neutral directionality, or T2B or neutral for Asian languages. The problems arise with strings that need the BiDi codes to override a charmap's assumed directionality, say in a text in one language describing another language. Ecma's TR-053 covers the issues involved from the ISO-6429 perspective; Unicode discusses them in relation to the Bidi_Class property of each code point (see UAX9-35, #3.1.2, for examples). EBCDIC has the hooks to support BiDi also, for that matter, but how much it does is still undocumented publicly.
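
To make the two byte-level points in this note concrete, here is a minimal sketch in Python rather than C. It is only an illustration under stated assumptions: the helper name c_string_view is hypothetical, wide chars are assumed to be stored as 4-byte little-endian UTF-32 (actual wchar_t layouts vary by platform), and nothing here calls or models gettext itself. It shows how a NUL-terminated read sees a UTF8 string versus a wide string, and what the octal-escape spelling of a non-ASCII character looks like.

```python
def c_string_view(raw: bytes) -> bytes:
    """Hypothetical helper: return the bytes a NUL-terminated
    char reader would see before stopping at the first zero byte."""
    end = raw.find(b"\x00")
    return raw if end == -1 else raw[:end]


text = "Grüße"

# UTF8 encoding of non-NUL characters never contains a zero byte,
# so a char-oriented reader sees the whole string.
utf8 = text.encode("utf-8")
seen_utf8 = c_string_view(utf8)

# Assumption: wide chars stored as 4-byte little-endian code units.
# 'G' is 0x47 0x00 0x00 0x00, so the reader stops after one byte.
wide = text.encode("utf-32-le")
seen_wide = c_string_view(wide)

# Octal-escape spelling of the UTF8 bytes of one character,
# the form the note says the examples rely on instead of \u escapes.
octal = "".join(f"\\{b:03o}" for b in "ü".encode("utf-8"))

print(seen_utf8 == utf8)  # whole UTF8 string survives
print(seen_wide)          # only the first byte of the wide string
print(octal)              # octal escapes for U+00FC in UTF8
```

The same truncation would apply to any wide encoding whose code units contain a zero byte, which is why counted strings alone do not help if the reading side treats the data as chars.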
Issue History
Date Modified    Username       Field                    Change
======================================================================
2017-02-28 16:51 joerg          New Issue
2017-02-28 16:51 joerg          Name                      => Jörg Schilling
2017-02-28 16:51 joerg          Section                   => 3 + 4
2017-02-28 16:51 joerg          Page Number               => 1102...and others
2017-02-28 16:51 joerg          Line Number               => somewhere in section 3 and 4
2017-03-01 16:05 steffen        Note Added: 0003575
2017-03-01 16:54 shware_systems Note Added: 0003576
2017-03-01 17:10 joerg          Note Added: 0003577
2017-03-01 17:10 joerg          Note Edited: 0003577
2017-03-01 17:11 steffen        Note Added: 0003578
2017-03-01 17:13 steffen        Note Added: 0003579
2017-03-01 17:23 joerg          Note Added: 0003580
2017-03-01 17:27 joerg          Note Edited: 0003580
2017-03-01 18:09 joerg          Note Edited: 0003580
2017-03-01 22:37 steffen        Note Added: 0003581
2017-03-02 06:39 shware_systems Note Added: 0003582
2017-03-02 09:09 joerg          Note Added: 0003583
2017-03-02 09:40 keld           Note Added: 0003584
2017-03-02 09:56 keld           Note Added: 0003585
2017-03-02 14:41 steffen        Note Added: 0003586
2017-03-02 17:01 keld           Note Added: 0003587
2017-03-03 04:42 shware_systems Note Added: 0003588
======================================================================