Bruno Haible wrote, on 23 Jun 2022:
>
> https://posix.rhansen.org/p/gettext_draft
> Line 1031
>
> "except that universal-character-name escape sequences need not be supported."
>
> Neither GNU msgfmt nor Solaris msgfmt treat universal-character-name
> escape sequences specially. If an msgstr contains e.g. "\\u20AC", the
> resulting string in the .mo file is
> { '\\', 'u', '2', '0', 'A', 'C', '\0' }.
>
> Issue: Leaving it undefined whether \u escape sequences are recognized can
> lead to mutual incompatibility of msgfmt implementations: Implementations
> would differ in their interpretation of the dot-po file.
>
> There is no good reason for leaving it undefined: There is already a
> mechanism for specifying an encoding (charset=... in the header), and the
> UTF-8 encoding is in widespread use for more than 10 years.

In today's teleconference we discussed this and formulated the
following response...

If a C17 source file contains calls to gettext family functions that
pass string literals containing \u sequences, xgettext will write
those string literals to the .po file. It would be a useful future
enhancement to msgfmt if it could support these sequences. We don't
want POSIX to forbid this enhancement, as it is possible it will be
requested by users during the lifetime of the next POSIX revision.

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England