Re: POSIX msgfmt and universal-character-name escape sequences

Geoff Clare via austin-group-l at The Open Group Mon, 27 Jun 2022 09:35:01 -0700

Bruno Haible wrote, on 23 Jun 2022:
>
> https://posix.rhansen.org/p/gettext_draft
> Line 1031
> 
> "except that universal-character-name escape sequences need not be supported."
> 
> Neither GNU msgfmt nor Solaris msgfmt treats universal-character-name
> escape sequences specially. If an msgstr contains e.g. "\\u20AC", the
> resulting string in the .mo file is
> { '\\', 'u', '2', '0', 'A', 'C', '\0' }.
> 
> Issue: Leaving it undefined whether \u escape sequences are recognized can
> lead to mutual incompatibility of msgfmt implementations: Implementations
> would differ in their interpretation of the dot-po file.
> 
> There is no good reason for leaving it undefined: There is already a
> mechanism for specifying an encoding (charset=... in the header), and the
> UTF-8 encoding has been in widespread use for more than 10 years.


In today's teleconference we discussed this and formulated the following
response...

    If a C17 source file contains calls to gettext-family functions
    that pass string literals containing \u sequences, xgettext will
    write those string literals to the .po file. It would be a useful
    future enhancement to msgfmt if it could support these sequences.
    We don't want POSIX to forbid this enhancement, as it is possible
    it will be requested by users during the lifetime of the next
    POSIX revision.

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
