printf: invalid universal character name

Jim Meyering Sun, 11 May 2008 03:03:05 -0700

Hermann Peifer <[EMAIL PROTECTED]> wrote:
> printf \uHHHH is expected to print Unicode chars. This works fine in
> most cases, but some legal code points are reported as errors: values
> in the ASCII range and C1 control chars, and values between
> U+D800..U+DFFF
>
> I would say that this behaviour is rather a bug than a feature.


Thanks for the report, but this is not some arbitrary restriction,
but rather conformance to the standard (C99, ISO/IEC 10646) for
"universal character name" syntax:

  http://www.open-std.org/jtc1/sc22/wg14/www/docs/n717.htm

Here's part of printf.c, with a comment that probably came from
a version of N717:

      /* A universal character name shall not specify a character short
         identifier in the range 00000000 through 00000020, 0000007F through
         0000009F, or 0000D800 through 0000DFFF inclusive. A universal
         character name shall not designate a character in the required
         character set.  */
      if ((uni_value <= 0x9f
           && uni_value != 0x24 && uni_value != 0x40 && uni_value != 0x60)
          || (uni_value >= 0xd800 && uni_value <= 0xdfff))
        error (EXIT_FAILURE, 0, _("invalid universal character name \\%c%0*x"),
               esc_char, (esc_char == 'u' ? 4 : 8), uni_value);
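
To make the effect of that test easier to see, here is a minimal
sketch that isolates it in a predicate. The name ucn_is_allowed is
hypothetical and does not appear in printf.c; only the comparisons
are taken from the excerpt above. Note that the test as written is
stricter than the ranges listed in the comment: it rejects every
value up to 0x9f except 0x24, 0x40 and 0x60.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of coreutils: return true when
   UNI_VALUE would pass the check in the printf.c excerpt.  */
bool
ucn_is_allowed (unsigned int uni_value)
{
  /* Up to 0x9f, only '$' (0x24), '@' (0x40) and '`' (0x60) pass.  */
  if (uni_value <= 0x9f)
    return uni_value == 0x24 || uni_value == 0x40 || uni_value == 0x60;

  /* Values in the range D800 through DFFF are rejected.  */
  if (uni_value >= 0xd800 && uni_value <= 0xdfff)
    return false;

  return true;
}
```

Under this sketch, ucn_is_allowed (0x41) is false (plain 'A' must be
written literally), while ucn_is_allowed (0x20ac) is true.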

> /usr/bin/printf: invalid universal character name \u0000
> /usr/bin/printf: invalid universal character name \u0001
...
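
The shape of those messages follows from the format string in the
excerpt: the "%0*x" conversion takes its field width from an argument,
4 digits for \u and 8 for \U, and pads with zeros. A small sketch,
assuming a hypothetical wrapper format_ucn_error that uses snprintf
in place of error() and omits the _() translation macro:

```c
#include <stdio.h>

/* Hypothetical wrapper, not part of coreutils: write the diagnostic
   text for ESC_CHAR ('u' or 'U') and UNI_VALUE into BUF.  Returns
   what snprintf returns, i.e. the length of the full message.  */
int
format_ucn_error (char *buf, size_t size, char esc_char,
                  unsigned int uni_value)
{
  /* Same format and arguments as in the printf.c excerpt.  */
  return snprintf (buf, size,
                   "invalid universal character name \\%c%0*x",
                   esc_char, (esc_char == 'u' ? 4 : 8), uni_value);
}
```

With esc_char 'u' and uni_value 1, the buffer ends in \u0001; with
esc_char 'U' the same value is padded to eight digits.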

I can understand that you'd find the restriction surprising,
but I wouldn't call it a bug.


_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils