Re: indent mangles UTF-8

Petr Pisar Mon, 27 Mar 2023 01:53:49 -0700

On Fri, Mar 24, 2023 at 11:01:04AM -0700, Adam Wozniak wrote:
> Using "indent" on a C file with structure members with UTF-8 names (as
> allowed under C99 and later).
> 
> indent completely mangles these member names, inserting spaces between UTF8
> bytes.
> 
> -double ə14(double GST, struct φλ φλ) {


C99 leaves Unicode characters in identifiers as an implementation-defined
option:

    An implementation may allow multibyte characters that are not part of the
    basic source character set to appear in identifiers; which characters and
    their correspondence to universal character names is
    implementation-defined.

You have probably confused Unicode characters with universal character names
(sequences like \uNNNN and \UNNNNNNNN):

    Universal character names may be used in identifiers, character constants,
    and string literals to designate characters that are not in the basic
    character set.

Hence a C99-conforming compiler must support:

    double \u025914(double GST, struct \u03c6\u03bb \u03c6\u03bb);

but may support:

    double ə14(double GST, struct φλ φλ);

while the interoperability of the latter (e.g. linking two compilation units
together) is completely unspecified.
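
To make the portable spelling concrete, here is a minimal sketch that uses
only universal character names in identifiers. The member names and the
helper sum_\u03c6\u03bb are hypothetical and exist only for illustration.
It assumes a compiler that accepts universal character names in identifiers,
as C99 requires; whether the same compiler also accepts the raw UTF-8
spelling is a separate, implementation-defined matter.

```c
/* Portable C99 spelling: every non-basic character is written as a
   universal character name.
   \u03c6 is GREEK SMALL LETTER PHI, \u03bb is GREEK SMALL LETTER LAMDA. */
struct \u03c6\u03bb {
    double \u03c6;   /* hypothetical member */
    double \u03bb;   /* hypothetical member */
};

/* Hypothetical helper: adds the two members of the structure. */
double sum_\u03c6\u03bb(struct \u03c6\u03bb p)
{
    return p.\u03c6 + p.\u03bb;
}
```

A tool that only ever sees this form deals with plain ASCII bytes, which is
why it survives byte-oriented processing.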

I am not saying that indent could not support Unicode characters (probably
depending on the locale, because indent needs to understand them to align
columns properly), only that your claim about UTF-8 support in C99 is
misleading.
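
To illustrate why column alignment requires decoding the characters, here is
a rough sketch of counting columns in a UTF-8 string. It is not how indent
works; the function name utf8_columns is hypothetical, and it assumes valid
UTF-8 input in which every code point occupies exactly one column. A real
implementation would more likely rely on locale-aware functions such as
mbrtowc() and wcwidth().

```c
#include <stddef.h>

/* Count the columns of a UTF-8 string under the simplifying assumption
   that each code point is one column wide. Continuation bytes have the
   bit pattern 10xxxxxx and do not start a new character, so they are
   skipped. */
size_t utf8_columns(const char *s)
{
    size_t cols = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            cols++;
    }
    return cols;
}
```

For the two-letter identifier from the report, encoded as the four bytes
CF 86 CE BB, strlen() returns 4 while utf8_columns() returns 2. A tool that
counts bytes instead of characters will therefore misplace everything that
follows such an identifier on the line.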

-- Petr
