    Date:        Mon, 15 Sep 2025 09:51:01 -0400
    From:        Chet Ramey <chet.ra...@case.edu>
    Message-ID:  <584c7249-b6a5-430e-9ce2-46e4a5091...@case.edu>


  | At least for sh -- I didn't look at the other utilities -- the standard is
  | fairly explicit that <blank> characters delimit tokens, and <blank> is
  | locale-specific.

Yes, of course, all of that is true.  The question is which locale is
to apply, and nothing I can see in the standard specifies that.  What
it says in XCU 3/sh implies (to me at least) that LC_CTYPE doesn't apply
to parsing the sh code, which I think is as it should be.

It is tempting to infer that the LC_* variables simply apply to
everything, but that's clearly not the case.  The sh spec says clearly
that LC_CTYPE applies in pattern matching (that's fairly obvious) and
also in deciding what is a letter.  In sh, I believe, that is relevant
only to deciding whether or not a word is a name, and it makes some sense
there: applications can create variable names out of local characters,
which can be useful.  Aside from that, it seems LC_CTYPE is to apply only
to arguments (i.e., strings passed to commands as their args) and to input
files (which would cover what the read utility processes, and more).

If it were intended to mean "parsing the script", it would certainly say so.

But since existing shells clearly do not use it that way, it would be
wrong for the standard to tell applications they can expect that to happen.

kre

