    Date:        Mon, 15 Sep 2025 09:51:01 -0400
    From:        Chet Ramey <chet.ra...@case.edu>
    Message-ID:  <584c7249-b6a5-430e-9ce2-46e4a5091...@case.edu>

  | At least for sh -- I didn't look at the other utilities -- the standard is
  | fairly explicit that <blank> characters delimit tokens, and <blank> is
  | locale-specific.

Yes, of course, all of that is true. The question is which locale
applies, and nothing I can see in the standard specifies that. What it
says in XCU 3/sh implies (to me at least) that LC_CTYPE doesn't apply to
parsing the sh code, which I think is as it should be.

It is kind of tempting to infer that the LC_* variables simply apply to
everything, but that's clearly not the case. The sh spec says clearly
that LC_CTYPE applies in pattern matching (that's kind of obvious), and
that it also applies when deciding what is a letter (which in sh, I
believe, is only relevant to deciding whether a word is a name or not,
and makes some sense, so applications can create variable names out of
local characters, which can be useful). Aside from that, it seems it
applies only to arguments (ie: strings passed to commands as their args)
and input files (which would mean what read processes, and more).

If it was intended to mean "parsing the script" it would certainly say so.
But since existing shells clearly do not use it that way, it would be
wrong for the standard to tell applications they can expect that to happen.

kre
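
For anyone who wants to check what their own sh does, here is a minimal sketch of a probe. The function name probe_blank and the choice of U+2003 EM SPACE are assumptions made for illustration; the character is assumed to be classified as <blank> by the UTF-8 locale being tested, which may not hold everywhere. The probe hands sh a script containing "a", the character, and "b" with no ASCII space between them, and prints how many words the parser produced.

```shell
# Hypothetical helper: report how many words sh's parser sees in
# "a<EM SPACE>b" when sh is started under the locale named in $1.
probe_blank() {
    # U+2003 EM SPACE, written as its three UTF-8 bytes.
    em_space=$(printf '\342\200\203')

    # LC_ALL is set only in the environment of the child sh, so it is in
    # effect before that shell parses the -c script.
    # "set --" assigns the parsed words to the positional parameters and
    # $# reports how many there were.
    LC_ALL=$1 sh -c "set -- a${em_space}b; echo \$#"
}
```

In the C locale those three bytes are ordinary characters, so `probe_blank C` reports 1. Under a UTF-8 locale (for example `probe_blank en_US.UTF-8`, assuming that locale is installed), a result of 1 would mean the parser did not treat the locale's <blank> as a token delimiter, and 2 would mean it did.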