On 28/01/2022 01:48, Christoph Anton Mitterer wrote:
On Thu, 2022-01-27 at 15:18 +0000, Harald van Dijk via austin-group-l
at The Open Group wrote:
The benefit of this is that when the
shell's locale changes, variables still hold their original text (as
opposed to their original bytes).
But doesn't that by itself already violate POSIX?
There is "2.5.3 Shell Variables", which AFAIU says that setting
LANG/LC_* must take effect during the shell runtime.
The way it works in yash, it does take effect at runtime, just not in
the same way it does in other shells.
LC_CTYPE says:
"Determine the interpretation of sequences of bytes of text data as
characters (for example, single-byte as opposed to multi-byte
characters), which characters are defined as letters (character class
alpha) and <blank> characters (character class blank), and the behavior
of character classes within pattern matching. Changing the value of
LC_CTYPE after the shell has started shall not affect the lexical
processing of shell commands in the current shell execution environment
or its subshells. Invoking a shell script or performing exec sh
subjects the new shell to the changes in LC_CTYPE."
=> lexical scanning of the current script stays as it is
=> everything else changes,
including e.g. printf, or things like ${#var}, ${var##}, etc.
Right?
${#var}, ${var##}, etc. are supposed to work at the character level. In
other shells that internally hold values as byte strings, yes, LC_CTYPE
needs to be considered here. What the other shells effectively do is
convert to a wide string (not necessarily holding the full wide string
in memory). yash normally converts the wide strings to multibyte strings
as needed. Here, that would mean converting the wide string to a
multibyte string and immediately back to a wide string again, which can be
optimised by just acting on the wide string directly. That said...
So if the shell would keep holding its original text/characters... this
wouldn't work (or the shell would need to convert every time)?
...for anything that cannot be done by keeping the strings as wide
strings, including calling any non-builtin command, yash does convert the
values according to the current locale every time they are used.
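To make the difference concrete, here is a rough model in Python of the two storage strategies. It is only a sketch, not yash's actual implementation: Python's str stands in for a C wide string, the charset names stand in for whatever encoding the current LC_CTYPE selects, and all function names are made up for illustration.

```python
# Hypothetical model of the two strategies; none of these names exist in
# yash or any other shell.

def length_byte_store(raw: bytes, charset: str) -> int:
    """${#var} in a shell that stores bytes: the bytes are interpreted
    under the charset of the locale that is current at expansion time."""
    return len(raw.decode(charset, errors="replace"))


def length_text_store(text: str) -> int:
    """${#var} in a shell that stores text: no conversion is needed,
    so the current locale does not change the answer."""
    return len(text)


def export_text_store(text: str, charset: str) -> bytes:
    """Passing the value to a non-builtin command: the text is converted
    to bytes under the charset of the locale current at that moment."""
    return text.encode(charset, errors="replace")


# The variable is assigned while a UTF-8 locale is in effect.
raw = "é".encode("utf-8")    # what a byte-storing shell keeps: b'\xc3\xa9'
text = raw.decode("utf-8")   # what a text-storing shell keeps: one character

# The locale then changes to an ISO-8859-1 one.
print(length_byte_store(raw, "utf-8"))         # 1 (before the change)
print(length_byte_store(raw, "iso-8859-1"))    # 2 (same bytes, now two characters)
print(length_text_store(text))                 # 1 (unchanged)
print(export_text_store(text, "iso-8859-1"))   # b'\xe9' (same text, different bytes)
```

In both models the locale change takes effect at runtime. The difference is whether the bytes or the text stay fixed across the change.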
Cheers,
Harald van Dijk