Harald van Dijk wrote, on 27 Jan 2022:
>
> On 27/01/2022 12:44, Geoff Clare via austin-group-l at The Open Group wrote:
> > Christoph Anton Mitterer wrote, on 26 Jan 2022:
> > > 3) Does POSIX define anywhere which values a shell variable is required
> > > to be able to store?
> > > I only found that NUL is excluded, but that alone doesn't mean that
> > > any other byte value is required to work.
> >
> > Kind of circular, but POSIX clearly requires that a variable can be
> > assigned any value obtained from a command substitution that does not
> > include a NUL byte, and specifies utilities that can be used to
> > generate arbitrary byte values, therefore a variable can contain any
> > sequence of bytes that does not include a NUL byte.
>
> Is it really clear that POSIX requires that? The fact that it refers to
> "characters" of the output implies the bytes need to be interpreted as
> characters according to the current locale, which is a process that can
> fail.

The only relevant uses of "character" I can see are part of the phrase
"<newline> character". Since <newline> is required to be a single-byte
character, and the byte that encodes it is not allowed to be part of any
other character, changing the text to "<newline> byte" would not make any
difference to the requirements.

> In at least one shell (yash), bytes that do not form a valid character
> are discarded, which makes sense since yash internally stores variables etc.
> as wide strings. The benefit of this is that when the shell's locale changes,
> variables still hold their original text (as opposed to their original
> bytes). However, there is no (standard) way to convert invalid multibyte
> characters to invalid wide characters in such a way that they can be
> back-converted to the original invalid multibyte characters. Should I take
> your comment as saying it is fundamentally invalid for shells to internally
> store text as wide strings? If so, it would be useful if POSIX were to
> explicitly require this. yash is written based on the POSIX specification;
> the fact that they implemented it like this is a clear indication that such
> a requirement was not clear at all to them.

Perhaps they saw that environment variables have to contain characters
and assumed the same applied to shell variables.

-- 
Geoff Clare <[email protected]>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
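
The behaviour discussed in this message can be probed with a few lines of POSIX sh. The following is only a minimal sketch, not something taken from the thread: `roundtrip_hex` is a hypothetical helper name, and the choice of byte 0xFF (which never occurs in well-formed UTF-8) is an assumption made for illustration. It generates bytes with printf, captures them through command substitution, and dumps what the variable actually holds.

```shell
#!/bin/sh
# roundtrip_hex (hypothetical helper): expand the printf format in $1,
# capture the output with command substitution, and dump the bytes the
# variable actually holds as hex.
roundtrip_hex() {
    value=$(printf "$1")    # $1 is used as a printf format; keep it free of '%'
    printf '%s' "$value" | od -An -tx1 | tr -d '[:space:]'
}

# 0xFF (octal 377) can never appear in well-formed UTF-8.
roundtrip_hex 'a\377b'; echo

# Trailing <newline> bytes are removed by command substitution.
roundtrip_hex 'x\n\n'; echo
```

A shell that stores raw bytes prints 61ff62 for the first call. A shell that discards bytes that do not form a valid character, as described above for yash, would be expected to print 6162 when run in a UTF-8 locale. The second call prints 78 either way, since only the trailing <newline> bytes are stripped.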
