On 27/01/2022 17:43, Geoff Clare via austin-group-l at The Open Group wrote:
> Harald van Dijk wrote, on 27 Jan 2022:
>> On 27/01/2022 12:44, Geoff Clare via austin-group-l at The Open Group wrote:
>>> Christoph Anton Mitterer wrote, on 26 Jan 2022:
>>>> 3) Does POSIX define anywhere which values a shell variable is required
>>>> to be able to store?
>>>> I only found that NUL is excluded, but that alone doesn't mean that
>>>> any other byte value is required to work.
>>>
>>> Kind of circular, but POSIX clearly requires that a variable can be
>>> assigned any value obtained from a command substitution that does not
>>> include a NUL byte, and specifies utilities that can be used to
>>> generate arbitrary byte values, therefore a variable can contain any
>>> sequence of bytes that does not include a NUL byte.
>>
>> Is it really clear that POSIX requires that? The fact that it refers to
>> "characters" of the output implies the bytes need to be interpreted as
>> characters according to the current locale, which is a process that can
>> fail.
>
> The only relevant uses of "character" I can see are part of the
> phrase "<newline> character". Since <newline> is required to be
> a single-byte character, and the byte that encodes it is not allowed
> to be part of any other character, changing the text to "<newline> byte"
> would not make any difference to the requirements.

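The byte-level reading quoted above can be sketched in Python. This is a minimal model under stated assumptions, not POSIX text or any shell's code: `printf_octal_format` and `strip_trailing_newlines` are hypothetical names, command substitution is modelled purely as "remove every trailing 0x0A byte", and NUL handling and locale conversion are deliberately left out.

```python
# Hypothetical byte-level model of the argument quoted above; it is not taken
# from POSIX or from any shell's source code.

def printf_octal_format(data: bytes) -> str:
    """Build a printf(1) format string that emits `data` byte for byte,
    using the three-digit octal escape form \\ddd."""
    if b"\0" in data:
        # NUL is the one byte the thread agrees a variable cannot hold.
        raise ValueError("NUL bytes cannot be stored in a shell variable")
    return "".join("\\" + format(b, "03o") for b in data)


def strip_trailing_newlines(output: bytes) -> bytes:
    """Model command substitution's removal of trailing newlines by
    stripping every trailing 0x0A byte, without decoding characters."""
    return output.rstrip(b"\n")


# Every non-NUL byte value, in order; its last byte is 0xFF, not 0x0A.
payload = bytes(range(1, 256))

print(printf_octal_format(b"\x80\n"))                       # \200\012
print(strip_trailing_newlines(payload + b"\n") == payload)  # True
```

Always emitting three octal digits means a following escape can never be misread as extra digits of the previous one. Under this model any non-NUL payload survives the round trip, which is the conclusion Geoff draws. The sketch does not settle whether POSIX actually permits this byte-only view; that is the point in dispute.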
I have to disagree. The use of "<newline> character" to me clearly means
that the output of the command is processed as a sequence of characters,
as opposed to a sequence of bytes. Implementations may (and likely will)
implement this by treating it as a sequence of bytes when they can prove
that this is equivalent, but that is an optimisation, not what POSIX
specifies. In a UTF-8 locale, if a command outputs the bytes 0x80 and
0x0A, does it end in a <newline> character? I say it is the same as
asking if it ends in the M_PI constant: we are not dealing with a
sequence of floating-point values, so the question does not make sense
and has no answer.
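The distinction can be illustrated with a small Python sketch. Python's strict UTF-8 decoder stands in for the conversion a UTF-8 locale would perform, which is an assumption. The helper names are hypothetical, and returning `None` for "no answer" is a choice made only for this illustration.

```python
# Hypothetical sketch: Python's strict UTF-8 codec stands in for the
# multibyte conversion a UTF-8 locale would perform.
from typing import Optional


def ends_in_newline_byte(output: bytes) -> bool:
    """Byte-level question: is the final byte 0x0A?"""
    return output.endswith(b"\n")


def ends_in_newline_char(output: bytes, encoding: str = "utf-8") -> Optional[bool]:
    """Character-level question. Returns None when the bytes do not form a
    valid character sequence, i.e. when the question has no answer."""
    try:
        text = output.decode(encoding)  # strict error handling by default
    except UnicodeDecodeError:
        return None
    return text.endswith("\n")


sample = b"\x80\n"  # stray UTF-8 continuation byte followed by <newline>
print(ends_in_newline_byte(sample))  # True
print(ends_in_newline_char(sample))  # None
```

The byte view answers "yes" without difficulty. The character view fails before the question can be asked, because 0x80 cannot begin a UTF-8 character. How a real shell treats such input (as an opaque byte, as an error, or some other way) is not modelled here.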
Cheers,
Harald van Dijk