Re: how do to cmd subst with trailing newlines portable (was: does POSIX mandate whether the output…)

Christoph Anton Mitterer via austin-group-l at The Open Group Mon, 07 Feb 2022 21:54:36 -0800

Hey.

I'm afraid some more questions came up on my side:



1) POSIX says:
"The encoded values associated with <period>, <slash>, <newline>, and
<carriage-return> shall be invariant across all locales supported by
the implementation."

If, for example, <period> is encoded as the byte 0x2E, the
consequence would be that it has to be 0x2E in all locales and their
encodings, right?

Doesn't that also mean that POSIX effectively forbids UTF-16 and UTF-32,
and in fact any fixed-width encoding wider than one byte?
Because there it would have to be "padded" with 0x00.
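
To make the "padding" concrete, here is a minimal sketch that converts a
single '.' to UTF-16LE and UTF-32LE and dumps the resulting bytes. It
assumes an iconv utility that accepts the codeset names ASCII, UTF-16LE
and UTF-32LE (codeset names are implementation-defined, so these are an
assumption); hex_bytes is a helper name made up for this example.

```shell
# Dump stdin as lowercase hex byte values without separators.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# '\056' is the octal escape for byte 0x2E, i.e. '.' in ASCII.
# The codeset names below are assumptions about the local iconv.
utf16=$(printf '\056' | iconv -f ASCII -t UTF-16LE | hex_bytes)
utf32=$(printf '\056' | iconv -f ASCII -t UTF-32LE | hex_bytes)

printf 'UTF-16LE: %s\nUTF-32LE: %s\n' "$utf16" "$utf32"
```

In both results the 0x2E byte is followed by 0x00 bytes, which is the
padding meant above.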




2) When I have a shell script in some encoding, and it contains e.g.:
  printf '.'
would POSIX demand that this
a) always causes the byte 0x2E to be printed,
b) prints the character '.' according to the currently set locale, e.g.
   if that was using UTF-16, it would print the bytes 0x2E 0x00,
c) prints the character '.' according to the locale in which the shell
   parses the script (but there again, if it was UTF-16, the bytes
   0x2E 0x00), or
d) in some unusual encodings like IBM905, causes the byte 0x4B to
   be printed?
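
One way to check empirically which bytes a given shell and printf emit
is to dump the output with od. This is only a sketch: hex_bytes is a
helper name made up for this example, and the result for the literal '.'
naturally depends on how the script file itself is encoded.

```shell
# Dump stdin as lowercase hex byte values without separators.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# Literal '.': the byte(s) written depend on how the '.' in this
# script is encoded and how the shell interprets it.
literal=$(printf '.' | hex_bytes)

# Octal escape: printf writes the byte with value 056 (0x2E),
# whatever character that byte stands for in the current locale.
octal=$(printf '\056' | hex_bytes)

printf 'literal: %s\noctal:   %s\n' "$literal" "$octal"
```

Running this under different LC_ALL settings would show whether the
literal form ever differs from the octal form on a given system.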




3) With respect to the command substitution with trailing newlines
question:

Because of (2), would it be in any way safer to use e.g.
  printf '\056'
(octal for '.' in ASCII and compatible encodings)
and also strip that off, rather than using '.'?

Especially with respect to a hypothetical UTF-16/UTF-32 locale?
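
For reference, the two variants being compared could be sketched as
follows. This is just an illustration of the sentinel idea, not a
claim about which one POSIX guarantees; emit is a made-up stand-in for
the real command.

```shell
# Stand-in command whose output ends in newlines that $(...) would
# normally strip.
emit() {
    printf 'text\n\n'
}

# Variant A: append a literal '.' as sentinel, then strip it again.
a=$(emit; printf '.')
a=${a%.}

# Variant B: emit the sentinel by its octal value (byte 0x2E).
# The pattern in ${b%.} is still a literal '.', so this only matches
# where the shell reads '.' as the byte 0x2E.
b=$(emit; printf '\056')
b=${b%.}

# Show that the trailing newlines survived.
printf '%s' "$a" | od -An -c
```

Note that variant B still depends on a literal '.' in the stripping
step, so it does not fully avoid the question from (2).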




4) Doesn't strictly belong here, but maybe someone knows:
On my Debian (=> glibc) I was trying this:
/usr/share/i18n/charmaps$ zgrep "[xX]2[eEfF]" * | grep -Ev '[[:space:]](SOLIDUS|FULL STOP)$'

i.e. searching for any entries that are encoded as 0x2E or 0x2F ('.' and '/'),
and filtering out those that really are these characters.

That gave quite some matches:
BRF.gz:<U2828>     /x2e         BRAILLE PATTERN DOTS-46
BRF.gz:<U280C>     /x2f         BRAILLE PATTERN DOTS-34
EBCDIC-AT-DE-A.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-AT-DE-A.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-AT-DE.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-AT-DE.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-CA-FR.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-CA-FR.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-DK-NO-A.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-DK-NO-A.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-DK-NO.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-DK-NO.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-ES-A.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-ES-A.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-ES.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-ES.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-ES-S.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-ES-S.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-FI-SE-A.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-FI-SE-A.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-FI-SE.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-FI-SE.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-FR.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-FR.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-IT.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-IT.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-PT.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-PT.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-UK.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-UK.gz:<U0007>     /x2f         BELL (BEL)
EBCDIC-US.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
EBCDIC-US.gz:<U0007>     /x2f         BELL (BEL)
IBM037.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM037.gz:<U0007>     /x2f         BELL (BEL)
IBM038.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM038.gz:<U0007>     /x2f         BELL (BEL)
IBM1026.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM1026.gz:<U0007>     /x2f         BELL (BEL)
IBM1047.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM1047.gz:<U0007>     /x2f         BELL (BEL)
IBM1132.gz:<U0006>     /x2e         <control>
IBM1132.gz:<U0007>     /x2f         <control>
IBM1160.gz:<U0006>     /x2e         <control>
IBM1160.gz:<U0007>     /x2f         <control>
IBM1164.gz:<U0006>     /x2e         <control>
IBM1164.gz:<U0007>     /x2f         <control>
IBM256.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM256.gz:<U0007>     /x2f         BELL (BEL)
IBM273.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM273.gz:<U0007>     /x2f         BELL (BEL)
IBM274.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM274.gz:<U0007>     /x2f         BELL (BEL)
IBM275.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM275.gz:<U0007>     /x2f         BELL (BEL)
IBM277.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM277.gz:<U0007>     /x2f         BELL (BEL)
IBM278.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM278.gz:<U0007>     /x2f         BELL (BEL)
IBM280.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM280.gz:<U0007>     /x2f         BELL (BEL)
IBM281.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM281.gz:<U0007>     /x2f         BELL (BEL)
IBM284.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM284.gz:<U0007>     /x2f         BELL (BEL)
IBM285.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM285.gz:<U0007>     /x2f         BELL (BEL)
IBM290.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM290.gz:<U0007>     /x2f         BELL (BEL)
IBM297.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM297.gz:<U0007>     /x2f         BELL (BEL)
IBM420.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM420.gz:<U0007>     /x2f         BELL (BEL)
IBM423.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM423.gz:<U0007>     /x2f         BELL (BEL)
IBM424.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM424.gz:<U0007>     /x2f         BELL (BEL)
IBM500.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM500.gz:<U0007>     /x2f         BELL (BEL)
IBM870.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM870.gz:<U0007>     /x2f         BELL (BEL)
IBM871.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM871.gz:<U0007>     /x2f         BELL (BEL)
IBM875.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM875.gz:<U0007>     /x2f         BELL (BEL)
IBM880.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM880.gz:<U0007>     /x2f         BELL (BEL)
IBM905.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM905.gz:<U0007>     /x2f         BELL (BEL)
IBM918.gz:<U0006>     /x2e         ACKNOWLEDGE (ACK)
IBM918.gz:<U0007>     /x2f         BELL (BEL)
INIS-CYRILLIC.gz:<U2192>     /x2e         RIGHTWARDS ARROW
INIS-CYRILLIC.gz:<U222B>     /x2f         INTEGRAL
ISO_10646.gz:<I;>       /x01/x2E        LATIN CAPITAL LETTER I WITH OGONEK
ISO_10646.gz:<i;>       /x01/x2F        LATIN SMALL LETTER I WITH OGONEK
ISO_10646.gz:<JU>       /x04/x2E        CYRILLIC CAPITAL LETTER YU
ISO_10646.gz:<JA>       /x04/x2F        CYRILLIC CAPITAL LETTER YA
ISO_10646.gz:<x+>       /x06/x2E        ARABIC LETTER KHAH
ISO_10646.gz:<d+>       /x06/x2F        ARABIC LETTER DAL
ISO_10646.gz:<I:'>      /x1E/x2E        LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE
ISO_10646.gz:<i:'>      /x1E/x2F        LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE
ISO_10646.gz:<Io>       /x22/x2E        CONTOUR INTEGRAL
ISO_10646.gz:<dlR>      /x25/x2E        BOX DRAWINGS RIGHT HEAVY AND LEFT DOWN LIGHT
ISO_10646.gz:<dH->      /x25/x2F        BOX DRAWINGS DOWN LIGHT AND HORIZONTAL HEAVY
ISO_11548-1.gz:<U282E>     /x2e BRAILLE PATTERN DOTS-2346
ISO_11548-1.gz:<U282F>     /x2f BRAILLE PATTERN DOTS-12346
JIS_C6220-1969-JP.gz:<YO>                   /x2E   <U30E7> KATAKANA LETTER SMALL YO
JIS_C6220-1969-JP.gz:<TU>                   /x2F   <U30C3> KATAKANA LETTER SMALL TU

Since all of these (well, except perhaps ISO_10646) use 0x2E and 0x2F for
characters other than '.' and '/', doesn't that already mean that
they're invalid with respect to POSIX?


Thanks,
Chris.
