On 06/02/2021 23:38, Robert Elz via austin-group-l at The Open Group wrote:
     Date:        Sat, 06 Feb 2021 21:55:19 +0100
     From:        Steffen Nurpmeso <stef...@sdaoden.eu>
     Message-ID:  <20210206205519.43rln%stef...@sdaoden.eu>

   | Fiddling with bytes is something completely different.

But how is the shell supposed to know?

Consider
        U1=$'\u021c'
        U2=$'\u0a47'
        X1=$'\310\234'
        X2=$'\340\251\207'
[...]
Ignoring the bit about converting to other replacement chars, here,
since I'm concerned with valid codepoints only, I don't think the
shell should be converting this kind of thing via iconv() ... utilities
might (including built-ins in sh, like echo or printf) but not the
shell itself.  In the above (assuming I did the conversions correctly)
it should always be the case that $U1 = $X1 and $U2 = $X2, regardless
of any locale settings.  If I cannot assume that when writing a script
then I have no idea how I would ever do anything with non-ascii chars
reliably.
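The byte sequences in the quoted example can be checked without relying on any locale. Below is a minimal POSIX sh sketch. The function name utf8_octal is hypothetical and is not part of any shell. It prints the UTF-8 encoding of a code point as backslash-octal escapes. These are the bytes that $'\u021c' should yield in bash, ksh or zsh when the current locale is UTF-8.

```shell
# utf8_octal HEX (hypothetical helper): print the UTF-8 encoding of code
# point U+HEX as backslash-octal escapes, e.g. 021c -> \310\234.
# Pure shell arithmetic, so the result does not depend on the locale.
# No validation: surrogates and values above 10FFFF are encoded blindly.
utf8_octal() {
    cp=$((0x$1))   # global variable: POSIX sh has no 'local'
    if [ "$cp" -lt 128 ]; then
        set -- "$cp"                                    # 1 byte (ASCII)
    elif [ "$cp" -lt 2048 ]; then
        set -- $((0xC0 | cp >> 6)) $((0x80 | cp & 0x3F))  # 2 bytes
    elif [ "$cp" -lt 65536 ]; then
        set -- $((0xE0 | cp >> 12)) $((0x80 | cp >> 6 & 0x3F)) \
               $((0x80 | cp & 0x3F))                    # 3 bytes
    else
        set -- $((0xF0 | cp >> 18)) $((0x80 | cp >> 12 & 0x3F)) \
               $((0x80 | cp >> 6 & 0x3F)) $((0x80 | cp & 0x3F))  # 4 bytes
    fi
    for b in "$@"; do
        printf '\\%03o' "$b"   # one \ooo escape per byte
    done
    printf '\n'
}
```

For example, utf8_octal 021c prints \310\234 and utf8_octal 0a47 prints \340\251\207, matching $X1 and $X2 above.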

bash, ksh and zsh, all of which support $'\u....', do convert the Unicode code point to the encoding of the current locale. I agree with this behaviour and have implemented the same in my shell. For \u sequences that ask for a code point that is not representable in the current locale, the \u sequence is left unconverted (bash, ksh, my shell) or causes the shell to report an error (zsh).

This is useful for scripts that aim to work in a limited set of locales and know that certain characters are valid in all of them but are not encoded the same way in each. If they want to print a Euro symbol, for instance, they can write

  echo $'\u20AC'

and be assured it works everywhere the Euro symbol is supported.

If they instead write

  echo '€'

with the script saved as UTF-8, it will needlessly break when run in an ISO-8859-15 locale.
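Building on the Euro example, a script can also check whether the conversion actually happened. The sketch below assumes bash or ksh, where, as described above, an unrepresentable \u sequence is left unconverted. Under zsh the shell reports an error instead, so this check does not apply there. The function name euro_symbol and the "EUR" fallback text are illustrative choices, not anything the shells provide.

```shell
#!/usr/bin/env bash
# Sketch assuming bash or ksh: an unconvertible \u escape is left as-is,
# so a result that still begins with "\u" (or "\U") means the current
# locale cannot represent the character.

# euro_symbol (hypothetical helper): print the Euro sign in the current
# locale's encoding, or the ASCII text "EUR" if it could not be converted.
euro_symbol() {
    sym=$'\u20AC'
    case $sym in
        '\u'*|'\U'*) printf '%s\n' 'EUR' ;;   # escape left unconverted
        *)           printf '%s\n' "$sym" ;;  # converted for this locale
    esac
}

echo "Price: 10 $(euro_symbol)"
```

In a UTF-8 locale this prints the Euro sign; in a locale without one it prints "EUR" instead of a literal \u20AC.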

Cheers,
Harald van Dijk
