On 2025-04-28 20:35, Austin Group Issue Tracker via austin-group-l at The Open Group wrote:
[...]
 (0007158) hvd (reporter) - 2025-04-28 19:30
 https://www.austingroupbugs.net/view.php?id=1920#c7158
----------------------------------------------------------------------
That wouldn't be enough to accurately specify what shells do even if limited to UTF-8. Since it's now the explicit intent that variables may contain bytes that do not form valid characters, we have to ask what happens when IFS contains bytes that do not form valid characters.

In UTF-8, é is encoded as 0xC3 0xA9. 0xA9 on its own is not a valid character.
But IFS can be set to 0xA9. If IFS is set to 0xA9
[...]

[replying here as I apparently can no longer post on the issue tracker]

My understanding is that the behaviour is only defined if $IFS contains text, and is undefined if it contains byte sequences that cannot be decoded into text, as in your 0xA9 example.

So a compliant application (sh script) has to make sure $IFS is text encoded in the locale's charmap.

The same applies to the delimiter passed to -d: it must be a single byte (because that's all that current implementations support), and that byte must be the encoding of a character in the user's charmap (so in UTF-8 locales, restricted to bytes 0 to 127). In UTF-8 locales (and, I would think, in all self-synchronizing encodings), that ensures characters are not cut in the middle, but the same does not hold for all multibyte encodings in practice.

I'm fine with that. Trying to split on 0xA9 bytes in a UTF-8 locale makes little sense; you'd want to switch to a locale with a single-byte encoding, typically the C locale, which is the one I'd tend to use to work at byte level.
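
For illustration, here is a minimal sketch of that byte-level approach. The function name count_a9_fields is made up for this example, and it assumes the shell honours an assignment to LC_ALL made at run time (not all shells do):

```shell
# Hypothetical helper: count the fields obtained by splitting $1 on the
# single byte 0xA9. The body runs in a subshell so that the LC_ALL, IFS
# and "set -f" changes do not leak into the caller.
count_a9_fields() (
    LC_ALL=C                # single-byte locale: every byte is a character
    IFS=$(printf '\251')    # 0xA9 written in octal
    set -f                  # no pathname expansion on the unquoted $1 below
    set -- $1               # field splitting on the 0xA9 byte
    echo "$#"
)

# "café au lait" encoded in UTF-8: é is 0xC3 0xA9, so the split falls
# in the middle of that character and leaves a stray 0xC3 behind.
count_a9_fields "$(printf 'caf\303\251 au lait')"    # prints 2
```

The first field ends in a lone 0xC3 byte, which is not text in a UTF-8 locale, so this only makes sense if whatever consumes the fields is also working on bytes.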

--
Stephane
