[1003.1(2013)/Issue7+TC1 0001920]: read -d '' on invalid text without -r and IFS=

Austin Group Issue Tracker via austin-group-l at The Open Group Mon, 28 Apr 2025 12:37:24 -0700

A NOTE has been added to this issue. 
====================================================================== 
https://www.austingroupbugs.net/view.php?id=1920 
====================================================================== 
Reported By:                stephane
Assigned To:                
====================================================================== 
Project:                    1003.1(2013)/Issue7+TC1
Issue ID:                   1920
Category:                   Shell and Utilities
Type:                       Omission
Severity:                   Objection
Priority:                   normal
Status:                     New
Name:                       Stephane Chazelas 
Organization:                
User Reference:              
Section:                    read utility, stdin section 
Page Number:                3321 
Line Number:                112915 
Interp Status:              --- 
Final Accepted Text:         
====================================================================== 
Date Submitted:             2025-04-21 07:16 UTC
Last Modified:              2025-04-28 19:30 UTC
====================================================================== 
Summary:                    read -d '' on invalid text without -r and IFS=
======================================================================


---------------------------------------------------------------------- 
 (0007158) hvd (reporter) - 2025-04-28 19:30
 https://www.austingroupbugs.net/view.php?id=1920#c7158 
---------------------------------------------------------------------- 
That wouldn't be enough to accurately specify what shells do even if limited to
UTF-8. Since it's now the explicit intent that variables may contain bytes that
do not form valid characters, we have to ask what happens when IFS contains
bytes that do not form valid characters.

In UTF-8, é is encoded as 0xC3 0xA9. 0xA9 on its own is not a valid character.
But IFS can be set to 0xA9. If IFS is set to 0xA9, and X is set to 0xC3 0xA9
0xA9 0x40 (é, invalid byte, @), then in most locale-aware shells that I know of
that permit arbitrary bytes in variables (bash, gwsh, bosh, ksh), $X is split
into two fields, the first one being 0xC3 0xA9, the second one being @. Most
shells do not do any pure byte-based splitting. The exceptions are mksh, which
does appear to do exactly that (producing 0xC3, empty, 0x40), and zsh, which
does not split at all in this case.
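
The experiment described above can be reproduced with a short script. This is
only a minimal sketch: the function names dump_fields and split_demo are made
up for illustration, and the result depends on the shell and on the locale in
effect, so no single output is the expected one.

```sh
#!/bin/sh
# Sketch: split 0xC3 0xA9 0xA9 0x40 on IFS=0xA9 and dump each field as hex.
# dump_fields and split_demo are hypothetical helper names.

dump_fields() {
    # $# is the number of fields that word splitting produced.
    printf '%d field(s)\n' "$#"
    for field in "$@"; do
        # od prints nothing for empty input, so mark empty fields explicitly.
        if [ -z "$field" ]; then
            printf ' (empty)\n'
        else
            printf '%s' "$field" | od -An -tx1
        fi
    done
}

split_demo() (
    # The body runs in a subshell so the IFS and set -f changes do not leak.
    X=$(printf '\303\251\251@')   # é, stray 0xA9, @
    IFS=$(printf '\251')          # a lone 0xA9, not a valid UTF-8 character
    set -f                        # no pathname expansion on the unquoted $X
    dump_fields $X
)

split_demo
```

Running the script under each shell of interest in a UTF-8 locale, and again
with LC_ALL=C, makes it easy to compare character-aware splitting with pure
byte-based splitting.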

Clearly the current wording is defective. A long time ago I wrote on the mailing
list in more detail about what shells actually did with variables containing
bytes that do not form valid characters in the context of pattern matching
(subject: "[Issue 8 drafts 0001564]: clariy on what (character/byte) strings
pattern matching notation should work") and asked whether there was any interest
in getting this standardized. There was no interest then. Given the mess that we
have now ended up with, please now actually look at what shells do, and specify
that, rather than coming up with more broken specs that only handle the trivial
cases. 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2025-04-21 07:16 stephane       New Issue                                    
2025-04-21 07:30 stephane       Note Added: 0007139                          
2025-04-21 07:38 stephane       Note Edited: 0007139                         
2025-04-22 14:46 geoffclare     Note Added: 0007140                          
2025-04-22 15:20 hvd            Note Added: 0007141                          
2025-04-22 18:45 chet_ramey     Note Added: 0007142                          
2025-04-23 14:59 dwheeler       Note Added: 0007143                          
2025-04-23 16:47 stephane       Note Added: 0007144                          
2025-04-23 16:57 stephane       Note Added: 0007145                          
2025-04-23 19:53 dwheeler       Note Added: 0007146                          
2025-04-24 10:54 geoffclare     Note Added: 0007147                          
2025-04-24 10:55 geoffclare     Note Edited: 0007147                         
2025-04-24 11:00 geoffclare     Note Added: 0007148                          
2025-04-24 11:23 stephane       Note Added: 0007149                          
2025-04-24 12:18 stephane       Note Added: 0007150                          
2025-04-24 15:43 dwheeler       Note Added: 0007152                          
2025-04-27 08:51 stephane       Note Added: 0007155                          
2025-04-28 18:37 stephane       Note Added: 0007156                          
2025-04-28 19:30 hvd            Note Added: 0007158                          
======================================================================
