A NOTE has been added to this issue. 
====================================================================== 
https://www.austingroupbugs.net/view.php?id=1920 
====================================================================== 
Reported By:                stephane
Assigned To:                
====================================================================== 
Project:                    1003.1(2013)/Issue7+TC1
Issue ID:                   1920
Category:                   Shell and Utilities
Type:                       Omission
Severity:                   Objection
Priority:                   normal
Status:                     New
Name:                       Stephane Chazelas 
Organization:                
User Reference:              
Section:                    read utility, stdin section 
Page Number:                3321 
Line Number:                112915 
Interp Status:              --- 
Final Accepted Text:         
====================================================================== 
Date Submitted:             2025-04-21 07:16 UTC
Last Modified:              2025-04-27 08:51 UTC
====================================================================== 
Summary:                    read -d '' on invalid text without -r and IFS=
====================================================================== 

---------------------------------------------------------------------- 
 (0007155) stephane (reporter) - 2025-04-27 08:51
 https://www.austingroupbugs.net/view.php?id=1920#c7155 
---------------------------------------------------------------------- 
In view of https://www.austingroupbugs.net/view.php?id=1561 (about yash
behaviour now being proscribed) and the changes to word splitting introduced by
the resolutions of https://www.austingroupbugs.net/view.php?id=1560 and
https://www.austingroupbugs.net/view.php?id=1649 (which AFAICT no shell has
implemented yet), I agree that most of the points I raised in this issue are
void.

The only remaining one is the handling of backslash by the read utility in the
absence of the -r option.

We'd need to have either:
1. stdin shall be text (not required to end in newline, no LINE_MAX limit) if -r
is not specified,
2. or, in the same vein as the recent changes to IFS-splitting, change the
wording so that the escaper is not the backslash character but the byte encoding
of the backslash character, whether that byte is found in the encoding of
backslash, of any other character, or of no character at all.

Option 2 would however mean that backslash processing in "read" would be done
differently from anywhere else, and it raises additional questions if IFS
contains characters whose encoding contains that of backslash.
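
To make the difference between the two options concrete, here is a minimal
Python sketch. It is not taken from any shell and only models a simplified
version of what "read" does without -r (a backslash is removed and the next
unit kept literally, backslash-newline is removed entirely). The function names
are hypothetical, and the sample assumes that the bytes 0xa3 0x5c encode a
single character (Greek alpha) in BIG5-HKSCS, as they do in Python's
"big5hkscs" codec.

```python
BACKSLASH = 0x5C
NEWLINE = 0x0A


def unescape_bytes(data):
    """Option 2 (sketch): every 0x5c byte is an escaper, wherever it occurs."""
    out = bytearray()
    it = iter(data)
    for byte in it:
        if byte == BACKSLASH:
            byte = next(it, None)
            # A trailing backslash or a backslash-newline pair is dropped.
            if byte is None or byte == NEWLINE:
                continue
        out.append(byte)
    return bytes(out)


def unescape_chars(data, encoding):
    """Option 1 (sketch): input is text; only backslash *characters* escape."""
    out = []
    it = iter(data.decode(encoding))  # raises if the input is not valid text
    for ch in it:
        if ch == "\\":
            ch = next(it, None)
            if ch is None or ch == "\n":
                continue
        out.append(ch)
    return "".join(out)


# Assumed sample: 0xa3 0x5c (one BIG5-HKSCS character) followed by "b".
sample = b"\xa3\x5cb"
print(unescape_bytes(sample))               # the 0x5c trail byte is stripped
print(unescape_chars(sample, "big5hkscs"))  # the character is left alone
```

With the byte-wise variant the trail byte of the two-byte character is consumed
as an escaper, so the output no longer contains the original character; with
the character-wise variant the input passes through unchanged because it
contains no backslash character.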

More generally, while I welcome the changes to word splitting that make it
possible to handle arbitrary strings of non-null bytes in locales that use
single-byte encodings, UTF-8, or other multi-byte encodings in which no
character's encoding is found inside the encoding of other characters, for
locales that use multi-byte encodings such as BIG5-HKSCS or GB18030 those
changes are really counterproductive and *require* shells to implement a total
mess that is inconsistent with the rest of the system.
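
As an illustration of the property at stake, the following Python sketch lists
the two-byte sequences ending in 0x5c (the encoding of backslash) that a given
codec accepts as characters. The helper name is hypothetical, and the result
reflects Python's own "big5hkscs" and "gb18030" codec tables, which are assumed
here to match the corresponding system locales closely enough for the purpose.

```python
def pairs_with_trail_byte(encoding, trail=0x5C):
    """Map each valid (lead byte, trail) pair to the text it decodes to."""
    found = {}
    for lead in range(0x81, 0xFF):
        pair = bytes([lead, trail])
        try:
            found[pair] = pair.decode(encoding)
        except UnicodeDecodeError:
            continue  # not a valid character in this encoding
    return found


for name in ("big5hkscs", "gb18030", "utf-8"):
    print(name, len(pairs_with_trail_byte(name)))
```

For UTF-8 the mapping is empty, since a byte below 0x80 never appears inside a
multi-byte character; for the other two encodings it is not.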

So it sounds to me like the current issue
(https://www.austingroupbugs.net/view.php?id=1920) should be withdrawn and
another issue raised about the more general problem of locales where the
encoding of a character can contain the encoding of other characters.

And it seems to me that the only sensible resolution to that one would be to
leave out of the scope of POSIX those character encodings, such as BIG5-HKSCS or
GB18030, that have characters whose encoding contains the encoding of other
characters (including characters from the portable character set, backslash
among them in the case of those two). Multi-byte aware shells such as
bash/ksh93/zsh could then carry on doing the more sensible thing they're doing
just now, and would not have to implement those changes from
https://www.austingroupbugs.net/view.php?id=1560 other than making sure that
UTF-8 decoding errors (for those that decode before splitting and doing
backslash processing) don't prevent them from splitting strings (or processing
backslashes) safely.
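
The reason UTF-8 decoding errors need not get in the way can be sketched in
Python as follows. The function names are hypothetical and the delimiter ":"
merely stands in for a single-byte IFS character. The check confirms that no
multi-byte UTF-8 character contains a byte below 0x80, so a byte-wise search
for an ASCII delimiter or for 0x5c cannot land inside a character, whether or
not the surrounding bytes decode.

```python
def utf8_multibyte_chars_avoid_ascii():
    """True if every byte of every multi-byte UTF-8 character is >= 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded in UTF-8
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True


def split_on_delimiter(data, delimiter=b":"):
    """Byte-wise split that never needs to decode the input."""
    return data.split(delimiter)


# The middle field (0xff 0xfe) is not valid UTF-8, yet splitting still works.
fields = split_on_delimiter(b"caf\xc3\xa9:\xff\xfe:tail")
print(fields)
```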

Once those character encodings are out of the picture, it should also be
possible to simplify the standard. 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2025-04-21 07:16 stephane       New Issue                                    
2025-04-21 07:30 stephane       Note Added: 0007139                          
2025-04-21 07:38 stephane       Note Edited: 0007139                         
2025-04-22 14:46 geoffclare     Note Added: 0007140                          
2025-04-22 15:20 hvd            Note Added: 0007141                          
2025-04-22 18:45 chet_ramey     Note Added: 0007142                          
2025-04-23 14:59 dwheeler       Note Added: 0007143                          
2025-04-23 16:47 stephane       Note Added: 0007144                          
2025-04-23 16:57 stephane       Note Added: 0007145                          
2025-04-23 19:53 dwheeler       Note Added: 0007146                          
2025-04-24 10:54 geoffclare     Note Added: 0007147                          
2025-04-24 10:55 geoffclare     Note Edited: 0007147                         
2025-04-24 11:00 geoffclare     Note Added: 0007148                          
2025-04-24 11:23 stephane       Note Added: 0007149                          
2025-04-24 12:18 stephane       Note Added: 0007150                          
2025-04-24 15:43 dwheeler       Note Added: 0007152                          
2025-04-27 08:51 stephane       Note Added: 0007155                          
======================================================================

