A NOTE has been added to this issue.
======================================================================
https://www.austingroupbugs.net/view.php?id=1920
======================================================================
Reported By:                stephane
Assigned To:
======================================================================
Project:                    1003.1(2013)/Issue7+TC1
Issue ID:                   1920
Category:                   Shell and Utilities
Type:                       Omission
Severity:                   Objection
Priority:                   normal
Status:                     New
Name:                       Stephane Chazelas
Organization:
User Reference:
Section:                    read utility, stdin section
Page Number:                3321
Line Number:                112915
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2025-04-21 07:16 UTC
Last Modified:              2025-04-27 08:51 UTC
======================================================================
Summary:                    read -d '' on invalid text without -r and IFS=
======================================================================
----------------------------------------------------------------------
 (0007155) stephane (reporter) - 2025-04-27 08:51
 https://www.austingroupbugs.net/view.php?id=1920#c7155
----------------------------------------------------------------------
In view of https://www.austingroupbugs.net/view.php?id=1561 (about yash
behaviour now being proscribed) and the changes to word splitting introduced
by the resolutions of https://www.austingroupbugs.net/view.php?id=1560 and
https://www.austingroupbugs.net/view.php?id=1649 (which, as far as I can tell,
no shell has implemented yet), I agree that most of the points I raised in
this issue are now moot.

The only remaining one is the handling of backslash by the read utility in
the absence of the -r option. We would need either:

1. stdin shall be text (not required to end in a newline, no LINE_MAX limit)
   if -r is not specified; or

2. in the same vein as the recent changes to IFS splitting, a change of
   wording so that it is not the backslash character that is treated as the
   escape character, but the byte encoding of the backslash character,
   whether that byte is found in the encoding of backslash, in the encoding
   of any other character, or in no character at all.

Option 2 would, however, mean that backslash processing in read is done
differently from everywhere else, and it raises additional questions if IFS
contains characters whose encoding contains that of backslash.

More generally, I welcome the changes to word splitting that make it possible
to handle arbitrary strings of non-null bytes in locales that use single-byte
encodings, UTF-8, or other multi-byte encodings that have no characters whose
encoding is found inside the encoding of other characters. For locales that
use multi-byte encodings such as BIG5-HKSCS or GB18030, however, those changes
are really counterproductive and *require* shells to implement a total mess
that is inconsistent with the rest of the system.
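Option 2 can be illustrated with a small Python sketch that is not taken from any shell or from the standard. The two hypothetical helpers below apply read-style backslash processing once to raw bytes (the option 2 reading) and once to decoded characters. Backslash-newline is removed as a line continuation, and a backslash before anything else is dropped while the following unit is kept. IFS splitting, the -d option and end-of-input corner cases are deliberately left out. In BIG5-HKSCS the character 許 is encoded as 0xB3 0x5C, so the byte-wise variant treats its second byte as an escape even though the input contains no backslash character. In UTF-8 the two variants agree, because 0x5C never occurs inside a multi-byte sequence.

```python
"""Sketch: read-style backslash processing done per byte vs per character.

Assumptions (not from POSIX or any shell): backslash-newline is dropped,
a backslash before any other unit is dropped and that unit is kept, and a
lone trailing backslash is simply dropped. IFS splitting is ignored.
"""

BACKSLASH = 0x5C
NEWLINE = 0x0A


def unescape_bytes(data: bytes) -> bytes:
    """Treat every 0x5C byte as an escape, whatever character it belongs to."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == BACKSLASH:
            # Keep the escaped byte unless it is a newline (line continuation).
            if i + 1 < len(data) and data[i + 1] != NEWLINE:
                out.append(data[i + 1])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def unescape_chars(text: str) -> str:
    """Treat only the backslash *character* as an escape (input already decoded)."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            # Keep the escaped character unless it is a newline.
            if i + 1 < len(text) and text[i + 1] != "\n":
                out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    line = "許a"  # contains no backslash character at all
    big5 = line.encode("big5hkscs")  # 許 is 0xB3 0x5C in BIG5-HKSCS
    print(unescape_chars(line))  # character-wise: unchanged
    print(unescape_bytes(big5))  # byte-wise: the 0x5C byte is taken as an escape
    utf8 = line.encode("utf-8")
    print(unescape_bytes(utf8) == utf8)  # UTF-8 has no embedded 0x5C bytes
```

The byte-wise result is no longer the encoding of the original text, which is the kind of inconsistency with the rest of the system that the note objects to.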
So it seems to me that this issue (https://www.austingroupbugs.net/view.php?id=1920)
should be withdrawn, and another one raised about the more general problem of
locales where the encoding of a character can contain the encoding of other
characters.

It also seems to me that the only sensible resolution of that new issue would
be to leave out of the scope of POSIX those character encodings, such as
BIG5-HKSCS or GB18030, that have characters whose encoding contains the
encoding of other characters (in the case of those two, this includes
characters from the portable character set, backslash among them).
Multi-byte-aware shells such as bash, ksh93 and zsh could then carry on doing
the more sensible thing they currently do, and would not have to implement the
changes from https://www.austingroupbugs.net/view.php?id=1560. They would only
have to make sure that UTF-8 decoding errors (for those shells that decode
before splitting and before backslash processing) don't prevent them from
splitting strings (or processing backslashes) safely.

Once those character encodings are out of the picture, it should also be
possible to simplify the standard.
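The property that separates these encodings from UTF-8 can be checked mechanically. The following Python sketch, with a hypothetical helper name, scans the Basic Multilingual Plane for non-ASCII characters whose encoding in a given codec contains the backslash byte 0x5C. It relies only on codecs shipped with CPython (big5hkscs, gb18030, utf-8) and only looks for backslash. GB18030 four-byte sequences also embed ASCII digit bytes (0x30 to 0x39) in their second and fourth positions, which the `needle` parameter can be used to explore.

```python
"""Sketch: find characters whose encoding embeds the byte encoding of backslash."""


def chars_containing_byte(codec: str, needle: bytes = b"\\") -> list[str]:
    """Return non-ASCII BMP characters whose `codec` encoding contains `needle`."""
    found = []
    for cp in range(0x80, 0x10000):
        if 0xD800 <= cp <= 0xDFFF:  # lone surrogates cannot be encoded
            continue
        ch = chr(cp)
        try:
            encoded = ch.encode(codec)
        except UnicodeEncodeError:  # character not representable in this codec
            continue
        if needle in encoded:
            found.append(ch)
    return found


if __name__ == "__main__":
    for codec in ("big5hkscs", "gb18030", "utf-8"):
        hits = chars_containing_byte(codec)
        print(f"{codec}: {len(hits)} characters, e.g. {''.join(hits[:5])}")
```

An empty result for utf-8 is what makes byte-oriented and character-oriented backslash processing coincide there, whereas the non-empty results for the other two codecs are the cases the proposed scope restriction would exclude.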
Issue History
Date Modified    Username       Field                    Change
======================================================================
2025-04-21 07:16 stephane       New Issue
2025-04-21 07:30 stephane       Note Added: 0007139
2025-04-21 07:38 stephane       Note Edited: 0007139
2025-04-22 14:46 geoffclare     Note Added: 0007140
2025-04-22 15:20 hvd            Note Added: 0007141
2025-04-22 18:45 chet_ramey     Note Added: 0007142
2025-04-23 14:59 dwheeler       Note Added: 0007143
2025-04-23 16:47 stephane       Note Added: 0007144
2025-04-23 16:57 stephane       Note Added: 0007145
2025-04-23 19:53 dwheeler       Note Added: 0007146
2025-04-24 10:54 geoffclare     Note Added: 0007147
2025-04-24 10:55 geoffclare     Note Edited: 0007147
2025-04-24 11:00 geoffclare     Note Added: 0007148
2025-04-24 11:23 stephane       Note Added: 0007149
2025-04-24 12:18 stephane       Note Added: 0007150
2025-04-24 15:43 dwheeler       Note Added: 0007152
2025-04-27 08:51 stephane       Note Added: 0007155
======================================================================