A NOTE has been added to this issue.
======================================================================
https://www.austingroupbugs.net/view.php?id=1924
======================================================================
Reported By:                stephane
Assigned To:
======================================================================
Project:                    1003.1(2024)/Issue8
Issue ID:                   1924
Category:                   Shell and Utilities
Tags:                       tc1-2024
Type:                       Error
Severity:                   Objection
Priority:                   normal
Status:                     Resolved
Name:                       Stephane Chazelas
Organization:
User Reference:
Section:                    Shell word splitting and "read" utility
Page Number:                various
Line Number:                various
Interp Status:              ---
Final Accepted Text:        https://www.austingroupbugs.net/view.php?id=1924#c7183
Resolution:                 Accepted As Marked
Fixed in Version:
======================================================================
Date Submitted:             2025-05-05 19:02 UTC
Last Modified:              2025-05-16 09:39 UTC
======================================================================
Summary:                    New word splitting requirements inappropriate in
locales with non-self-synchronising character encodings
======================================================================
----------------------------------------------------------------------
 (0007188) hvd (reporter) - 2025-05-16 09:39
 https://www.austingroupbugs.net/view.php?id=1924#c7188
----------------------------------------------------------------------
I'm not sure what the standardese would be, but I think it's possible to
make it less unspecified so that it still allows handling file names
containing arbitrary bytes, while restoring the handling of all locales to
what Issue 7 required.

The rule that, as far as I know, all shells that support multibyte
characters try to implement is simple:

When a shell interprets a byte string as a character string, this is done
as if by repeated calls to mbrtowc(), except that if it would encounter
EILSEQ, an unspecified character (other than a null character) is produced
and conversion resumes from the initial conversion state.

Are there any shells that do not try to follow this general principle? If
not, and if there is a way to phrase that in a manner appropriate for
standardization, the changes to require splitting on byte sequences can be
reverted; the intended aim of those changes would then be handled
transparently.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2025-05-05 19:02 stephane       New Issue
2025-05-15 15:14 geoffclare     Note Added: 0007183
2025-05-15 15:16 geoffclare     Status                   New => Resolved
2025-05-15 15:16 geoffclare     Resolution               Open => Accepted As Marked
2025-05-15 15:16 geoffclare     Interp Status             => ---
2025-05-15 15:16 geoffclare     Final Accepted Text       => https://www.austingroupbugs.net/view.php?id=1924#c7183
2025-05-15 15:16 geoffclare     Tag Attached: tc1-2024
2025-05-16 06:25 stephane       Note Added: 0007186
2025-05-16 06:28 stephane       Note Added: 0007187
2025-05-16 09:39 hvd            Note Added: 0007188
======================================================================
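
The decoding rule proposed in note 0007188 could be sketched in C roughly
as follows. This is only an illustration, not text from the issue: the
function name decode_lenient, the REPLACEMENT_WC macro (with '?' standing
in for the "unspecified character"), the choice to skip exactly one byte
after an invalid sequence, and the treatment of a truncated trailing
sequence as if it were invalid are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for the "unspecified character (other than a
   null character)" in the note; any non-null value would do. */
#define REPLACEMENT_WC L'?'

/* Decode n bytes of src in the current locale, writing at most cap wide
   characters to dst. Returns the number of wide characters written. */
size_t decode_lenient(const char *src, size_t n, wchar_t *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);

    mbstate_t st;
    memset(&st, 0, sizeof st);              /* initial conversion state */
    size_t in = 0, out = 0;

    while (in < n && out < cap) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, src + in, n - in, &st);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /* (size_t)-1: invalid sequence (EILSEQ).
               (size_t)-2: input ends mid-character; handling it the
               same way is an assumption of this sketch. */
            dst[out++] = REPLACEMENT_WC;
            in += 1;                        /* assumption: skip one byte */
            memset(&st, 0, sizeof st);      /* resume from initial state */
        } else if (r == 0) {
            /* A null character was decoded; this sketch assumes it
               occupied a single byte. */
            dst[out++] = L'\0';
            in += 1;
        } else {
            dst[out++] = wc;                /* r bytes formed one character */
            in += r;
        }
    }
    return out;
}
```

Under such a rule, field splitting would operate on the resulting
characters, which is how the note expects arbitrary bytes to be handled
without a separate requirement to split on byte sequences.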