A NOTE has been added to this issue.
======================================================================
https://www.austingroupbugs.net/view.php?id=1959
======================================================================
Reported By:                collinfunk
Assigned To:
======================================================================
Project:                    1003.1(2024)/Issue8
Issue ID:                   1959
Category:                   Shell and Utilities
Tags:                       tc1-2024
Type:                       Clarification Requested
Severity:                   Editorial
Priority:                   normal
Status:                     Interpretation Required
Name:
Organization:               GNU
User Reference:
Section:                    XCU dd
Page Number:                2778
Line Number:                91990 - 91996
Interp Status:              Proposed
Final Accepted Text:        https://www.austingroupbugs.net/view.php?id=1959#c7335
Resolution:                 Accepted As Marked
Fixed in Version:
======================================================================
Date Submitted:             2025-11-13 23:13 UTC
Last Modified:              2025-12-16 07:39 UTC
======================================================================
Summary:                    dd conv=lcase and conv=ucase should only translate single byte locales
======================================================================
----------------------------------------------------------------------
 (0007341) stephane (reporter) - 2025-12-16 07:39
 https://www.austingroupbugs.net/view.php?id=1959#c7341
----------------------------------------------------------------------
> However, introducing case conversion means we must read entire
> multibyte characters, even if they extend across a block. Another
> complicating factor is that case conversion may change the length of
> the character in Unicode. Take the following example:
>
> $ python3 -c 'print(len("ß"))'
> 1
> $ python3 -c 'print(len("ß".upper()))'
> 2
>
> If we have an input block containing all ASCII characters and `ß` as
> the last character, using `conv=ucase,sync bs=512` would result in a
> 512-byte output block followed by a second block containing the second
> byte of uppercase `ß` and 511 NUL bytes. This is probably not what
> someone expects when using `dd`.

POSIX case conversion maps one character to one character; it cannot
translate "ß" to "SS" as Unicode full case mapping does (and as
perl/python do).

It can, however, translate between characters whose encodings differ in
size. That includes ASCII characters such as "i", whose uppercase in some
locales is "İ", which is encoded as 2 bytes in UTF-8.
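
The distinction drawn in the note above can be sketched in Python, the language the quoted example already uses. This is only an illustration under stated assumptions, not how any dd implementation works: `simple_upper`, `upper_block`, and the `TURKISH_STYLE_UPPER` table are hypothetical names, and the table is hard-coded rather than read from a real locale. It shows a one-to-one mapping leaving "ß" alone while still growing a 512-byte block by one byte when "i" maps to "İ".

```python
# Hypothetical per-character table imitating a locale in which the
# uppercase of "i" is "İ" (U+0130). Hard-coded for illustration only.
TURKISH_STYLE_UPPER = {"i": "\u0130"}


def simple_upper(ch, overrides=None):
    """Map one character to exactly one character (towupper()-like)."""
    if overrides and ch in overrides:
        return overrides[ch]
    up = ch.upper()
    # str.upper() applies full Unicode case mapping and may return more
    # than one character; a one-to-one mapping leaves the input as-is.
    return up if len(up) == 1 else ch


def upper_block(data, overrides=None):
    """Uppercase a UTF-8 block character by character."""
    text = data.decode("utf-8")
    return "".join(simple_upper(c, overrides) for c in text).encode("utf-8")


# A 512-byte input block made only of ASCII characters, ending in "i".
block = b"a" * 511 + b"i"
converted = upper_block(block, TURKISH_STYLE_UPPER)

print(len(block), len(converted))       # input and output sizes in bytes
print(simple_upper("ß"), "ß".upper())   # one-to-one vs. full case mapping
```

With this table, the converted block is one byte longer than the input even though every input character is ASCII, which is the block-size problem the note describes for multibyte locales.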
Issue History
Date Modified    Username       Field                    Change
======================================================================
2025-11-13 23:13 collinfunk     New Issue
2025-12-11 17:17 geoffclare     Note Added: 0007335
2025-12-11 17:18 geoffclare     Status                   New => Interpretation Required
2025-12-11 17:18 geoffclare     Resolution               Open => Accepted As Marked
2025-12-11 17:18 geoffclare     Name                     Your Name Here =>
2025-12-11 17:18 geoffclare     Interp Status             => Pending
2025-12-11 17:18 geoffclare     Final Accepted Text       => https://www.austingroupbugs.net/view.php?id=1959#c7335
2025-12-11 17:19 geoffclare     Tag Attached: tc1-2024
2025-12-15 06:55 ajosey         Interp Status            Pending => Proposed
2025-12-15 06:55 ajosey         Note Added: 0007338
2025-12-16 07:16 stephane       Note Added: 0007340
2025-12-16 07:39 stephane       Note Added: 0007341
======================================================================
