A NOTE has been added to this issue. 
====================================================================== 
https://www.austingroupbugs.net/view.php?id=1959 
====================================================================== 
Reported By:                collinfunk
Assigned To:                
====================================================================== 
Project:                    1003.1(2024)/Issue8
Issue ID:                   1959
Category:                   Shell and Utilities
Tags:                       tc1-2024
Type:                       Clarification Requested
Severity:                   Editorial
Priority:                   normal
Status:                     Interpretation Required
Name:                        
Organization:               GNU 
User Reference:              
Section:                    XCU dd 
Page Number:                2778 
Line Number:                91990 - 91996 
Interp Status:              Proposed 
Final Accepted Text:       
https://www.austingroupbugs.net/view.php?id=1959#c7335 
Resolution:                 Accepted As Marked
Fixed in Version:           
====================================================================== 
Date Submitted:             2025-11-13 23:13 UTC
Last Modified:              2025-12-16 07:39 UTC
====================================================================== 
Summary:                    dd conv=lcase and conv=ucase should only translate
single byte locales
====================================================================== 

---------------------------------------------------------------------- 
 (0007341) stephane (reporter) - 2025-12-16 07:39
 https://www.austingroupbugs.net/view.php?id=1959#c7341 
---------------------------------------------------------------------- 
> However, introducing case conversion means we must read entire multibyte
> characters, even if they extend across a block. A further complicating factor
> is that case conversion may change the length of the character in Unicode.
> Take the following example:
>
>    $ python3 -c 'print(len("ß"))'
>    1
>    $ python3 -c 'print(len("ß".upper()))'
>    2
> 
> If we have an input block containing all ASCII characters and `ß` as the last
> character, using `conv=ucase,sync bs=512` would result in a 512-byte output
> block followed by a second block containing the second byte of uppercase `ß`
> and 511 NUL bytes. This is probably not what someone expects when using `dd`.

POSIX case conversion maps one character to one character, so it cannot
translate "ß" to "SS" as Unicode does (or as perl/python do). It can, however,
translate between characters whose encodings differ in size, including ASCII
ones such as "i", whose uppercase translation would be "İ" in some locales and
which is encoded in 2 bytes in UTF-8.
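
A rough sketch of that point, under stated assumptions: the TR_UPPER table
below is a hypothetical stand-in for what towupper() might return in a
Turkish-style locale, and pad_blocks() only mimics the padding scenario from
the quoted text. Neither is taken from any dd implementation.

```python
# Hypothetical one-to-one uppercase table (assumption for illustration only);
# a real implementation would consult the current locale's LC_CTYPE.
TR_UPPER = {"i": "\u0130"}  # "i" -> "İ" (U+0130)


def ucase_char_to_char(text, table=TR_UPPER):
    """Map every character to exactly one character, never to a string."""
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
        else:
            up = ch.upper()
            # Python's upper() may expand ("ß" -> "SS"); a character-to-character
            # conversion cannot, so such characters are left unchanged here.
            out.append(up if len(up) == 1 else ch)
    return "".join(out)


def pad_blocks(data, bs=512):
    """Split bytes into bs-sized blocks, NUL-padding the last one.

    A simplification of the padding scenario described in the quoted text.
    """
    return [data[i:i + bs].ljust(bs, b"\0") for i in range(0, len(data), bs)]


block = "a" * 511 + "i"                         # 512 bytes in UTF-8
converted = ucase_char_to_char(block).encode("utf-8")
blocks = pad_blocks(converted)

print(len(block.encode("utf-8")), len(converted), len(blocks))
print(blocks[1][:1], blocks[1][1:] == b"\0" * 511)
print(ucase_char_to_char("ß"), "ß".upper())
```

In this sketch the single trailing "i" is enough to push the converted data
one byte past the block size, so the growth problem does not depend on
one-to-many mappings such as "ß" to "SS".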

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2025-11-13 23:13 collinfunk     New Issue                                    
2025-12-11 17:17 geoffclare     Note Added: 0007335                          
2025-12-11 17:18 geoffclare     Status                   New => Interpretation
Required
2025-12-11 17:18 geoffclare     Resolution               Open => Accepted As
Marked
2025-12-11 17:18 geoffclare     Name                     Your Name Here =>   
2025-12-11 17:18 geoffclare     Interp Status             => Pending         
2025-12-11 17:18 geoffclare     Final Accepted Text       =>
https://www.austingroupbugs.net/view.php?id=1959#c7335    
2025-12-11 17:19 geoffclare     Tag Attached: tc1-2024                       
2025-12-15 06:55 ajosey         Interp Status            Pending => Proposed 
2025-12-15 06:55 ajosey         Note Added: 0007338                          
2025-12-16 07:16 stephane       Note Added: 0007340                          
2025-12-16 07:39 stephane       Note Added: 0007341                          
======================================================================

