Re: [PATCH] doc: dd: document the behavior of conv flags on multibyte characters

Pádraig Brady Sat, 13 Dec 2025 16:27:12 -0800

On 13/12/2025 22:40, Grisha Levit wrote:

On Sat, Dec 13, 2025, 02:16 Collin Funk <[email protected]> wrote:


+@c https://austingroupbugs.net/view.php?id=1959
+POSIX leaves the behavior of @samp{lcase} and @samp{ucase} unspecified
+on multibyte characters.  GNU @command{dd} only converts one byte at a
+time,


I wonder if it may be ambiguous if "converts one byte at a time" means
"reads one byte and converts it" or "reads one byte and converts it to
one byte".  This seems to leave open the possibility that the "i" will
be converted in something like:

     $ LC_ALL=tr_TR.utf8 dd conv=ucase <<< hij
     HiJ

+because multibyte characters may cross block boundaries and case
+conversion may change the length of characters.


But, OTOH, the meaning might be obvious enough from the rationale.


It's a bit of a stretch to think it may output multibyte characters,
but I do see the ambiguity.  Perhaps this is better:

+POSIX leaves the behavior of @samp{lcase} and @samp{ucase} unspecified
+on multibyte characters.  GNU @command{dd} supports only unibyte conversion,
+because multibyte characters may cross block boundaries and case
+conversion may change the length of characters.

This might better allude to the fact that we fully support
conversion in any unibyte locale, as seen in:

  $ printf '%q\n' $(printf 'hi' | LC_ALL=tr_TR dd status=none conv=ucase)
   $'H\335'
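
The byte-at-a-time behavior on multibyte input can also be checked
without a Turkish locale installed. The following is only a sketch,
assuming GNU dd (for status=none) and od are available; the variable
name "out" is purely illustrative. It feeds the UTF-8 encoding of
"é" (bytes \303\251) between two ASCII letters through conv=ucase
in the C locale:

```shell
# Sketch, assuming GNU dd and od.  In the C locale, conv=ucase maps each
# byte independently, so only the ASCII letters h and j are changed and
# the two bytes of the UTF-8 "é" (\303\251) pass through untouched.
out=$(printf 'h\303\251j' | LC_ALL=C dd status=none conv=ucase | od -An -tx1)

# Print the resulting bytes in hex.
echo $out    # 48 c3 a9 4a
```

So the multibyte character is neither converted nor split into
something invalid here; it is simply left alone.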

cheers,
Padraig
