Re: [PATCH] doc: dd: document the behavior of conv flags on multibyte characters

Pádraig Brady Sat, 13 Dec 2025 03:24:45 -0800

On 13/12/2025 07:15, Collin Funk wrote:

* doc/coreutils.texi (dd invocation): Document the behavior of 'dd' on
multibyte characters and some unspecified behavior that will be
documented in a future POSIX release [1].


[1] https://austingroupbugs.net/view.php?id=1959
---
  doc/coreutils.texi | 11 +++++++++++
  1 file changed, 11 insertions(+)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index d37cf2471..8ae81e110 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -9280,6 +9280,17 @@ @node dd invocation

The @samp{lcase} and @samp{ucase} conversions are mutually exclusive.

+@c https://austingroupbugs.net/view.php?id=1959
+POSIX leaves the behavior of @samp{lcase} and @samp{ucase} unspecified
+on multibyte characters.  GNU @command{dd} only converts one byte at a
+time, because multibyte characters may cross block boundaries and case
+conversion may change the length of characters.
+
+POSIX also leaves the behavior of @samp{lcase} and @samp{ucase}
+unspecified if used with @samp{ascii}, @samp{ebcdic}, or @samp{ibm}.
+GNU @command{dd} will perform the case conversion and then perform the
+character set conversion.
+
  @item sparse
  @opindex sparse
  Try to seek rather than write NUL output blocks.


Thanks for following up with the POSIX folks.
This clarification looks good and is worth making.
The dd interface never considered multi-byte locales,
so it is best restricted to uni-byte IMHO.
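
To make the two documented points concrete, here is a rough Python sketch.
It is only a model, not dd's code: ucase_bytes is a hypothetical helper that
maps ASCII a-z only (an assumption; the real conversion may depend on the
locale), and Python's cp037 codec stands in for an EBCDIC table, which is
not necessarily the table dd uses.

```python
def ucase_bytes(data: bytes) -> bytes:
    """Uppercase ASCII a-z one byte at a time; leave every other byte alone."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


text = "straße"
raw = text.encode("utf-8")  # the sharp s is the two bytes C3 9F

# Byte-wise conversion never changes the length and leaves the
# multibyte character untouched.
bytewise = ucase_bytes(raw)
print(bytewise.decode("utf-8"))

# Character-aware conversion changes the length of the text.
print(text.upper(), len(text), len(text.upper()))

# A block boundary in the middle of the multibyte character is harmless
# for byte-wise conversion: converting the pieces equals converting the whole.
first, second = raw[:5], raw[5:]
print(ucase_bytes(first) + ucase_bytes(second) == bytewise)

# Order of operations: case conversion first, then character set conversion.
case_then_charset = ucase_bytes(b"abc").decode("ascii").encode("cp037")
# The reverse order finds no ASCII lowercase bytes to convert.
charset_then_case = ucase_bytes(b"abc".decode("ascii").encode("cp037"))
print(case_then_charset.hex(), charset_then_case.hex())
```

With an ASCII-only case mapping, the reverse order leaves the EBCDIC bytes
unconverted, which is one way to see why the order is worth documenting.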

cheers,
Padraig
