I'm committing this. The first hunk is fairly obvious, given that Posix
only requires sed to handle text files (Posix defines a text file as one with
no NUL bytes, no lines longer than LINE_MAX, and either empty or ending in a
trailing newline), but the reminder can't hurt. The second documents an actual
Posix compliance bug, hit at least by people who insist on using ancient
tools, although I don't yet know whether any modern platform besides Solaris
shares this tr bug.
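For anyone who wants to check another platform, a quick probe along the
following lines should do. It is only a sketch: the TR variable is a
hypothetical override invented for this example (not something Autoconf
defines), and it assumes a printf that understands \0, as the example in the
patch does.

```shell
#!/bin/sh
# Probe whether a tr implementation honors '\0' as the octal escape for NUL.
# TR is a hypothetical override for this sketch; it defaults to the first
# tr found in PATH.
TR=${TR-tr}

# A conforming tr turns the NUL into '~', so the captured text is 'a~b'.
# A tr that drops or ignores the NUL yields 'ab' instead (the shell cannot
# store a NUL in a variable and may warn when it discards one).
result=`printf 'a\0b\n' | $TR '\0' '~'`

if test "$result" = 'a~b'; then
  printf '%s\n' "$TR: NUL escape handled"
else
  printf '%s\n' "$TR: NUL escape mishandled"
fi
```

On Solaris, running it once with TR=/usr/ucb/tr and once with
TR=/usr/xpg4/bin/tr should show the difference described in the patch.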
From: Eric Blake <[EMAIL PROTECTED]>
Date: Thu, 2 Oct 2008 09:02:37 -0600
Subject: [PATCH] Document more binary file portability traps.
* doc/autoconf.texi (Limitations of Usual Tools) <sed>: Remind
reader that NUL and sed don't always mix.
<tr>: Mention Solaris /usr/ucb/tr bug with \0.
Signed-off-by: Eric Blake <[EMAIL PROTECTED]>
---
ChangeLog | 7 +++++++
doc/autoconf.texi | 17 ++++++++++++++++-
2 files changed, 23 insertions(+), 1 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 6b3f95b..dbba2f8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2008-10-02 Eric Blake <[EMAIL PROTECTED]>
+
+ Document more binary file portability traps.
+ * doc/autoconf.texi (Limitations of Usual Tools) <sed>: Remind
+ reader that NUL and sed don't always mix.
+ <tr>: Mention Solaris /usr/ucb/tr bug with \0.
+
2008-10-02 Ralf Wildenhues <[EMAIL PROTECTED]>
Implement parallel Autotest test execution: testsuite --jobs.
diff --git a/doc/autoconf.texi b/doc/autoconf.texi
index e515a87..939eaea 100644
--- a/doc/autoconf.texi
+++ b/doc/autoconf.texi
@@ -15849,7 +15849,9 @@ Limitations of Usual Tools
@end example
Input should not have unreasonably long lines, since some @command{sed}
-implementations have an input buffer limited to 4000 bytes.
+implementations have an input buffer limited to 4000 bytes. Likewise,
+not all @command{sed} implementations can handle embedded @code{NUL} or
+a missing trailing newline.
Portable @command{sed} regular expressions should use @samp{\} only to escape
characters in the string @samp{$()[EMAIL PROTECTED]@}}. For example,
@@ -16101,6 +16103,19 @@ Limitations of Usual Tools
moonlight
@end example
+Posix requires @command{tr} to operate on binary files. But at least
+Solaris @command{/usr/ucb/tr} still fails to handle @samp{\0} as the
+octal escape for @code{NUL}. On Solaris, when using @command{tr} to
+neutralize a binary file by converting @code{NUL} to a different
+character, it is necessary to use @command{/usr/xpg4/bin/tr} instead.
+
+@example
+$ @kbd{printf 'a\0b\n' | /usr/ucb/tr '\0' '~' | wc -c}
+3
+$ @kbd{printf 'a\0b\n' | /usr/xpg4/bin/tr '\0' '~' | wc -c}
+4
+@end example
+
@end table
--
1.6.0.2