I'm committing this.  The first hunk is somewhat obvious, given that Posix
only requires sed to handle text files (Posix defines a text file as one with
no NUL bytes, no lines longer than LINE_MAX, and either empty or ending in a
trailing newline), but the reminder can't hurt.  The second documents an actual
Posix compliance bug, encountered at least by people who insist on using
ancient tools, although I don't yet know whether any modern platforms besides
Solaris share this tr bug.
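
For anyone who wants to check another platform, here is a minimal sketch of
a probe based on the same pipeline used in the patch's example.  The function
name tr_handles_nul and the list of candidate paths are my own assumptions for
illustration, not part of the patch, and the loop relies on the shell
providing `command -v'.

```shell
# Hypothetical helper (not part of the patch): succeed if the tr program
# named by $1 translates NUL when it is written as the octal escape '\0'.
tr_handles_nul () {
  # The input is the 4 bytes 'a', NUL, 'b', newline.  A conforming tr turns
  # the NUL into '~'; command substitution then strips the trailing newline.
  out=$(printf 'a\000b\n' | "$1" '\0' '~' 2>/dev/null)
  test "x$out" = "xa~b"
}

# Probe whichever of these candidates exist on the current system.
for tr_prog in tr /usr/xpg4/bin/tr /usr/ucb/tr; do
  if command -v "$tr_prog" >/dev/null 2>&1; then
    if tr_handles_nul "$tr_prog"; then
      echo "$tr_prog: translates NUL"
    else
      echo "$tr_prog: does not translate NUL"
    fi
  fi
done
```

Comparing the translated text, rather than only the byte count from wc -c,
also catches a tr that passes the NUL through unchanged.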

From: Eric Blake <[EMAIL PROTECTED]>
Date: Thu, 2 Oct 2008 09:02:37 -0600
Subject: [PATCH] Document more binary file portability traps.

* doc/autoconf.texi (Limitations of Usual Tools) <sed>: Remind
reader that NUL and sed don't always mix.
<tr>: Mention Solaris /usr/ucb/tr bug with \0.

Signed-off-by: Eric Blake <[EMAIL PROTECTED]>
---
 ChangeLog         |    7 +++++++
 doc/autoconf.texi |   17 ++++++++++++++++-
 2 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 6b3f95b..dbba2f8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2008-10-02  Eric Blake  <[EMAIL PROTECTED]>
+
+       Document more binary file portability traps.
+       * doc/autoconf.texi (Limitations of Usual Tools) <sed>: Remind
+       reader that NUL and sed don't always mix.
+       <tr>: Mention Solaris /usr/ucb/tr bug with \0.
+
 2008-10-02  Ralf Wildenhues  <[EMAIL PROTECTED]>
 
        Implement parallel Autotest test execution: testsuite --jobs.
diff --git a/doc/autoconf.texi b/doc/autoconf.texi
index e515a87..939eaea 100644
--- a/doc/autoconf.texi
+++ b/doc/autoconf.texi
@@ -15849,7 +15849,9 @@ Limitations of Usual Tools
 @end example
 
 Input should not have unreasonably long lines, since some @command{sed}
-implementations have an input buffer limited to 4000 bytes.
+implementations have an input buffer limited to 4000 bytes.  Likewise,
+not all @command{sed} implementations can handle embedded @code{NUL} or
+a missing trailing newline.
 
 Portable @command{sed} regular expressions should use @samp{\} only to escape
 characters in the string @samp{$()*.0123456789[\^n@{@}}.  For example,
@@ -16101,6 +16103,19 @@ Limitations of Usual Tools
 moonlight
 @end example
 
+Posix requires @command{tr} to operate on binary files.  But at least
+Solaris @command{/usr/ucb/tr} still fails to handle @samp{\0} as the
+octal escape for @code{NUL}.  On Solaris, when using @command{tr} to
+neutralize a binary file by converting @code{NUL} to a different
+character, it is necessary to use @command{/usr/xpg4/bin/tr} instead.
+
+@example
+$ @kbd{printf 'a\0b\n' | /usr/ucb/tr '\0' '~' | wc -c}
+3
+$ @kbd{printf 'a\0b\n' | /usr/xpg4/bin/tr '\0' '~' | wc -c}
+4
+@end example
+
 @end table
 
 
-- 
1.6.0.2




