On 5/23/07, Jim Meyering <[EMAIL PROTECTED]> wrote:
"James Youngman" <[EMAIL PROTECTED]> wrote:
> On 5/22/07, Paul Eggert <[EMAIL PROTECTED]> wrote:
>> Hmmm, why does "wc" output _any_ such messages?  The standard doesn't
>> require them and they seem like a waste of time.  An invalid byte
>> should not count as part of a character or a word, but should count as
>> a byte and as part of a line.
>
> I think I agree, but I was trying to be a little cautious.  Unless I
> hear an objection in the next day or so, I'll rework the patch on the
> lines you suggest.

I agree.  Thanks for working on that.

Here is the (rather trivial) patch.  The ChangeLog entry is provided
inline, but it is also present at the head of the patch, which appears
as an attachment.

2007-05-25  James Youngman  <[EMAIL PROTECTED]>

       * src/wc.c (wc): Don't issue an error message when mbrtowc
         indicates that we have seen an invalid byte sequence.  This
         makes "wc /bin/sh" bearable (though the word and line counts
         are likely not to be useful).
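
To make the counting rule concrete, here is a minimal stand-alone sketch
in C.  It is not the code in src/wc.c: the `struct counts` type and the
`count_buffer` function are hypothetical names invented for this
illustration, and word and line counting are left out.  It only shows
that a byte rejected by mbrtowc is still counted as a byte, is not
counted as a character, and produces no diagnostic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical result type for this sketch; not part of wc.c.  */
struct counts
{
  size_t bytes;
  size_t chars;
};

/* Count bytes and characters in BUF[0..LEN-1] using the current locale.
   A byte that mbrtowc rejects counts as a byte but not as a character,
   and no error message is printed.  */
static struct counts
count_buffer (char const *buf, size_t len)
{
  struct counts c = { 0, 0 };
  mbstate_t state;
  char const *p = buf;
  size_t left = len;

  assert (buf != NULL || len == 0);
  memset (&state, 0, sizeof state);
  c.bytes = len;		/* Every byte counts, valid or not.  */

  while (left > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, p, left, &state);

      if (n == (size_t) -2)
	break;			/* Incomplete sequence at the end: bytes only.  */
      if (n == (size_t) -1)
	{
	  /* Invalid sequence: skip one byte silently and restart
	     decoding from a fresh conversion state.  */
	  memset (&state, 0, sizeof state);
	  p++;
	  left--;
	  continue;
	}
      if (n == 0)
	n = 1;			/* The null character occupies one byte.  */
      p += n;
      left -= n;
      c.chars++;
    }
  return c;
}
```

With this sketch, `count_buffer (data, len).bytes` always equals `len`,
while `.chars` depends on the locale selected with setlocale and can be
smaller when the data contains bytes that do not decode.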

James.
2007-05-25  James Youngman  <[EMAIL PROTECTED]>

	* src/wc.c (wc): Don't issue an error message when mbrtowc
          indicates that we have seen an invalid byte sequence.  This
          makes "wc /bin/sh" bearable (though the word and line counts
          are likely not to be useful).

Index: NEWS
===================================================================
RCS file: /sources/coreutils/coreutils/NEWS,v
retrieving revision 1.496
diff -u -p -r1.496 NEWS
--- NEWS	22 May 2007 18:59:06 -0000	1.496
+++ NEWS	25 May 2007 13:00:00 -0000
@@ -10,6 +10,11 @@ GNU coreutils NEWS                      
   option of the same name, this makes uniq consume and produce
   NUL-terminated lines rather than newline-terminated lines.
 
+  In multibyte locales, wc no longer produces character decoding error
+  messages.  This means for example that "wc /bin/sh" produces normal
+  output (though the word count will have no real meaning) rather than
+  many error messages.
+
 ** Bug fixes
 
   cut now diagnoses a range starting with zero (e.g., -f 0-2) as invalid;
Index: src/wc.c
===================================================================
RCS file: /sources/coreutils/coreutils/src/wc.c,v
retrieving revision 1.114
diff -u -p -r1.114 wc.c
--- src/wc.c	28 Mar 2007 06:57:40 -0000	1.114
+++ src/wc.c	25 May 2007 13:00:00 -0000
@@ -274,7 +274,6 @@ wc (int fd, char const *file_x, struct f
       bool in_word = false;
       uintmax_t linepos = 0;
       mbstate_t state = { 0, };
-      uintmax_t last_error_line = 0;
       int last_error_errno = 0;
 # if SUPPORT_OLD_MBRTOWC
       /* Back-up the state before each multibyte character conversion and
@@ -323,17 +322,11 @@ wc (int fd, char const *file_x, struct f
 		}
 	      if (n == (size_t) -1)
 		{
-		  /* Signal repeated errors only once per line.  */
-		  if (!(lines + 1 == last_error_line
-			&& errno == last_error_errno))
-		    {
-		      char line_number_buf[INT_BUFSIZE_BOUND (uintmax_t)];
-		      last_error_line = lines + 1;
-		      last_error_errno = errno;
-		      error (0, errno, "%s:%s", file,
-			     umaxtostr (last_error_line, line_number_buf));
-		      ok = false;
-		    }
+		  /* Remember that we read a byte, but don't complain
+		   * about the error.  Because of the decoding error,
+		   * this is considered to be a byte but not a
+		   * character (that is, chars is not incremented).
+		   */
 		  p++;
 		  bytes_read--;
 		}
