On 21/03/16 15:16, Pádraig Brady wrote:
> On 21/03/16 00:59, William R. Fraser wrote:
>> When wc gets its list of files by reading from stdin, using the argument
>> '--files0-from=-', it reuses the same fstatus struct for each file.
>>
>> The problem is that the 'wc' function checks the 'failed' member of this
>> struct and if it is <=0, it skips doing fstat on the file. The main loop
>> doesn't reset this value between files, so only the first file has fstat
>> done on it.
>>
>> This can result in the 'wc' function seeking past the end of
>> subsequent files and then over-reporting their byte counts.
>>
>> See the attached patch, which resets the fstatus struct in between files
>> when reading the file list from stdin.
> 
> Ouch. This seems to be since v7.0-96-gc2e56e0
> It would also mean there would be a lot of redundant reading
> if the initial file was significantly smaller than any other file.
> 
> $ truncate -s1G wc.big
> $ touch wc.small
> $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
> 1073741824 wc.big
> 1073741760 wc.small
> 2147483584 total

Sorry for the delay.
I didn't go far enough back in my TODO list, so I missed this.
Proposed patch attached.

thanks,
Pádraig

From 31bba96cb7dc565719c396b5973085e644939fd5 Mon Sep 17 00:00:00 2001
From: "William R. Fraser" <wfra...@codewise.org>
Date: Sun, 20 Mar 2016 17:44:09 -0700
Subject: [PATCH] wc: fix wrong byte counts when using --files0-from

* src/wc.c (main): Reset fstatus[0].failed between files when reusing
the fstatus[0] entry in --files0-from mode.  This ensures a stat() is
done for each file, avoiding incorrect counts and redundant reading.
* NEWS: Mention the bug fix.
* tests/misc/wc-files0.sh: Add a test case.
Fixes http://bugs.gnu.org/23073
---
 NEWS                    |  5 +++++
 src/wc.c                |  3 +++
 tests/misc/wc-files0.sh | 13 ++++++++++++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 179c19b..1ed5bd9 100644
--- a/NEWS
+++ b/NEWS
@@ -8,6 +8,11 @@ GNU coreutils NEWS                                    -*- outline -*-
   158909489063877810457 and 222087527029934481871.
   [bug introduced in coreutils-8.20]
 
+  wc --bytes --files0-from now correctly reports byte counts.
+  Previously it may have returned values that were too large,
+  depending on the size of the first file processed.
+  [bug introduced in coreutils-7.1]
+
 
 * Noteworthy changes in release 8.26 (2016-11-30) [stable]
 
diff --git a/src/wc.c b/src/wc.c
index 412bda0..64df50c 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -807,6 +807,9 @@ main (int argc, char **argv)
         ok = false;
       else
         ok &= wc_file (file_name, &fstatus[nfiles ? i : 0]);
+
+      if (! nfiles)
+        fstatus[0].failed = 1;
     }
  argv_iter_done:
 
diff --git a/tests/misc/wc-files0.sh b/tests/misc/wc-files0.sh
index 12b7d6a..d92a010 100755
--- a/tests/misc/wc-files0.sh
+++ b/tests/misc/wc-files0.sh
@@ -25,7 +25,7 @@ printf '2b\n2w\n' |tr '\n' '\0' > names || framework_failure_
 
 
 wc --files0-from=names > out || fail=1
-cat <<\EOF > exp || fail=1
+cat <<\EOF > exp || framework_failure_
  1  1  2 2b
  1  2  8 2w
  2  3 10 total
@@ -48,4 +48,15 @@ printf '%s\0' "$nlname" | wc --files0-from=- > out || fail=1
 printf '%s\n' "0 0 0 '1'$'\\n''2'" > exp || framework_failure_
 compare exp out || fail=1
 
+# Ensure correct byte counts, which fails between v7.1 and v8.26 inclusive
+truncate -s1G wc.big || framework_failure_
+touch wc.small || framework_failure_
+printf '%s\0' wc.big wc.small | wc -c --files0-from=- >out || fail=1
+cat <<\EOF > exp || framework_failure_
+1073741824 wc.big
+0 wc.small
+1073741824 total
+EOF
+compare exp out || fail=1
+
 Exit $fail
-- 
2.5.5
