On 21/03/16 15:16, Pádraig Brady wrote:
> On 21/03/16 00:59, William R. Fraser wrote:
>> When wc gets its list of files by reading from stdin, using the argument
>> '--files0-from=-', it reuses the same fstatus struct for each file.
>>
>> The problem is that the 'wc' function checks the 'failed' member of this
>> struct and if it is <=0, it skips doing fstat on the file. The main loop
>> doesn't reset this value between files, so only the first file has fstat
>> done on it.
>>
>> This can result in the 'wc' function seeking past the end of
>> subsequent files and then over-reporting their byte counts.
>>
>> See the attached patch, which resets the fstatus struct in between files
>> when reading the file list from stdin.
>
> Ouch. This seems to be since v7.0-96-gc2e56e0
> It would also mean there would be a lot of redundant reading
> if the initial file was significantly smaller than any other file.
>
>   $ truncate -s1G wc.big
>   $ touch wc.small
>   $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
>   1073741824 wc.big
>   1073741760 wc.small
>   2147483584 total

Sorry for the delay.
I didn't go far enough back in my TODO list so missed this.
Proposed patch attached.

thanks,
Pádraig

From 31bba96cb7dc565719c396b5973085e644939fd5 Mon Sep 17 00:00:00 2001
From: "William R. Fraser" <wfra...@codewise.org>
Date: Sun, 20 Mar 2016 17:44:09 -0700
Subject: [PATCH] wc: fix wrong byte counts when using --files0-from

* src/wc.c (main): Reset fstatus[0].failed between files when reusing
the fstatus[0] entry in --files0-from mode.  This ensures a stat() is
done for each file, avoiding incorrect counts and redundant reading.
* NEWS: Mention the bug fix.
* tests/misc/wc-files0.sh: Add a test case.
Fixes http://bugs.gnu.org/23073
---
 NEWS                    |  5 +++++
 src/wc.c                |  3 +++
 tests/misc/wc-files0.sh | 13 ++++++++++++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 179c19b..1ed5bd9 100644
--- a/NEWS
+++ b/NEWS
@@ -8,6 +8,11 @@ GNU coreutils NEWS -*- outline -*-
   158909489063877810457 and 222087527029934481871.
   [bug introduced in coreutils-8.20]
 
+  wc --bytes --files0-from now correctly reports byte counts.
+  Previously it may have returned values that were too large,
+  depending on the size of the first file processed.
+  [bug introduced in coreutils-7.1]
+
 
 * Noteworthy changes in release 8.26 (2016-11-30) [stable]
 
diff --git a/src/wc.c b/src/wc.c
index 412bda0..64df50c 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -807,6 +807,9 @@ main (int argc, char **argv)
         ok = false;
       else
         ok &= wc_file (file_name, &fstatus[nfiles ? i : 0]);
+
+      if (! nfiles)
+        fstatus[0].failed = 1;
     }
  argv_iter_done:
 
diff --git a/tests/misc/wc-files0.sh b/tests/misc/wc-files0.sh
index 12b7d6a..d92a010 100755
--- a/tests/misc/wc-files0.sh
+++ b/tests/misc/wc-files0.sh
@@ -25,7 +25,7 @@ printf '2b\n2w\n' |tr '\n' '\0' > names || framework_failure_
 
 
 wc --files0-from=names > out || fail=1
-cat <<\EOF > exp || fail=1
+cat <<\EOF > exp || framework_failure_
 1 1 2 2b
 1 2 8 2w
 2 3 10 total
@@ -48,4 +48,15 @@ printf '%s\0' "$nlname" | wc --files0-from=- > out || fail=1
 printf '%s\n' "0 0 0 '1'$'\\n''2'" > exp || framework_failure_
 compare exp out || fail=1
 
+# Ensure correct byte counts, which fails between v7.1 and v8.26 inclusive
+truncate -s1G wc.big || framework_failure_
+touch wc.small || framework_failure_
+printf '%s\0' wc.big wc.small | wc -c --files0-from=- >out || fail=1
+cat <<\EOF > exp || framework_failure_
+1073741824 wc.big
+0 wc.small
+1073741824 total
+EOF
+compare exp out || fail=1
+
 Exit $fail
-- 
2.5.5