wc is essentially a digesting function like sha etc., in that it produces a one-line summary per file. The attached patch ensures that those lines are output atomically for concurrent wc processes.
Note that in general one can use `stdbuf -oL cmd` to line-buffer a process that writes to stdout, but I think this should be done internally in this case. Note also that for commands which don't use stdio for output, one could use sed to do the line buffering: `cmd | sed -n p`.

cheers,
Pádraig.
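For reference, the two external workarounds mentioned above can be sketched roughly as below. This is only an illustration: the helper names `line_buffered` and `relay_lines` are made up for the example, `stdbuf -oL` only helps for programs that write through stdio and do not set their own buffering, and whether the sed relay really emits output line by line depends on the sed in use (GNU sed also offers `-u` for that).

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): run a command with its
# stdout line-buffered through stdbuf, if stdbuf is installed.
line_buffered() {
    if command -v stdbuf >/dev/null 2>&1; then
        stdbuf -oL "$@"
    else
        "$@"    # fallback: run the command unmodified
    fi
}

# Hypothetical helper: relay stdin to stdout via sed, as suggested for
# commands that don't use stdio for output.
relay_lines() {
    sed -n p
}

# Example use with wc on an empty input file.
line_buffered wc /dev/null
wc /dev/null | relay_lines
```

With the patch applied, neither wrapper should be needed for wc itself; they remain useful for other commands.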
From fbcde9907b3cea7632e8186230adf811f8d9c01f Mon Sep 17 00:00:00 2001
From: =?utf-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Tue, 22 Dec 2009 07:36:12 +0000
Subject: [PATCH] wc: line-buffer the printed counts

* src/wc.c (main): Set stdout to line buffered mode to ensure
parallel running instances don't intersperse their output.
This adds 6.5% to the run time in the worst case of many
zero length files, but has negligible impact for standard sized files.
* tests/misc/wc-parallel: New test for atomic output.
* tests/Makefile.am: Reference it.
* NEWS: Mention the fix

This is similar to commit 710fe413, 20-10-2009,
"md5sum, sha*sum, sum: line-buffer the printed checksums"
---
 NEWS                   |    4 ++++
 src/wc.c               |    4 ++++
 tests/Makefile.am      |    1 +
 tests/misc/wc-parallel |   37 +++++++++++++++++++++++++++++++++++++
 4 files changed, 46 insertions(+), 0 deletions(-)
 create mode 100755 tests/misc/wc-parallel

diff --git a/NEWS b/NEWS
index e0287cc..496f1ea 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,10 @@ GNU coreutils NEWS -*- outline -*-
   adjusted, working around a bug in current Linux kernels.
   [bug introduced in coreutils-8.1]
 
+  wc now prints counts atomically so that concurrent
+  processes will not intersperse their output.
+  [the bug dates back to the initial implementation]
+
 
 * Noteworthy changes in release 8.2 (2009-12-11) [stable]
 
diff --git a/src/wc.c b/src/wc.c
index 52e899e..48b5a4e 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -598,6 +598,10 @@ main (int argc, char **argv)
 
   atexit (close_stdout);
 
+  /* Line buffer stdout to ensure lines are written atomically and immediately
+     so that processes running in parallel do not intersperse their output.  */
+  setvbuf (stdout, NULL, _IOLBF, 0);
+
   print_lines = print_words = print_chars = print_bytes = false;
   print_linelength = false;
   total_lines = total_words = total_chars = total_bytes = max_line_length = 0;
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 5e44202..93d4275 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -157,6 +157,7 @@ TESTS = \
   misc/wc \
   misc/wc-files0-from \
   misc/wc-files0 \
+  misc/wc-parallel \
   misc/cat-proc \
   misc/cat-buf \
   misc/base64 \
diff --git a/tests/misc/wc-parallel b/tests/misc/wc-parallel
new file mode 100755
index 0000000..0d81d4a
--- /dev/null
+++ b/tests/misc/wc-parallel
@@ -0,0 +1,37 @@
+#!/bin/sh
+# Ensure that wc prints counts atomically
+# so that concurrent processes don't intersperse their output
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/test-lib.sh
+
+if test "$VERBOSE" = yes; then
+  set -x
+  wc --version
+fi
+
+
+(mkdir tmp && cd tmp && seq 2000 | xargs touch)
+
+# This will output at least 16KiB per process
+# and start 3 processes, with 2 running concurrently,
+# which triggers often on Fedora 11 at least.
+(find tmp tmp tmp -type f | xargs -n500 -P2 wc) |
+sed -n '/0 0 0 /!p' |
+grep . > /dev/null && fail=1
+
+Exit $fail
-- 
1.6.2.5
