On Thu, Apr 15, 2021 at 04:29:17PM +0200, Christian Weisgerber wrote:

> Jordan Geoghegan:
> 
> > --- /tmp/bad.txt  Wed Apr 14 21:06:51 2021
> > +++ /tmp/good.txt  Wed Apr 14 21:06:41 2021
> 
> I'll note that no characters have been lost between the two files.
> Only the order is different.
> 
> > The only thing that changed between these runs was me using either xargs -P 
> > 1 or -P 2.
> 
> What do you expect?  You run two processes in parallel that write
> to the same file.  Obviously their output will be interspersed in
> unpredictable order.
> 
> You seem to imagine that awk's output is line-buffered.  But when
> it writes to a pipe or file, its output is block-buffered.  This
> is default stdio behavior.  Output is written in block-size increments
> (16 kB in practice) without regard to lines.  So, yes, you can end
> up with a fragment from a line written by process #1, followed by
> lines from process #2, followed by the remainder of the line from
> #1, etc.
> 
> -- 
> Christian "naddy" Weisgerber                          na...@mips.inka.de
> 

Right, an fflush() call after the printf makes the issue go away, but
only because awk is being nice and issues a single write call for that
single printf. Since awk, AFAIK, gives no such guarantee, it is
better to have each parallel invocation write to a separate file and
then cat them together after all the awk runs are done.
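
A minimal sketch of the separate-file approach. The input files, the
awk program and the part*.in / part*.out names are made up for
illustration; they are not the ones from the original report. Each
xargs job starts its own sh, which redirects that one awk's stdout to
a private file, so no two processes ever share an output file.

```shell
#!/bin/sh
# Sketch only: file names and the awk program are placeholders.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical input: two small files to process in parallel.
printf 'a 1\nb 2\n' > "$dir/part1.in"
printf 'c 3\nd 4\n' > "$dir/part2.in"

# Run up to two awk processes at once. Each job writes to its own
# partN.out, derived from the input name, so block buffering in awk
# cannot interleave output from different processes.
printf '%s\n' "$dir"/*.in |
    xargs -P 2 -I {} sh -c \
        'awk "{ print \$2, \$1 }" "$1" > "${1%.in}.out"' sh {}

# xargs only returns once every job has exited, so all the .out files
# are complete here. The glob expands in sorted order, which makes
# the combined result deterministic.
cat "$dir"/*.out > "$dir/result.txt"
```

The order of the combined file then depends only on the order in
which the pieces are handed to cat, not on scheduling.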

        -Otto
