On Thu, Apr 15, 2021 at 04:29:17PM +0200, Christian Weisgerber wrote:
> Jordan Geoghegan:
>
> > --- /tmp/bad.txt Wed Apr 14 21:06:51 2021
> > +++ /tmp/good.txt Wed Apr 14 21:06:41 2021
>
> I'll note that no characters have been lost between the two files.
> Only the order is different.
>
> > The only thing that changed between these runs was me using either xargs -P
> > 1 or -P 2.
>
> What do you expect? You run two processes in parallel that write
> to the same file. Obviously their output will be interspersed in
> unpredictable order.
>
> You seem to imagine that awk's output is line-buffered. But when
> it writes to a pipe or file, its output is block-buffered. This
> is default stdio behavior. Output is written in block-size increments
> (16 kB in practice) without regard to lines. So, yes, you can end
> up with a fragment from a line written by process #1, followed by
> lines from process #2, followed by the remainder of the line from
> #1, etc.
>
> --
> Christian "naddy" Weisgerber [email protected]
>
Right, an fflush() call after the printf makes the issue go away, but
only because awk is being nice and issues a single write call for that
single printf. Since awk, afaik, gives no such guarantee, it is
better to have each parallel invocation write to a separate file and
then cat them together after all the awk runs are done.
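
A minimal sketch of the separate-file approach, for illustration only. The
input files (in1.txt, in2.txt), the .out suffix, merged.txt and the trivial
awk program are all made up here and stand in for the real job:

```shell
#!/bin/sh
# Hypothetical setup: two small input files stand in for the real data.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf 'a 1\nb 2\n' > "$tmp/in1.txt"
printf 'c 3\nd 4\n' > "$tmp/in2.txt"

# Run up to two awk processes in parallel. Each one writes to its own
# output file (input name plus ".out"), so no two processes ever write
# to the same file and buffering can no longer interleave their output.
printf '%s\n' "$tmp"/in*.txt |
    xargs -P 2 -n 1 sh -c 'awk "{ print \$1 }" "$1" > "$1.out"' sh

# xargs has returned, so every awk run is finished: concatenate the
# per-process results in a fixed (glob-sorted) order.
cat "$tmp"/in*.txt.out > "$tmp/merged.txt"
cat "$tmp/merged.txt"
```

Since xargs waits for its children before exiting, the cat only runs once
all the output files are complete, and the order of the merged result no
longer depends on how the processes happened to be scheduled.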

-Otto