On 4/15/21 7:49 AM, Otto Moerbeek wrote:
> On Thu, Apr 15, 2021 at 04:29:17PM +0200, Christian Weisgerber wrote:
>
>> Jordan Geoghegan:
>>
>>> --- /tmp/bad.txt  Wed Apr 14 21:06:51 2021
>>> +++ /tmp/good.txt  Wed Apr 14 21:06:41 2021
>> I'll note that no characters have been lost between the two files.
>> Only the order is different.
>>
>>> The only thing that changed between these runs was me using either
>>> xargs -P 1 or -P 2.
>> What do you expect?  You run two processes in parallel that write
>> to the same file.  Obviously their output will be interspersed in
>> unpredictable order.
>>
>> You seem to imagine that awk's output is line-buffered.  But when
>> it writes to a pipe or file, its output is block-buffered.  This
>> is default stdio behavior.  Output is written in block-size increments
>> (16 kB in practice) without regard to lines.  So, yes, you can end
>> up with a fragment from a line written by process #1, followed by
>> lines from process #2, followed by the remainder of the line from
>> #1, etc.
>>
>> -- 
>> Christian "naddy" Weisgerber
>>
> Right, a fflush() call after the printf makes the issue go away, but
> only since awk is being nice and issues a single write call for that
> single printf. Since awk afaik does not give such a guarantee, it is
> better to have each parallel invocation write to a separate file and
> then cat them together after all the awk runs are done.
>
>       -Otto
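Otto's suggestion of separate output files can be sketched in plain sh. This is a minimal illustration, not the original poster's script. The input files, the awk program { print $2, $1 }, and the ".out" naming are hypothetical. It also assumes an xargs that supports -P, which OpenBSD and GNU xargs do but POSIX xargs does not require.

```shell
#!/bin/sh
# Sketch: give every parallel awk run its own output file and concatenate
# the pieces only after all runs have finished.  The input files and the
# awk program ({ print $2, $1 }) are hypothetical placeholders.
set -e

workdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$workdir/part1.txt"
printf 'c 3\nd 4\n' > "$workdir/part2.txt"

# xargs -n 1 -P 2 starts up to two "sh -c" children at once; each one
# redirects its awk output to "<input>.out", so no output file is shared.
printf '%s\n' "$workdir"/part*.txt |
    xargs -n 1 -P 2 sh -c 'awk "{ print \$2, \$1 }" "$1" > "$1.out"' sh

# xargs exits only after all of its children have exited, so every .out
# file is complete here.  The glob expands in sorted order, so the result
# follows input order no matter which run finished first.
cat "$workdir"/part*.txt.out > "$workdir/result.txt"
cat "$workdir/result.txt"
```

Because the pieces are joined in glob order, the combined output follows the order of the inputs rather than the order in which the parallel runs happened to finish, so -P 1 and -P 2 produce identical results.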

Hello Christian and Otto,

Thank you for setting me straight. The block vs. line buffering issue should
have been obvious to me. What confused me was that this approach worked well
for a long time, until it didn't. I would have expected it to mangle the
output consistently...

While fflush() does seem to fix the issue, I wanted to explore your
suggestion, Otto, of writing to a separate temporary file from within awk.

Is something like the following a sane approach to safely generating temporary
files from within awk?

BEGIN {
    cmd = "mktemp -q /tmp/workdir/tmp.XXXXXXX"
    if ((cmd | getline result) > 0)
        TMPFILE = result
    else
        exit 1
}

Unless I'm missing something obvious, there seems to be no way to capture both
the stdout and the exit status of an external command from within awk. My
workaround for error-checking the call to mktemp is to abort if mktemp
returns no output. Is this sane?
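One way to get both the output and the exit status of mktemp from awk is to have the shell print $? as an extra last line, then check it after reading everything. The sketch below assumes a POSIX sh and an awk that accepts "/dev/stderr" as an output file name (onetrue awk and gawk both do); the -v dir=... variable and the "hello" payload are hypothetical. It calls close(cmd) after reading, which reaps the child and would let the same command string be run again, and it does not rely on the return value of close(), whose meaning for pipes has differed between awk implementations.

```shell
#!/bin/sh
# Sketch: create a temporary file from inside awk and check that mktemp
# succeeded.  "echo $?" is appended to the command, so the last line awk
# reads is mktemp's exit status and the line before it is mktemp's stdout.
# The directory is passed in with -v dir=... (a hypothetical stand-in for
# /tmp/workdir); it must not contain spaces, since cmd is not quoted.
prog='BEGIN {
    cmd = "mktemp -q " dir "/tmp.XXXXXXXXXX; echo $?"
    n = 0
    while ((cmd | getline line) > 0)
        out[++n] = line
    close(cmd)                # reap the child; allows cmd to be rerun
    # Success means exactly two lines: the file name, then status 0.
    if (n != 2 || out[2] != 0) {
        print "mktemp failed in " dir > "/dev/stderr"
        exit 1
    }
    TMPFILE = out[1]
    print "hello" > TMPFILE   # this run writes only to its own file
    close(TMPFILE)
    print TMPFILE             # report the file name to the caller
}'

workdir=$(mktemp -d) || exit 1
tmpfile=$(awk -v dir="$workdir" "$prog") || exit 1
cat "$tmpfile"
```

A failed mktemp -q prints nothing on stdout, so only the status line arrives and the n != 2 check rejects it; this covers the "no output" case from the original BEGIN block while also catching a non-zero status.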

Regards,

Jordan
