Stuart Henderson <[email protected]> writes:

> On 2022-08-15, gwes <[email protected]> wrote:
>> Unexpected behavior:
>>    When I try to chain three programs together with pipes moving lots
>> of data spin time goes up on most or all CPUs.
>> Is this known or expected?
>>
>> the chain [shortened] was
>>    find /someplace -maxdepth 2 -type f -name '*.flac' -exec \
>>       metaflac -list --block-type=VORBIS_COMMENT {} + | \
>>       awk '{mangle}' | \
>>       sed -e 's/foo/bar/' | \
>>       sort -o outfile
>>
>> On a slow (1.6GHz 8GB) system with an SSD the spin times went
>> as high as 50% on all cpus. When that happens the sys times
>> also go very high rendering the system effectively
>> unusable. Swap space used = 0.
>> That system is no longer available for testing.
>>
>> This is a simplified example on a Ryzen 3600G w/64MB ram & reasonable
>> rotating rust:
>> The test file is 3.6G
>> Under some test cases the spin times go as high as 20% on two or three CPUs
>> simultaneously. Those are transient and hard to capture.
>> I can rerun this using -CURRENT if that would give better information.
>>
>> Rsults have been more or less the same 6.9, 7.0, 7.1
>>
>> 11881$ cat m2abc.txz m2abc.txz m2abc.txz | cat | cat | cat > xx
>
> -current looks similar. (I catted /bsd a few times to make a test file,
> in this case the filesystem is on softraid crypto over nvme).
>
> CPU0:  0.0% user,  0.0% nice, 45.1% sys, 15.7% spin, 23.5% intr, 15.7% idle
> CPU1:  0.0% user,  0.0% nice, 33.3% sys, 41.2% spin,  0.0% intr, 25.5% idle
> CPU2:  2.0% user,  0.0% nice, 39.2% sys, 23.5% spin,  0.0% intr, 35.3% idle
> CPU3:  2.0% user,  0.0% nice, 29.4% sys, 27.5% spin,  0.0% intr, 41.2% idle
> Memory: Real: 5172M/15G act/tot Free: 654M Cache: 6685M Swap: 22M/2048M
>
> "yes | cat | cat | cat > xx" doesn't hit it.
>
> Don't know if it gives any clues but a flamegraph generated while
> running the multiple cat pipe on my laptop looks like
> https://junkpile.org/cat-spin-flame.svg

_kernel_lock() was in 24% of the samples. If you add up the spin you get
about 90%, which is ~24% of 400%...the max possible sum of spin across 4
cpus.

Comes from the dofilereadv() -> vn_read() -> _kernel_lock() chain.

I think that explains why your `yes | cat...` example doesn't hit
it. I'm not on/near a machine with btrace available to generate a
flamegraph for that example but curious what it looks like.

-dv

Reply via email to