Stuart Henderson <[email protected]> writes:
> On 2022-08-15, gwes <[email protected]> wrote: >> Unexpected behavior: >> When I try to chain three programs together with pipes moving lots >> of data spin time goes up on most or all CPUs. >> Is this known or expected? >> >> the chain [shortened] was >> find /someplace -maxdepth 2 -type f -name '*.flac' -exec \ >> metaflac -list --block-type=VORBIS_COMMENT {} + | \ >> awk '{mangle}' | \ >> sed -e 's/foo/bar/' | \ >> sort -o outfile >> >> On a slow (1.6GHz 8GB) system with an SSD the spin times went >> as high as 50% on all cpus. When that happens the sys times >> also go very high rendering the system effectively >> unusable. Swap space used = 0. >> That system is no longer available for testing. >> >> This is a simplified example on a Ryzen 3600G w/64MB ram & reasonable >> rotating rust: >> The test file is 3.6G >> Under some test cases the spin times go as high as 20% on two or three CPUs >> simultaneously. Those are transient and hard to capture. >> I can rerun this using -CURRENT if that would give better information. >> >> Rsults have been more or less the same 6.9, 7.0, 7.1 >> >> 11881$ cat m2abc.txz m2abc.txz m2abc.txz | cat | cat | cat > xx > > -current looks similar. (I catted /bsd a few times to make a test file, > in this case the filesystem is on softraid crypto over nvme). > > CPU0: 0.0% user, 0.0% nice, 45.1% sys, 15.7% spin, 23.5% intr, 15.7% idle > CPU1: 0.0% user, 0.0% nice, 33.3% sys, 41.2% spin, 0.0% intr, 25.5% idle > CPU2: 2.0% user, 0.0% nice, 39.2% sys, 23.5% spin, 0.0% intr, 35.3% idle > CPU3: 2.0% user, 0.0% nice, 29.4% sys, 27.5% spin, 0.0% intr, 41.2% idle > Memory: Real: 5172M/15G act/tot Free: 654M Cache: 6685M Swap: 22M/2048M > > "yes | cat | cat | cat > xx" doesn't hit it. > > Don't know if it gives any clues but a flamegraph generated while > running the multiple cat pipe on my laptop looks like > https://junkpile.org/cat-spin-flame.svg _kernel_lock() was in 24% of the samples. If you add up the spin you get about 90%, which is ~24% of 400%...the max possible sum of spin across 4 cpus. Comes from the dofilereadv() -> vn_read() -> _kernel_lock() chain. I think that explains why your `yes | cat...` example doesn't hit it. I'm not on/near a machine with btrace available to generate a flamegraph for that example but curious what it looks like. -dv

