On Tue, 11 Oct, at 11:39:57AM, Matt Fleming wrote:
> On Tue, 11 Oct, at 10:44:25AM, Dietmar Eggemann wrote:
> > [...]
> > Yeah, you're right. But I can't see any significant difference. IMHO,
> > it's all in the noise.
> > (A) Performance counter stats for 'perf bench sched messaging -g 100 -l 1 -t'
> > # 20 sender and receiver threads per group
> > # 100 groups == 4000 threads run
> FWIW, our tests run with 1000 loops, not 1, and we don't use 100
> groups for low cpu machines. We tend to test up to $((nproc * 4)).
It's worth pointing out that using fewer than ~655 loops allows the
'sender' tasks to write everything into the pipes (when using -p)
without ever blocking on a full pipe, since the default number of pipe
buffers is 16 (PIPE_DEF_BUFFERS) and each buffer is one page.
This has implications for the "optimal" schedule, which in that case
is: schedule all sender tasks first and run them to exit, then
schedule all receivers to read from the pipes.
Just something I noticed to watch out for when picking a loops count.
There's probably some similar issue for socket buffers.
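
For reference, a back-of-the-envelope sketch of where the ~655 figure
comes from. It assumes 4 KiB pages and 100-byte messages; neither
number is stated above, so treat both as assumptions and adjust them
for your setup. The helper names are made up for illustration, and the
division ignores how individual writes are packed into pipe buffers.

```python
PIPE_DEF_BUFFERS = 16  # default number of pipe buffers, as noted above
PAGE_SIZE = 4096       # assumption: 4 KiB pages
MSG_SIZE = 100         # assumption: bytes written per message


def pipe_capacity(buffers=PIPE_DEF_BUFFERS, page_size=PAGE_SIZE):
    """Total bytes a default-sized pipe can hold before a writer blocks."""
    return buffers * page_size


def max_nonblocking_messages(msg_size=MSG_SIZE, capacity=None):
    """Rough count of whole messages that fit in one pipe with no reader."""
    if capacity is None:
        capacity = pipe_capacity()
    return capacity // msg_size


if __name__ == "__main__":
    print("pipe capacity (bytes):", pipe_capacity())
    print("messages before blocking:", max_nonblocking_messages())
```

On a machine with larger pages the threshold moves accordingly, e.g.
pipe_capacity(page_size=65536) gives a much higher loop count before
senders start blocking.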