On 2020/12/10 19:34, Mel Gorman wrote:
> On Thu, Dec 10, 2020 at 04:23:47PM +0800, Li, Aubrey wrote:
>>> I ran this patch with tbench on top of the schedstat patches that
>>> track SIS efficiency. The tracking adds overhead so it's not a perfect
>>> performance comparison but the expectation would be that the patch reduces
>>> the number of runqueues that are scanned
>>
>> Thanks for the measurement! I don't play with tbench so may need a while
>> to digest the data.
>>
> 
> The key point is that it appears the idle mask was mostly equivalent to
> the full domain mask, at least for this test.
> 
>>>
>>> tbench4
>>>                           5.10.0-rc6             5.10.0-rc6
>>>                       schedstat-v1r1          idlemask-v7r1
>>> Hmean     1        504.76 (   0.00%)      500.14 *  -0.91%*
>>> Hmean     2       1001.22 (   0.00%)      970.37 *  -3.08%*
>>> Hmean     4       1930.56 (   0.00%)     1880.96 *  -2.57%*
>>> Hmean     8       3688.05 (   0.00%)     3537.72 *  -4.08%*
>>> Hmean     16      6352.71 (   0.00%)     6439.53 *   1.37%*
>>> Hmean     32     10066.37 (   0.00%)    10124.65 *   0.58%*


>>> Hmean     64     12846.32 (   0.00%)    11627.27 *  -9.49%*

I focused on this case and ran it 5 times; here is the data on my side.
Each run is 600s of tbench with 153 threads (80% of the 192 h/w threads).

Hmean 153               v5.9.12                 v5.9.12
                        schedstat-v1            idlemask-v8(with schedstat)
Round 1                 15717.3                 15608.1
Round 2                 14856.9                 15642.5
Round 3                 14856.7                 15782.1
Round 4                 15408.9                 15912.9
Round 5                 15436.6                 15927.7

From the tbench throughput data (bigger is better), it looks like idlemask wins.

And here is the SIS_scanned data:

Hmean 153               v5.9.12                 v5.9.12
                        schedstat-v1            idlemask-v8(with schedstat)
Round 1                 22562490432             21894932302
Round 2                 21288529957             21693722629
Round 3                 20657521771             21268308377
Round 4                 21868486414             22289128955
Round 5                 21859614988             22214740417

From the SIS_scanned data (less is better), it looks like the default one is
better.

But combined with the throughput data, this can be explained: the higher
throughput means more work was done during the run, which in turn results in
more SIS_scanned.

So at least there is no regression in this case.
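
One rough way to check that explanation is to normalize SIS_scanned by
throughput for each round. The sketch below uses only the numbers from the two
tables above. The dictionary names and the helper scanned_per_throughput() are
made up for illustration, and dividing a scan count by an Hmean throughput is
only a crude normalization, not a metric that the schedstat patches report.

```python
from statistics import mean

# Per-round tbench throughput (Hmean, 153 threads), copied from the table above.
throughput = {
    "schedstat-v1": [15717.3, 14856.9, 14856.7, 15408.9, 15436.6],
    "idlemask-v8": [15608.1, 15642.5, 15782.1, 15912.9, 15927.7],
}

# Per-round SIS_scanned counters, copied from the table above.
sis_scanned = {
    "schedstat-v1": [22562490432, 21288529957, 20657521771,
                     21868486414, 21859614988],
    "idlemask-v8": [21894932302, 21693722629, 21268308377,
                    22289128955, 22214740417],
}


def scanned_per_throughput(kernel):
    """Return SIS_scanned divided by throughput for each round (hypothetical metric)."""
    return [s / t for s, t in zip(sis_scanned[kernel], throughput[kernel])]


for kernel in throughput:
    ratios = scanned_per_throughput(kernel)
    print(f"{kernel:14s} mean throughput {mean(throughput[kernel]):10.1f}  "
          f"mean scanned/throughput {mean(ratios):12.1f}")
```

If the per-throughput ratio is lower for idlemask-v8 than for schedstat-v1,
that is consistent with the extra scans coming from the extra work done rather
than from less efficient scanning.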

Thanks,
-Aubrey
