Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work

2021-01-28 Thread Xing Zhengjun




On 1/29/2021 2:08 AM, Paul E. McKenney wrote:

On Thu, Jan 28, 2021 at 05:09:05PM +0800, Hillf Danton wrote:

On Thu, 28 Jan 2021 15:52:40 +0800 Xing Zhengjun wrote:


[ . . . ]


I tested the patch 4 times; no warning appears in the kernel log.


Thank you so much Zhengjun!

And the overall brain dump so far is:

1/ before and after d5bff968ea, changing the allowed ptr at online time
is the key to quiescing the warning in process_one_work().

2/ marking pcpu before changing the allowed ptr in rebind_workers() is
mandatory with regard to cutting the risk of triggering such a warning
(see the sketch below).

3/ we cannot maintain such an order without quiescing the 508 warning for
kworkers. And we have a couple of excuses to do so: a) is_per_cpu_kthread()
no longer checks the number of allowed CPUs but PF_NO_SETAFFINITY instead,
b) there is always a follow-up act to change the allowed ptr in order to
fix the number of allowed CPUs.

4/ the same order is also maintained at rescue time.
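
For reference, the ordering in 2/ matches what mainline's rebind_workers()
does once workers are tagged per-CPU via kthread_set_per_cpu(): mark pcpu
first, then restore the allowed cpumask. A rough paraphrase of the
kernel/workqueue.c shape around v5.11 -- an editorial sketch, not a patch
from this thread:

	static void rebind_workers(struct worker_pool *pool)
	{
		struct worker *worker;

		lockdep_assert_held(&wq_pool_attach_mutex);

		for_each_pool_worker(worker, pool) {
			/* mark pcpu first ... */
			kthread_set_per_cpu(worker->task, pool->cpu);
			/* ... then change the allowed ptr */
			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
							  pool->attrs->cpumask) < 0);
		}

		/* then clear POOL_DISASSOCIATED and WORKER_UNBOUND under pool->lock */
	}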


Just out of curiosity, does this test still fail on current mainline?

Thanx, Paul

I tested mainline v5.11-rc5; it has no issue. The issue only occurs with 
d5bff968ea, which is in 
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git 
dev.2021.01.11b.


--
Zhengjun Xing


Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work

2021-01-27 Thread Xing Zhengjun



On 1/27/2021 5:21 PM, Hillf Danton wrote:

On Wed, 27 Jan 2021 16:04:25 +0800 Xing Zhengjun wrote:

On 1/26/2021 3:39 PM, Hillf Danton wrote:

On 26 Jan 2021 10:45:21 +0800 Xing Zhengjun wrote:

On 1/25/2021 5:29 PM, Hillf Danton wrote:

On 25 Jan 2021 16:31:32 +0800 Xing Zhengjun wrote:

On 1/22/2021 3:59 PM, Hillf Danton wrote:

On Fri, 22 Jan 2021 09:48:32 +0800 Xing Zhengjun wrote:

On 1/21/2021 12:00 PM, Hillf Danton wrote:

On Wed, 20 Jan 2021 21:46:33 +0800 Oliver Sang wrote:

On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote:

Thu, 14 Jan 2021 15:45:11 +0800


FYI, we noticed the following commit (built with gcc-9):

commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break affinity 
initiatively")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git 
dev.2021.01.11b


[...]


[   73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 
process_one_work


Thanks for your report.

We can also break CPU affinity by checking POOL_DISASSOCIATED at attach
time, without paying any extra cost; that way we have the same behavior as
at unbind time.

What is more, the change that makes kworkers pcpu is cut, because they are
not going to help either hotplug or the stop-machine mechanism.
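
A minimal sketch of one way to read the attach-time idea above -- check
POOL_DISASSOCIATED in worker_attach_to_pool() and break affinity there
exactly as unbind_workers() does (function and field names are from
kernel/workqueue.c; this is an illustration, not the actual d5bff968ea
change):

	static void worker_attach_to_pool(struct worker *worker,
					  struct worker_pool *pool)
	{
		mutex_lock(&wq_pool_attach_mutex);

		/* POOL_DISASSOCIATED is stable under wq_pool_attach_mutex */
		if (pool->flags & POOL_DISASSOCIATED) {
			/* same behavior as at unbind time */
			worker->flags |= WORKER_UNBOUND;
			set_cpus_allowed_ptr(worker->task, cpu_possible_mask);
		} else {
			set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
		}

		list_add_tail(&worker->node, &pool->workers);
		worker->pool = pool;

		mutex_unlock(&wq_pool_attach_mutex);
	}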


Hi, by applying the patch below, the issue still happened.


Thanks for your report.


[ 4.574467] pci :00:00.0: Limiting direct PCI/PCI transfers
[ 4.575651] pci :00:01.0: Activating ISA DMA hang workarounds
[ 4.576900] pci :00:02.0: Video device with shadowed ROM at [mem 
0x000c-0x000d]
[ 4.578648] PCI: CLS 0 bytes, default 64
[ 4.579685] Unpacking initramfs...
[ 8.878031] ---[ cut here ]---
[ 8.879083] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2187 
process_one_work+0x92/0x9e0
[ 8.880688] Modules linked in:
[ 8.881274] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 
5.11.0-rc3-gc213503139bb #2


The kworker bound to CPU1 runs on CPU0 and triggers the warning, which
shows that the scheduler breaks CPU affinity after 06249738a41a
("workqueue: Manually break affinity on hotplug"), though quite likely by
kworker/1:0 for the initial workers.
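
For reference, the check that fires is the affinity sanity check near the
top of process_one_work() in kernel/workqueue.c (paraphrased; the reported
line number, 2187 vs. 2192, merely shifts between the trees being tested):

	/*
	 * A bound kworker must run on its pool's CPU unless the pool has
	 * been disassociated by CPU hotplug.
	 */
	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
		     raw_smp_processor_id() != pool->cpu);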


[ 8.882518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[ 8.887539] Workqueue: 0x0 (events)
[ 8.887838] EIP: process_one_work+0x92/0x9e0
[ 8.887838] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 
01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 00 00 
00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
[ 8.887838] EAX: 42f51d08 EBX:  ECX:  EDX: 0001
[ 8.887838] ESI: 43c04720 EDI: 42e45620 EBP: de7f23c0 ESP: 43d7bf08
[ 8.887838] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010002
[ 8.887838] CR0: 80050033 CR2:  CR3: 034e3000 CR4: 000406d0
[ 8.887838] Call Trace:
[ 8.887838] ? worker_thread+0x98/0x6a0
[ 8.887838] ? worker_thread+0x2dd/0x6a0
[ 8.887838] ? kthread+0x1ba/0x1e0
[ 8.887838] ? create_worker+0x1e0/0x1e0
[ 8.887838] ? kzalloc+0x20/0x20
[ 8.887838] ? ret_from_fork+0x1c/0x28
[ 8.887838] _warn_unseeded_randomness: 63 callbacks suppressed
[ 8.887838] random: get_random_bytes called from init_oops_id+0x2b/0x60 with 
crng_init=0
[ 8.887838] --[ end trace ac461b4d54c37cfa ]--



Instead of creating the initial workers only on the active CPUs, rebind
them (labeled pcpu) and jump to the right CPU at bootup time.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2385,6 +2385,16 @@ woke_up:
 		return 0;
 	}
 
+	if (!(pool->flags & POOL_DISASSOCIATED) && smp_processor_id() !=
+	    pool->cpu) {
+		/* scheduler breaks CPU affinity for us, rebind it */
+		raw_spin_unlock_irq(&pool->lock);
+		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+		/* and jump to the right seat */
+		schedule_timeout_interruptible(1);
+		goto woke_up;
+	}
+
 	worker_leave_idle(worker);
 recheck:
 	/* no more worker necessary? */
--


I tested the patch; the warning still appears in the kernel log.


Thanks for your report.


[  230.356503] smpboot: CPU 1 is now offline
[  230.544652] x86: Booting SMP configuration:
[  230.545077] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  230.545640] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock
[  230.545675] masked ExtINT on CPU#1
[  230.593829] [ cut here ]
[  230.594257] WARNING: CPU: 0 PID: 257 at kernel/workqueue.c:2192 
process_one_work+0x92/0x9e0
[  230.594990] Modules linked in: rcutorture torture mousedev input_leds
led_class pcspkr psmouse evbug tiny_power_button button
[  230.595961] CPU: 0 PID: 257 Comm: kworker/1:3 Not tainted 
5.11.0-rc3-gdcba55d9080f #2


Like what was reported, a kworker bound to CPU1 runs on CPU0 and triggers
the warning, due to the scheduler breaking CPU affinity for us. What is
new: the affinity was broken at offline time instead of at bootup.

Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work

2021-01-27 Thread Xing Zhengjun



On 1/26/2021 3:39 PM, Hillf Danton wrote:

On 26 Jan 2021 10:45:21 +0800 Xing Zhengjun wrote:

On 1/25/2021 5:29 PM, Hillf Danton wrote:

On 25 Jan 2021 16:31:32 +0800 Xing Zhengjun wrote:

On 1/22/2021 3:59 PM, Hillf Danton wrote:

On Fri, 22 Jan 2021 09:48:32 +0800 Xing Zhengjun wrote:

On 1/21/2021 12:00 PM, Hillf Danton wrote:

On Wed, 20 Jan 2021 21:46:33 +0800 Oliver Sang wrote:

On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote:

Thu, 14 Jan 2021 15:45:11 +0800


FYI, we noticed the following commit (built with gcc-9):

commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break affinity 
initiatively")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git 
dev.2021.01.11b


[...]


[   73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 
process_one_work


Thanks for your report.

We can also break CPU affinity by checking POOL_DISASSOCIATED at attach
time, without paying any extra cost; that way we have the same behavior as
at unbind time.

What is more, the change that makes kworkers pcpu is cut, because they are
not going to help either hotplug or the stop-machine mechanism.


Hi, by applying the patch below, the issue still happened.


Thanks for your report.


[ 4.574467] pci :00:00.0: Limiting direct PCI/PCI transfers
[ 4.575651] pci :00:01.0: Activating ISA DMA hang workarounds
[ 4.576900] pci :00:02.0: Video device with shadowed ROM at [mem 
0x000c-0x000d]
[ 4.578648] PCI: CLS 0 bytes, default 64
[ 4.579685] Unpacking initramfs...
[ 8.878031] ---[ cut here ]---
[ 8.879083] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2187 
process_one_work+0x92/0x9e0
[ 8.880688] Modules linked in:
[ 8.881274] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 
5.11.0-rc3-gc213503139bb #2


The kworker bound to CPU1 runs on CPU0 and triggers the warning, which
shows that the scheduler breaks CPU affinity after 06249738a41a
("workqueue: Manually break affinity on hotplug"), though quite likely by
kworker/1:0 for the initial workers.


[ 8.882518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[ 8.887539] Workqueue: 0x0 (events)
[ 8.887838] EIP: process_one_work+0x92/0x9e0
[ 8.887838] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 
01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 00 00 
00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
[ 8.887838] EAX: 42f51d08 EBX:  ECX:  EDX: 0001
[ 8.887838] ESI: 43c04720 EDI: 42e45620 EBP: de7f23c0 ESP: 43d7bf08
[ 8.887838] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010002
[ 8.887838] CR0: 80050033 CR2:  CR3: 034e3000 CR4: 000406d0
[ 8.887838] Call Trace:
[ 8.887838] ? worker_thread+0x98/0x6a0
[ 8.887838] ? worker_thread+0x2dd/0x6a0
[ 8.887838] ? kthread+0x1ba/0x1e0
[ 8.887838] ? create_worker+0x1e0/0x1e0
[ 8.887838] ? kzalloc+0x20/0x20
[ 8.887838] ? ret_from_fork+0x1c/0x28
[ 8.887838] _warn_unseeded_randomness: 63 callbacks suppressed
[ 8.887838] random: get_random_bytes called from init_oops_id+0x2b/0x60 with 
crng_init=0
[ 8.887838] --[ end trace ac461b4d54c37cfa ]--



Instead of creating the initial workers only on the active CPUs, rebind
them (labeled pcpu) and jump to the right CPU at bootup time.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2385,6 +2385,16 @@ woke_up:
 		return 0;
 	}
 
+	if (!(pool->flags & POOL_DISASSOCIATED) && smp_processor_id() !=
+	    pool->cpu) {
+		/* scheduler breaks CPU affinity for us, rebind it */
+		raw_spin_unlock_irq(&pool->lock);
+		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+		/* and jump to the right seat */
+		schedule_timeout_interruptible(1);
+		goto woke_up;
+	}
+
 	worker_leave_idle(worker);
 recheck:
 	/* no more worker necessary? */
--


I tested the patch; the warning still appears in the kernel log.


Thanks for your report.


[  230.356503] smpboot: CPU 1 is now offline
[  230.544652] x86: Booting SMP configuration:
[  230.545077] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  230.545640] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock
[  230.545675] masked ExtINT on CPU#1
[  230.593829] [ cut here ]
[  230.594257] WARNING: CPU: 0 PID: 257 at kernel/workqueue.c:2192 
process_one_work+0x92/0x9e0
[  230.594990] Modules linked in: rcutorture torture mousedev input_leds
led_class pcspkr psmouse evbug tiny_power_button button
[  230.595961] CPU: 0 PID: 257 Comm: kworker/1:3 Not tainted 
5.11.0-rc3-gdcba55d9080f #2


Like what was reported, a kworker bound to CPU1 runs on CPU0 and triggers
the warning, due to the scheduler breaking CPU affinity for us. What is
new: the affinity was broken at offline time instead of at bootup.


[  230.596621] Hardware name: QEMU Stan

Re: Test report for kernel direct mapping performance

2021-01-27 Thread Xing Zhengjun




On 1/26/2021 11:00 PM, Michal Hocko wrote:

On Fri 15-01-21 15:23:07, Xing Zhengjun wrote:

Hi,

There is currently a bit of a debate about the kernel direct map. Does using
2M/1G pages aggressively for the kernel direct map help performance? Or, is
it an old optimization which is not as helpful on modern CPUs as it was in
the old days? What is the penalty of a kernel feature that heavily demotes
this mapping from larger to smaller pages? We did a set of runs with 1G and
2M pages enabled /disabled and saw the changes.

[Conclusions]

Assuming that this was a good representative set of workloads and that the
data are good, for server usage, we conclude that the existing aggressive
use of 1G mappings is a good choice since it represents the best in a
plurality of the workloads. However, in a *majority* of cases, another
mapping size (2M or 4k) potentially offers a performance improvement. This
leads us to conclude that although 1G mappings are a good default choice,
there is no compelling evidence that it must be the only choice, or that
folks deriving benefits (like hardening) from smaller mapping sizes should
avoid the smaller mapping sizes.


Thanks for conducting these tests! This is definitely useful and quite
honestly I would have expected a much more noticeable differences.
Please note that I am not really deep into benchmarking but one thing
that popped into my mind was whether these (micro)benchmarks are really
representative workloads. Some of them tend to be rather narrow in
executed code paths or data structures used AFAIU. Is it possible they
simply didn't generate sufficient TLB pressure?



The test was done on 4 server platforms with 11 benchmarks which 0day 
runs daily. Each of the 11 benchmarks has a lot of subcases, so there was 
a total of 259 test cases. The test memory size for the 4 server platforms 
ranges from 128GB to 512GB. Yes, some of the benchmarks tend to be narrow 
in executed code paths or data structures. So we ran a total of 259 cases 
covering memory, CPU scheduling, network, IO, and database, trying to 
cover most of the code paths. Some of the 11 benchmarks may not generate 
sufficient TLB pressure, but I think the cases in vm-scalability and 
will-it-scale do. I have provided the test results for the different 
benchmarks; if you are interested, you can see the details in the test 
report: 
https://01.org/sites/default/files/documentation/test_report_for_kernel_direct_mapping_performance_0.pdf




Have you tried to look closer on profiles of respective configurations
where the overhead comes from?



The test cases were selected from the 0day daily-run cases; only the 
kernel settings differ:

Enable both 2M and 1G huge pages (up to 1G, so named "1G" in the test report):
  no extra kernel command line needed
Disable 1G pages (up to 2M, so named "2M" in the test report):
  add kernel command line "nogbpages"
Disable both 2M and 1G huge pages (up to 4k, so named "4K" in the test report):
  add kernel command line "nohugepages_mapping" (by debug patch)

User space adds the THP enabled setting for all three kernels (1G/2M/4K):
  transparent_hugepage:
    thp_enabled: always
    thp_defrag: always

During the test, we enabled some monitors, but the overhead should not be 
too big; most of the overhead should come from the test cases themselves.
I will study some test cases to find the hotspot the overhead comes from 
and provide it later if someone is interested.



--
Zhengjun Xing


Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work

2021-01-26 Thread Xing Zhengjun



On 1/25/2021 5:29 PM, Hillf Danton wrote:

On 25 Jan 2021 16:31:32 +0800 Xing Zhengjun wrote:

On 1/22/2021 3:59 PM, Hillf Danton wrote:

On Fri, 22 Jan 2021 09:48:32 +0800 Xing Zhengjun wrote:

On 1/21/2021 12:00 PM, Hillf Danton wrote:

On Wed, 20 Jan 2021 21:46:33 +0800 Oliver Sang wrote:

On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote:

Thu, 14 Jan 2021 15:45:11 +0800


FYI, we noticed the following commit (built with gcc-9):

commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break affinity 
initiatively")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git 
dev.2021.01.11b


[...]


[   73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 
process_one_work


Thanks for your report.

We can also break CPU affinity by checking POOL_DISASSOCIATED at attach
time, without paying any extra cost; that way we have the same behavior as
at unbind time.

What is more, the change that makes kworkers pcpu is cut, because they are
not going to help either hotplug or the stop-machine mechanism.


Hi, by applying the patch below, the issue still happened.


Thanks for your report.


[ 4.574467] pci :00:00.0: Limiting direct PCI/PCI transfers
[ 4.575651] pci :00:01.0: Activating ISA DMA hang workarounds
[ 4.576900] pci :00:02.0: Video device with shadowed ROM at [mem 
0x000c-0x000d]
[ 4.578648] PCI: CLS 0 bytes, default 64
[ 4.579685] Unpacking initramfs...
[ 8.878031] ---[ cut here ]---
[ 8.879083] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2187 
process_one_work+0x92/0x9e0
[ 8.880688] Modules linked in:
[ 8.881274] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 
5.11.0-rc3-gc213503139bb #2


The kworker bound to CPU1 runs on CPU0 and triggers the warning, which
shows that the scheduler breaks CPU affinity after 06249738a41a
("workqueue: Manually break affinity on hotplug"), though quite likely by
kworker/1:0 for the initial workers.


[ 8.882518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[ 8.887539] Workqueue: 0x0 (events)
[ 8.887838] EIP: process_one_work+0x92/0x9e0
[ 8.887838] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 
01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 00 00 
00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
[ 8.887838] EAX: 42f51d08 EBX:  ECX:  EDX: 0001
[ 8.887838] ESI: 43c04720 EDI: 42e45620 EBP: de7f23c0 ESP: 43d7bf08
[ 8.887838] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010002
[ 8.887838] CR0: 80050033 CR2:  CR3: 034e3000 CR4: 000406d0
[ 8.887838] Call Trace:
[ 8.887838] ? worker_thread+0x98/0x6a0
[ 8.887838] ? worker_thread+0x2dd/0x6a0
[ 8.887838] ? kthread+0x1ba/0x1e0
[ 8.887838] ? create_worker+0x1e0/0x1e0
[ 8.887838] ? kzalloc+0x20/0x20
[ 8.887838] ? ret_from_fork+0x1c/0x28
[ 8.887838] _warn_unseeded_randomness: 63 callbacks suppressed
[ 8.887838] random: get_random_bytes called from init_oops_id+0x2b/0x60 with 
crng_init=0
[ 8.887838] --[ end trace ac461b4d54c37cfa ]--



Instead of creating the initial workers only on the active CPUs, rebind
them (labeled pcpu) and jump to the right CPU at bootup time.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2385,6 +2385,16 @@ woke_up:
 		return 0;
 	}
 
+	if (!(pool->flags & POOL_DISASSOCIATED) && smp_processor_id() !=
+	    pool->cpu) {
+		/* scheduler breaks CPU affinity for us, rebind it */
+		raw_spin_unlock_irq(&pool->lock);
+		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+		/* and jump to the right seat */
+		schedule_timeout_interruptible(1);
+		goto woke_up;
+	}
+
 	worker_leave_idle(worker);
 recheck:
 	/* no more worker necessary? */
--


I tested the patch; the warning still appears in the kernel log.


Thanks for your report.


[  230.356503] smpboot: CPU 1 is now offline
[  230.544652] x86: Booting SMP configuration:
[  230.545077] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  230.545640] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock
[  230.545675] masked ExtINT on CPU#1
[  230.593829] [ cut here ]
[  230.594257] WARNING: CPU: 0 PID: 257 at kernel/workqueue.c:2192 
process_one_work+0x92/0x9e0
[  230.594990] Modules linked in: rcutorture torture mousedev input_leds
led_class pcspkr psmouse evbug tiny_power_button button
[  230.595961] CPU: 0 PID: 257 Comm: kworker/1:3 Not tainted 
5.11.0-rc3-gdcba55d9080f #2


Like what was reported, a kworker bound to CPU1 runs on CPU0 and triggers
the warning, due to the scheduler breaking CPU affinity for us. What is
new: the affinity was broken at offline time instead of at bootup.


[  230.596621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[  230.597322] Workqueue:  0x0 (rcu_gp)
[  230.5976

Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work

2021-01-25 Thread Xing Zhengjun



On 1/22/2021 3:59 PM, Hillf Danton wrote:

On Fri, 22 Jan 2021 09:48:32 +0800 Xing Zhengjun wrote:

On 1/21/2021 12:00 PM, Hillf Danton wrote:

On Wed, 20 Jan 2021 21:46:33 +0800 Oliver Sang wrote:

On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote:

Thu, 14 Jan 2021 15:45:11 +0800


FYI, we noticed the following commit (built with gcc-9):

commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break affinity 
initiatively")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git 
dev.2021.01.11b


[...]


[   73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 
process_one_work


Thanks for your report.

We can also break CPU affinity by checking POOL_DISASSOCIATED at attach
time, without paying any extra cost; that way we have the same behavior as
at unbind time.

What is more, the change that makes kworkers pcpu is cut, because they are
not going to help either hotplug or the stop-machine mechanism.


Hi, by applying the patch below, the issue still happened.


Thanks for your report.


[ 4.574467] pci :00:00.0: Limiting direct PCI/PCI transfers
[ 4.575651] pci :00:01.0: Activating ISA DMA hang workarounds
[ 4.576900] pci :00:02.0: Video device with shadowed ROM at [mem 
0x000c-0x000d]
[ 4.578648] PCI: CLS 0 bytes, default 64
[ 4.579685] Unpacking initramfs...
[ 8.878031] ---[ cut here ]---
[ 8.879083] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2187 
process_one_work+0x92/0x9e0
[ 8.880688] Modules linked in:
[ 8.881274] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 
5.11.0-rc3-gc213503139bb #2


The kworker bound to CPU1 runs on CPU0 and triggers the warning, which
shows that the scheduler breaks CPU affinity after 06249738a41a
("workqueue: Manually break affinity on hotplug"), though quite likely by
kworker/1:0 for the initial workers.


[ 8.882518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[ 8.887539] Workqueue: 0x0 (events)
[ 8.887838] EIP: process_one_work+0x92/0x9e0
[ 8.887838] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 
01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 00 00 
00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
[ 8.887838] EAX: 42f51d08 EBX:  ECX:  EDX: 0001
[ 8.887838] ESI: 43c04720 EDI: 42e45620 EBP: de7f23c0 ESP: 43d7bf08
[ 8.887838] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010002
[ 8.887838] CR0: 80050033 CR2:  CR3: 034e3000 CR4: 000406d0
[ 8.887838] Call Trace:
[ 8.887838] ? worker_thread+0x98/0x6a0
[ 8.887838] ? worker_thread+0x2dd/0x6a0
[ 8.887838] ? kthread+0x1ba/0x1e0
[ 8.887838] ? create_worker+0x1e0/0x1e0
[ 8.887838] ? kzalloc+0x20/0x20
[ 8.887838] ? ret_from_fork+0x1c/0x28
[ 8.887838] _warn_unseeded_randomness: 63 callbacks suppressed
[ 8.887838] random: get_random_bytes called from init_oops_id+0x2b/0x60 with 
crng_init=0
[ 8.887838] --[ end trace ac461b4d54c37cfa ]--



Instead of creating the initial workers only on the active CPUs, rebind
them (labeled pcpu) and jump to the right CPU at bootup time.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2385,6 +2385,16 @@ woke_up:
 		return 0;
 	}
 
+	if (!(pool->flags & POOL_DISASSOCIATED) && smp_processor_id() !=
+	    pool->cpu) {
+		/* scheduler breaks CPU affinity for us, rebind it */
+		raw_spin_unlock_irq(&pool->lock);
+		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+		/* and jump to the right seat */
+		schedule_timeout_interruptible(1);
+		goto woke_up;
+	}
+
 	worker_leave_idle(worker);
 recheck:
 	/* no more worker necessary? */
--


I tested the patch; the warning still appears in the kernel log.


Thanks for your report.


[  230.356503] smpboot: CPU 1 is now offline
[  230.544652] x86: Booting SMP configuration:
[  230.545077] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  230.545640] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock
[  230.545675] masked ExtINT on CPU#1
[  230.593829] [ cut here ]
[  230.594257] WARNING: CPU: 0 PID: 257 at kernel/workqueue.c:2192 
process_one_work+0x92/0x9e0
[  230.594990] Modules linked in: rcutorture torture mousedev input_leds
led_class pcspkr psmouse evbug tiny_power_button button
[  230.595961] CPU: 0 PID: 257 Comm: kworker/1:3 Not tainted 
5.11.0-rc3-gdcba55d9080f #2


Like what was reported, a kworker bound to CPU1 runs on CPU0 and triggers
the warning, due to the scheduler breaking CPU affinity for us. What is
new: the affinity was broken at offline time instead of at bootup.


[  230.596621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[  230.597322] Workqueue:  0x0 (rcu_gp)
[  230.597636] EIP: process_one_work+0x92/0x9e0
[  230.598005] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 

Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work

2021-01-21 Thread Xing Zhengjun



On 1/21/2021 12:00 PM, Hillf Danton wrote:

On Wed, 20 Jan 2021 21:46:33 +0800 Oliver Sang wrote:

On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote:

Thu, 14 Jan 2021 15:45:11 +0800


FYI, we noticed the following commit (built with gcc-9):

commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break affinity 
initiatively")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git 
dev.2021.01.11b


[...]


[   73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 
process_one_work


Thanks for your report.

We can also break CPU affinity by checking POOL_DISASSOCIATED at attach
time, without paying any extra cost; that way we have the same behavior as
at unbind time.

What is more, the change that makes kworkers pcpu is cut, because they are
not going to help either hotplug or the stop-machine mechanism.


Hi, by applying the patch below, the issue still happened.


Thanks for your report.


[ 4.574467] pci :00:00.0: Limiting direct PCI/PCI transfers
[ 4.575651] pci :00:01.0: Activating ISA DMA hang workarounds
[ 4.576900] pci :00:02.0: Video device with shadowed ROM at [mem 
0x000c-0x000d]
[ 4.578648] PCI: CLS 0 bytes, default 64
[ 4.579685] Unpacking initramfs...
[ 8.878031] ---[ cut here ]---
[ 8.879083] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2187 
process_one_work+0x92/0x9e0
[ 8.880688] Modules linked in:
[ 8.881274] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 
5.11.0-rc3-gc213503139bb #2


The kworker bound to CPU1 runs on CPU0 and triggers the warning, which
shows that the scheduler breaks CPU affinity after 06249738a41a
("workqueue: Manually break affinity on hotplug"), though quite likely by
kworker/1:0 for the initial workers.


[ 8.882518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[ 8.887539] Workqueue: 0x0 (events)
[ 8.887838] EIP: process_one_work+0x92/0x9e0
[ 8.887838] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 
01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 00 00 
00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
[ 8.887838] EAX: 42f51d08 EBX:  ECX:  EDX: 0001
[ 8.887838] ESI: 43c04720 EDI: 42e45620 EBP: de7f23c0 ESP: 43d7bf08
[ 8.887838] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010002
[ 8.887838] CR0: 80050033 CR2:  CR3: 034e3000 CR4: 000406d0
[ 8.887838] Call Trace:
[ 8.887838] ? worker_thread+0x98/0x6a0
[ 8.887838] ? worker_thread+0x2dd/0x6a0
[ 8.887838] ? kthread+0x1ba/0x1e0
[ 8.887838] ? create_worker+0x1e0/0x1e0
[ 8.887838] ? kzalloc+0x20/0x20
[ 8.887838] ? ret_from_fork+0x1c/0x28
[ 8.887838] _warn_unseeded_randomness: 63 callbacks suppressed
[ 8.887838] random: get_random_bytes called from init_oops_id+0x2b/0x60 with 
crng_init=0
[ 8.887838] --[ end trace ac461b4d54c37cfa ]--



Instead of creating the initial workers only on the active CPUs, rebind
them (labeled pcpu) and jump to the right CPU at bootup time.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2385,6 +2385,16 @@ woke_up:
 		return 0;
 	}
 
+	if (!(pool->flags & POOL_DISASSOCIATED) && smp_processor_id() !=
+	    pool->cpu) {
+		/* scheduler breaks CPU affinity for us, rebind it */
+		raw_spin_unlock_irq(&pool->lock);
+		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+		/* and jump to the right seat */
+		schedule_timeout_interruptible(1);
+		goto woke_up;
+	}
+
 	worker_leave_idle(worker);
 recheck:
 	/* no more worker necessary? */
--


I tested the patch; the warning still appears in the kernel log.

[  230.356503] smpboot: CPU 1 is now offline
[  230.544652] x86: Booting SMP configuration:
[  230.545077] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  230.545640] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock
[  230.545675] masked ExtINT on CPU#1
[  230.593829] [ cut here ]
[  230.594257] WARNING: CPU: 0 PID: 257 at kernel/workqueue.c:2192 
process_one_work+0x92/0x9e0
[  230.594990] Modules linked in: rcutorture torture mousedev input_leds 
led_class pcspkr psmouse evbug tiny_power_button button
[  230.595961] CPU: 0 PID: 257 Comm: kworker/1:3 Not tainted 
5.11.0-rc3-gdcba55d9080f #2
[  230.596621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.12.0-1 04/01/2014

[  230.597322] Workqueue:  0x0 (rcu_gp)
[  230.597636] EIP: process_one_work+0x92/0x9e0
[  230.598005] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 
00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 f4 85 13 00 ff 05 cc 30 04 
43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31

[  230.599569] EAX: 42f51d08 EBX:  ECX:  EDX: 0001
[  230.600100] ESI: 43d94240 EDI: df4040f4 EBP: de7f23c0 ESP: bf5f1f08
[  230.600629] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 00010002
[  

Re: [LKP] Re: [percpu_ref] 2b0d3d3e4f: reaim.jobs_per_min -18.4% regression

2021-01-18 Thread Xing, Zhengjun




On 1/11/2021 5:58 PM, Ming Lei wrote:

On Sun, Jan 10, 2021 at 10:32:47PM +0800, kernel test robot wrote:

Greeting,

FYI, we noticed a -18.4% regression of reaim.jobs_per_min due to commit:


commit: 2b0d3d3e4fcfb19d10f9a82910b8f0f05c56ee3e ("percpu_ref: reduce memory 
footprint of percpu_ref in fast path")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: reaim
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 
192G memory
with following parameters:

runtime: 300s
nr_task: 100%
test: short
cpufreq_governor: performance
ucode: 0x5002f01

test-description: REAIM is an updated and improved version of AIM 7 benchmark.
test-url: https://sourceforge.net/projects/re-aim-7/
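
For context on what the bisected commit changes: it shrinks struct
percpu_ref to two words and moves the slow-path state behind a pointer.
Roughly, paraphrased from include/linux/percpu-refcount.h after
2b0d3d3e4fcf (fields abbreviated):

	/* slow-path state now lives out of line */
	struct percpu_ref_data {
		atomic_long_t		count;
		percpu_ref_func_t	*release;
		percpu_ref_func_t	*confirm_switch;
		bool			force_atomic:1;
		bool			allow_reinit:1;
		struct rcu_head		rcu;
		struct percpu_ref	*ref;
	};

	struct percpu_ref {
		/* percpu counter pointer plus low-order state bits (fast path) */
		unsigned long		percpu_count_ptr;
		/* everything else */
		struct percpu_ref_data	*data;
	};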

In addition to that, the commit also has significant impact on the following 
tests:

+------------------+----------------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput -2.8% regression                |
| test machine     | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
| test parameters  | cpufreq_governor=performance                                               |
|                  | runtime=300s                                                               |
|                  | test=lru-file-mmap-read-rand                                               |
|                  | ucode=0x5003003                                                            |
+------------------+----------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 14.5% improvement            |
| test machine     | 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory    |
| test parameters  | cpufreq_governor=performance                                               |
|                  | mode=process                                                               |
|                  | nr_task=50%                                                                |
|                  | test=page_fault2                                                           |
|                  | ucode=0x16                                                                 |
+------------------+----------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -13.0% regression            |
| test machine     | 104 threads Skylake with 192G memory                                       |
| test parameters  | cpufreq_governor=performance                                               |
|                  | mode=process                                                               |
|                  | nr_task=50%                                                                |
|                  | test=malloc1                                                               |
|                  | ucode=0x2006906                                                            |
+------------------+----------------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput -2.3% regression                |
| test machine     | 96 threads Intel(R) Xeon(R) CPU @ 2.30GHz with 128G memory                 |
| test parameters  | cpufreq_governor=performance                                               |
|                  | runtime=300s                                                               |
|                  | test=lru-file-mmap-read-rand                                               |
|                  | ucode=0x5002f01                                                            |
+------------------+----------------------------------------------------------------------------+
| testcase: change | fio-basic: fio.read_iops -4.8% regression                                  |
| test machine     | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
| test parameters  | bs=4k                                                                      |
|                  | cpufreq_governor=performance                                               |
|                  | disk=2pmem                                                                 |
|                  | fs=xfs                                                                     |
|                  | ioengine=libaio                                                            |
|                  | nr_task=50%                                                                |
|                  | runtime=200s                                                               |
|                  | rw=randread                                                                |
|                  | test_size=200G                                                             |

Test report for kernel direct mapping performance

2021-01-14 Thread Xing Zhengjun

Hi,

There is currently a bit of a debate about the kernel direct map. Does 
using 2M/1G pages aggressively for the kernel direct map help 
performance? Or, is it an old optimization which is not as helpful on 
modern CPUs as it was in the old days? What is the penalty of a kernel 
feature that heavily demotes this mapping from larger to smaller pages? 
We did a set of runs with 1G and 2M pages enabled /disabled and saw the 
changes.


[Conclusions]

Assuming that this was a good representative set of workloads and that 
the data are good, for server usage, we conclude that the existing 
aggressive use of 1G mappings is a good choice since it represents the 
best in a plurality of the workloads. However, in a *majority* of cases, 
another mapping size (2M or 4k) potentially offers a performance 
improvement. This leads us to conclude that although 1G mappings are a 
good default choice, there is no compelling evidence that it must be the 
only choice, or that folks deriving benefits (like hardening) from 
smaller mapping sizes should avoid the smaller mapping sizes.


[Summary of results]

1. The test was done on 4 server platforms with 11 benchmarks. Each 
platform was tested with three different maximum kernel mapping sizes: 
4k, 2M, and 1G, and each system has enough memory to effectively deploy 
1G mappings. Of the 11 different benchmarks used, not every benchmark was 
run on every system; there was a total of 259 tests.


2. For each benchmark/system combination, the 1G mapping had the highest 
performance for 45% of the tests, 2M for ~30%, and 4k for ~20%.


3. From the average delta among 1G/2M/4K, 4K gets the lowest performance 
on all 4 test machines, while 1G gets the best performance on 2 test 
machines and 2M gets the best performance on the other 2.


4. By testing with machine memory from 256G to 512G, we observed that 
larger memory leads to better performance for the 1G page size. With 
large memory, will-it-scale/vm-scalability/unixbench/reaim/hackbench show 
1G has the best performance, while kbuild/memtier/netperf show 4K has the 
best performance.


For more details please see the following web link:

https://01.org/sites/default/files/documentation/test_report_for_kernel_direct_mapping_performance_0.pdf

--
Zhengjun Xing


Re: [LKP] Re: [btrfs] e076ab2a2c: fio.write_iops -18.3% regression

2021-01-12 Thread Xing Zhengjun




On 1/12/2021 11:45 PM, David Sterba wrote:

On Tue, Jan 12, 2021 at 11:36:14PM +0800, kernel test robot wrote:

Greeting,

FYI, we noticed a -18.3% regression of fio.write_iops due to commit:


commit: e076ab2a2ca70a0270232067cd49f76cd92efe64 ("btrfs: shrink delalloc pages 
instead of full inodes")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: fio-basic
on test machine: 192 threads Intel(R) Xeon(R) CPU @ 2.20GHz with 192G memory
with following parameters:

disk: 1SSD
fs: btrfs
runtime: 300s
nr_task: 8
rw: randwrite
bs: 4k
ioengine: sync
test_size: 256g

Though I do a similar test (emulating bit torrent workload), it's a bit
extreme as it's 4k synchronous on a huge file. It always takes a lot of
time but could point out some concurrency issues namely on faster
devices. There are 8 threads possibly competing for the same inode lock
or other locks related to it.

The mentioned commit fixed another perf regression on a much more common
workload (untarring files), so at this point a drop in this fio workload
is inevitable.


Do you have a plan to fix it? Thanks.



--
Zhengjun Xing



Re: [LKP] [locking/rwsem] 617f3ef951: unixbench.score -21.2% regression

2020-12-22 Thread Xing Zhengjun

Hi Waiman,

   Do you have time to look at this? Thanks.
   As you describe in commit 617f3ef95177840c77f59c2aec1029d27d5547d6 
("locking/rwsem: Remove reader optimistic spinning"), the patch that 
disables reader optimistic spinning shows reduced performance in lightly 
loaded cases, so is this regression expected?


On 12/17/2020 9:33 AM, kernel test robot wrote:


Greeting,

FYI, we noticed a -21.2% regression of unixbench.score due to commit:


commit: 617f3ef95177840c77f59c2aec1029d27d5547d6 ("locking/rwsem: Remove reader 
optimistic spinning")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: unixbench
on test machine: 16 threads Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz with 32G 
memory
with following parameters:

runtime: 300s
nr_task: 30%
test: shell8
cpufreq_governor: performance
ucode: 0xde

test-description: UnixBench is the original BYTE UNIX benchmark suite aims to 
test performance of Unix-like system.
test-url: https://github.com/kdlucas/byte-unixbench



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
   
gcc-9/performance/x86_64-rhel-8.3/30%/debian-10.4-x86_64-20200603.cgz/300s/lkp-cfl-e1/shell8/unixbench/0xde

commit:
   1a728dff85 ("locking/rwsem: Enable reader optimistic lock stealing")
   617f3ef951 ("locking/rwsem: Remove reader optimistic spinning")

1a728dff855a318b 617f3ef95177840c77f59c2aec1
 ---
fail:runs  %reproductionfail:runs
| | |
  39:4 -992%:4 
perf-profile.calltrace.cycles-pp.error_entry
  25:4 -635%:4 
perf-profile.children.cycles-pp.error_entry
  %stddev %change %stddev
  \  |\
  21807 ±  3% -21.2%  17186unixbench.score
1287072 ±  3% -38.7% 788414
unixbench.time.involuntary_context_switches
  37161 ±  4% +31.3%  48798unixbench.time.major_page_faults
  1.047e+08 ±  3% -21.1%   82610985unixbench.time.minor_page_faults
   1341   -27.1% 978.00
unixbench.time.percent_of_cpu_this_job_got
 370.87   -33.3% 247.55unixbench.time.system_time
 490.05   -23.3% 376.03unixbench.time.user_time
3083520 ±  3% +59.7%4924900
unixbench.time.voluntary_context_switches
 824314 ±  3% -21.2% 649654unixbench.workload
   0.03 ± 27% -51.9%   0.02 ± 59%  
perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
 385.15 ±  2% +62.5% 625.72uptime.idle
  17.03-1.8%  16.73boot-time.boot
  11.01-1.6%  10.83boot-time.dhcp
 214.12 ±  3%  -3.1% 207.49boot-time.idle
  13.72 ±  4% +23.5   37.24mpstat.cpu.all.idle%
   1.06-0.10.94mpstat.cpu.all.irq%
  49.32 ±  2% -11.8   37.53mpstat.cpu.all.sys%
  35.24 ±  2% -11.6   23.68mpstat.cpu.all.usr%
  15.50 ±  3%+145.2%  38.00vmstat.cpu.id
  49.00 ±  2% -22.4%  38.00vmstat.cpu.sy
  33.75 ±  2% -33.3%  22.50 ±  2%  vmstat.cpu.us
  21.75 ±  3% -33.3%  14.50 ±  3%  vmstat.procs.r
  97370 ±  3% +56.4% 152258vmstat.system.cs
  37589-2.1%  36804vmstat.system.in
  11861 ±  9% -18.0%   9730slabinfo.filp.active_objs
  13242 ±  8% -15.5%  11184slabinfo.filp.num_objs
  14731 ±  7%  -9.5%  13325 ±  5%  slabinfo.kmalloc-8.active_objs
  14731 ±  7%  -9.5%  13325 ±  5%  slabinfo.kmalloc-8.num_objs
   5545 ±  2% -13.8%   4780 ±  4%  slabinfo.pid.active_objs
   5563 ±  2% -13.8%   4793 ±  4%  slabinfo.pid.num_objs
   5822 ± 14% -40.4%   3468 ±  5%  
slabinfo.task_delay_info.active_objs
   5825 ± 14% -40.5%   3468 ±  5%  slabinfo.task_delay_info.num_objs
   32104492 ±  3%+303.3%  1.295e+08 ± 11%  cpuidle.C1.time
 882330 ±  5%+131.5%2042656 ± 10%  cpuidle.C1.usage
   21965263 ±  3%+340.5%   96762398 ± 14%  cpuidle.C1E.time
 442911 ±  2%+211.3%1378866 ± 14%  cpuidle.C1E.usage
6511399 ±  4%+606.6%   46010023 ± 

Re: [LKP] Re: [sched/hotplug] 2558aacff8: will-it-scale.per_thread_ops -1.6% regression

2020-12-14 Thread Xing Zhengjun




On 12/11/2020 12:14 AM, Peter Zijlstra wrote:

On Thu, Dec 10, 2020 at 04:18:59PM +0800, kernel test robot wrote:

FYI, we noticed a -1.6% regression of will-it-scale.per_thread_ops due to 
commit:
commit: 2558aacff8586699bcd248b406febb28b0a25de2 ("sched/hotplug: Ensure only 
per-cpu kthreads run during hotplug")


Mooo, weird but whatever. Does the below help at all?


I tested the patch; the regression is reduced to -0.6%.

=
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode:

lkp-cpl-4sp1/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/thread/sched_yield/performance/0x71e

commit:
  565790d28b1e33ee2f77bad5348b99f6dfc366fd
  2558aacff8586699bcd248b406febb28b0a25de2
  4b26139b8db627a55043183614a32b0aba799d27 (this test patch)

565790d28b1e33ee 2558aacff8586699bcd248b406f 4b26139b8db627a55043183614a
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
 4.011e+08            -1.6%      3.945e+08        -0.6%      3.989e+08    will-it-scale.144.threads
   2785455            -1.6%        2739520        -0.6%        2769967    will-it-scale.per_thread_ops
 4.011e+08            -1.6%      3.945e+08        -0.6%      3.989e+08    will-it-scale.workload




---
  kernel/sched/core.c  | 40 +++-
  kernel/sched/sched.h | 13 +
  2 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7af80c3fce12..f80245c7f903 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3985,15 +3985,20 @@ static void do_balance_callbacks(struct rq *rq, struct 
callback_head *head)
}
  }
  
+static void balance_push(struct rq *rq);

+
+struct callback_head balance_push_callback = {
+   .next = NULL,
+   .func = (void (*)(struct callback_head *))balance_push,
+};
+
  static inline struct callback_head *splice_balance_callbacks(struct rq *rq)
  {
struct callback_head *head = rq->balance_callback;
  
 	lockdep_assert_held(&rq->lock);

-   if (head) {
+   if (head)
rq->balance_callback = NULL;
-   rq->balance_flags &= ~BALANCE_WORK;
-   }
  
  	return head;

  }
@@ -4014,21 +4019,6 @@ static inline void balance_callbacks(struct rq *rq, 
struct callback_head *head)
}
  }
  
-static void balance_push(struct rq *rq);

-
-static inline void balance_switch(struct rq *rq)
-{
-   if (likely(!rq->balance_flags))
-   return;
-
-   if (rq->balance_flags & BALANCE_PUSH) {
-   balance_push(rq);
-   return;
-   }
-
-   __balance_callbacks(rq);
-}
-
  #else
  
  static inline void __balance_callbacks(struct rq *rq)

@@ -4044,10 +4034,6 @@ static inline void balance_callbacks(struct rq *rq, 
struct callback_head *head)
  {
  }
  
-static inline void balance_switch(struct rq *rq)

-{
-}
-
  #endif
  
  static inline void

@@ -4075,7 +4061,7 @@ static inline void finish_lock_switch(struct rq *rq)
 * prev into current:
 */
 	spin_acquire(&rq->lock.dep_map, 0, 0, _THIS_IP_);
-	balance_switch(rq);
+	__balance_callbacks(rq);
 	raw_spin_unlock_irq(&rq->lock);
  }
  
@@ -7256,6 +7242,10 @@ static void balance_push(struct rq *rq)
  
 	lockdep_assert_held(&rq->lock);

SCHED_WARN_ON(rq->cpu != smp_processor_id());
+	/*
+	 * Ensure the thing is persistent until balance_push_set(, on = false);
+	 */
+	rq->balance_callback = &balance_push_callback;
  
  	/*

 * Both the cpu-hotplug and stop task are in this case and are
@@ -7305,9 +7295,9 @@ static void balance_push_set(int cpu, bool on)
  
 	rq_lock_irqsave(rq, &rf);
 	if (on)
-		rq->balance_flags |= BALANCE_PUSH;
+		rq->balance_callback = &balance_push_callback;
 	else
-		rq->balance_flags &= ~BALANCE_PUSH;
+		rq->balance_callback = NULL;
 	rq_unlock_irqrestore(rq, &rf);
  }
  
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h

index f5acb6c5ce49..12ada79d40f3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -975,7 +975,6 @@ struct rq {
unsigned long   cpu_capacity_orig;
  
  	struct callback_head	*balance_callback;

-   unsigned char   balance_flags;
  
  	unsigned char		nohz_idle_balance;

unsigned char   idle_balance;
@@ -1226,6 +1225,8 @@ struct rq_flags {
  #endif
  };
  
+extern struct callback_head balance_push_callback;

+
  /*
   * Lockdep annotation that avoids accidental unlocks; it's like a
   * sticky/continuous lockdep_assert_held().
@@ -1243,9 +1244,9 @@ static inline void rq_pin_lock(struct rq *rq, struct 
rq_flags *rf)
  #ifdef CONFIG_SCHED_DEBUG
rq->clock_update_flags &= 

Re: [Intel-gfx] [drm/i915/gem] 59dd13ad31: phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second -54.0% regression

2020-11-26 Thread Xing Zhengjun




On 11/27/2020 5:34 AM, Chris Wilson wrote:

Quoting Xing Zhengjun (2020-11-26 01:44:55)



On 11/25/2020 4:47 AM, Chris Wilson wrote:

Quoting Oliver Sang (2020-11-19 07:20:18)

On Fri, Nov 13, 2020 at 04:27:13PM +0200, Joonas Lahtinen wrote:

Hi,

Could you add intel-...@lists.freedesktop.org into reports going
forward.

Quoting kernel test robot (2020-11-11 17:58:11)


Greeting,

FYI, we noticed a -54.0% regression of 
phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second
 due to commit:


How many runs are there on the bad version to ensure the bisect is
repeatable?


test 4 times.
zxing@inn:/result/phoronix-test-suite/performance-true-Radial_Gradient_Paint-1024x1024-jxrendermark-1.2.4-ucode=0xd6-monitor=da39a3ee/lkp-cfl-d1/debian-x86_64-phoronix/x86_64-rhel-8.3/gcc-9/59dd13ad310793757e34afa489dd6fc8544fc3da$
 grep -r "operations_per_second" */stats.json
0/stats.json: 
"phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
 4133.487932,
1/stats.json: 
"phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
 4120.421503,
2/stats.json: 
"phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
 4188.414835,
3/stats.json: 
"phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
 4068.549514,


a w/o revert (drm-tip)
b w/ revert
+mB+
| ..b  |
| ..b.aa   |
| a.a  |
| a.a  |
|  b  b  a |
|   b  b  b b. a   |
|   b  bb bbb...   |
|b   ab bbab.bb.bba b a aab   a|
| |__A__|  |
| |MA_||
+--+
  NMin   MaxMedian   Avg
Stddev
a 120  3621.8761 7356.4442 4606.7895 4607.9132 156.17693
b 120  2664.0563 6359.9686 4519.5036 4534.4463 95.471121

The patch is not expected to have any impact on the machine you are testing on.
-Chris



What's your code base?
For my side:
1) sync the code to the head of Linux mainline
2) git reset --hard 59dd13ad31
3) git revert 59dd13ad3107
We compare the test results of commit 59dd13ad3107 (step 2) and
2052847b06f8 (step 3, revert of 59dd13ad3107); the regression should be
related to 59dd13ad3107. Each test case was run 5 times.


a 59dd13ad31
b revert
+mB+
|a |
|   aa |
| .bba |
| .bbaab   |
| .b . b   b   |
|a   b.. ..bb  bb  |
|  b a   b.b.a bb  |
|aa  b..aaa..b.b..bab   b a   .|
|  |__A__| |
|  |___A_| |
+--+
 NMin   MaxMedian   Avg
Stddev
a 120  3658.3435 6363.7812 4527.4406  4536.612 86.095459
b 120  3928.9643  6375.829 4576.0482 4585.4224  157.284



Could you share your test commands and hardware info with me, so that I 
can reproduce it on my side? Thanks.

--
Zhengjun Xing


Re: [Intel-gfx] [drm/i915/gem] 59dd13ad31: phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second -54.0% regression

2020-11-25 Thread Xing Zhengjun




On 11/25/2020 4:47 AM, Chris Wilson wrote:

Quoting Oliver Sang (2020-11-19 07:20:18)

On Fri, Nov 13, 2020 at 04:27:13PM +0200, Joonas Lahtinen wrote:

Hi,

Could you add intel-...@lists.freedesktop.org into reports going
forward.

Quoting kernel test robot (2020-11-11 17:58:11)


Greeting,

FYI, we noticed a -54.0% regression of 
phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second
 due to commit:


How many runs are there on the bad version to ensure the bisect is
repeatable?


test 4 times.
zxing@inn:/result/phoronix-test-suite/performance-true-Radial_Gradient_Paint-1024x1024-jxrendermark-1.2.4-ucode=0xd6-monitor=da39a3ee/lkp-cfl-d1/debian-x86_64-phoronix/x86_64-rhel-8.3/gcc-9/59dd13ad310793757e34afa489dd6fc8544fc3da$
 grep -r "operations_per_second" */stats.json
0/stats.json: 
"phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
 4133.487932,
1/stats.json: 
"phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
 4120.421503,
2/stats.json: 
"phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
 4188.414835,
3/stats.json: 
"phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second":
 4068.549514,


a w/o revert (drm-tip)
b w/ revert
+mB+
| ..b  |
| ..b.aa   |
| a.a  |
| a.a  |
|  b  b  a |
|   b  b  b b. a   |
|   b  bb bbb...   |
|b   ab bbab.bb.bba b a aab   a|
| |__A__|  |
| |MA_||
+--+
 NMin   MaxMedian   Avg
Stddev
a 120  3621.8761 7356.4442 4606.7895 4607.9132 156.17693
b 120  2664.0563 6359.9686 4519.5036 4534.4463 95.471121

The patch is not expected to have any impact on the machine you are testing on.
-Chris



What's your code base?
For my side:
1) sync the code to the head of Linux mainline
2) git reset --hard 59dd13ad31
3) git revert 59dd13ad3107
We compare the test results of commit 59dd13ad3107 (step 2) and 
2052847b06f8 (step 3, revert of 59dd13ad3107); the regression should be 
related to 59dd13ad3107. Each test case was run 5 times.

=
tbox_group/testcase/rootfs/kconfig/compiler/need_x/test/option_a/option_b/cpufreq_governor/ucode/debug-setup:

lkp-cfl-d1/phoronix-test-suite/debian-x86_64-phoronix/x86_64-rhel-8.3/gcc-9/true/jxrendermark-1.2.4/Radial 
Gradient Paint/1024x1024/performance/0xde/regression_test


commit:
  0dccdba51e852271a3dbc9358375f4c882b863f2
  59dd13ad310793757e34afa489dd6fc8544fc3da
  2052847b06f863a028f7f3bbc62401e043b34301 (revert 59dd13ad3107)

0dccdba51e852271 59dd13ad310793757e34afa489d 2052847b06f863a028f7f3bbc62
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      8145 ±  2%     -53.1%       3817 ±  3%      -1.8%       7995        phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second


--
Zhengjun Xing


Re: [drm/fb] 6a1b34c0a3: WARNING:at_drivers/gpu/drm/drm_fb_helper.c:#drm_fb_helper_damage_work

2020-11-23 Thread Xing Zhengjun




On 11/23/2020 4:04 PM, Thomas Zimmermann wrote:

Hi

Am 22.11.20 um 15:18 schrieb kernel test robot:


Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: 6a1b34c0a339fdc75d7932ad5702f2177c9d7a1c ("drm/fb-helper: Move 
damage blit code and its setup into separate routine")
url: 
https://github.com/0day-ci/linux/commits/Thomas-Zimmermann/drm-fb-helper-Various-fixes-and-cleanups/20201120-182750 




in testcase: trinity
version: trinity-static-i386-x86_64-f93256fb_2019-08-28
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 
-m 8G


caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


That dmesg is full of messages like

[  696.323556] alloc_vmap_area: 24 callbacks suppressed
[  696.323562] vmap allocation for size 3149824 failed: use 
vmalloc= to increase size


I think the test system needs to be reconfigured first.



We have tried "vmalloc=256M" and "vmalloc=512M"; the same warning still 
happened.




Best regards
Thomas




+-----------------------------------------------------------------------+------------+------------+
|                                                                       | 154f2d1afd | 6a1b34c0a3 |
+-----------------------------------------------------------------------+------------+------------+
| WARNING:at_drivers/gpu/drm/drm_fb_helper.c:#drm_fb_helper_damage_work | 0          | 36         |
| EIP:drm_fb_helper_damage_work                                         | 0          | 36         |
+-----------------------------------------------------------------------+------------+------------+




If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


[  106.616652] WARNING: CPU: 1 PID: 173 at 
drivers/gpu/drm/drm_fb_helper.c:434 drm_fb_helper_damage_work+0x371/0x390

[  106.627732] Modules linked in:
[  106.632419] CPU: 1 PID: 173 Comm: kworker/1:2 Not tainted 
5.10.0-rc4-next-20201120-7-g6a1b34c0a339 #3
[  106.637806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.12.0-1 04/01/2014

[  106.642853] Workqueue: events drm_fb_helper_damage_work
[  106.647664] EIP: drm_fb_helper_damage_work+0x371/0x390
[  106.652305] Code: b1 17 c7 01 68 bd 5b 2d c5 53 50 68 55 21 2d c5 
83 15 44 b1 17 c7 00 e8 ae bc b1 01 83 05 48 b1 17 c7 01 83 15 4c b1 
17 c7 00 <0f> 0b 83 05 50 b1 17 c7 01 83 15 54 b1 17 c7 00 83 c4 10 e9 
78 fd

[  106.663517] EAX: 002d EBX: c8730520 ECX: 0847 EDX: 
[  106.668423] ESI: ca987000 EDI: cab274d8 EBP: f62f5f20 ESP: f62f5ee8
[  106.673214] DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068 EFLAGS: 
00010246

[  106.678295] CR0: 80050033 CR2:  CR3: 063a7000 CR4: 000406d0
[  106.683160] DR0:  DR1:  DR2:  DR3: 
[  106.687967] DR6: fffe0ff0 DR7: 0400
[  106.690763] Call Trace:
[  106.693394]  process_one_work+0x3ea/0xaa0
[  106.693501] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual 
Function Network Driver

[  106.695300]  worker_thread+0x330/0x900
[  106.697406] ixgbevf: Copyright (c) 2009 - 2018 Intel Corporation.
[  106.702963]  kthread+0x190/0x210
[  106.705709]  ? rescuer_thread+0x650/0x650
[  106.708379]  ? kthread_insert_work_sanity_check+0x120/0x120
[  106.711271]  ret_from_fork+0x1c/0x30
[  106.713973] ---[ end trace dd528799d3369ac1 ]---


To reproduce:

 # build kernel
cd linux
cp config-5.10.0-rc4-next-20201120-7-g6a1b34c0a339 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 olddefconfig prepare 
modules_prepare bzImage


 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp qemu -k  job-script # job-script is attached 
in this email




Thanks,
Oliver Sang







--
Zhengjun Xing


Re: [LKP] Re: [mm] be5d0a74c6: will-it-scale.per_thread_ops -9.1% regression

2020-11-17 Thread Xing Zhengjun




On 11/17/2020 12:19 AM, Johannes Weiner wrote:

On Sun, Nov 15, 2020 at 05:55:44PM +0800, kernel test robot wrote:


Greeting,

FYI, we noticed a -9.1% regression of will-it-scale.per_thread_ops due to 
commit:


commit: be5d0a74c62d8da43f9526a5b08cdd18e2bbc37a ("mm: memcontrol: switch to native 
NR_ANON_MAPPED counter")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 
192G memory
with following parameters:

nr_task: 50%
mode: thread
test: page_fault2
cpufreq_governor: performance
ucode: 0x5002f01


I suspect it's the lock_page_memcg() in page_remove_rmap(). We already
needed it for shared mappings, and this patch added it to private path
as well, which this test exercises.

The slowpath for this lock is extremely cold - most of the time it's
just an rcu_read_lock(). But we're still doing the function call.

Could you try if this patch helps, please?


I applied the patch to Linux mainline v5.10-rc4, linux-next next-20201117, 
and "be5d0a74c6", and it failed to apply on all of them. What is your 
codebase for the patch? I would appreciate it if you could rebase the 
patch onto "be5d0a74c6". From "be5d0a74c6" to v5.10-rc4 or next-20201117 
there are a lot of commits, and they will affect the test result. Thanks.




 From f6e8e56b369109d1362de2c27ea6601d5c411b2e Mon Sep 17 00:00:00 2001
From: Johannes Weiner 
Date: Mon, 16 Nov 2020 10:48:06 -0500
Subject: [PATCH] lockpagememcg

---
  include/linux/memcontrol.h | 61 ++--
  mm/memcontrol.c| 82 +++---
  2 files changed, 73 insertions(+), 70 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 20108e426f84..b4b73e375948 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -842,9 +842,64 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
  extern bool cgroup_memory_noswap;
  #endif
  
-struct mem_cgroup *lock_page_memcg(struct page *page);

-void __unlock_page_memcg(struct mem_cgroup *memcg);
-void unlock_page_memcg(struct page *page);
+struct mem_cgroup *lock_page_memcg_slowpath(struct page *page,
+   struct mem_cgroup *memcg);
+void unlock_page_memcg_slowpath(struct mem_cgroup *memcg);
+
+/**
+ * lock_page_memcg - lock a page and memcg binding
+ * @page: the page
+ *
+ * This function protects unlocked LRU pages from being moved to
+ * another cgroup.
+ *
+ * It ensures lifetime of the memcg -- the caller is responsible for
+ * the lifetime of the page; __unlock_page_memcg() is available when
+ * @page might get freed inside the locked section.
+ */
+static inline struct mem_cgroup *lock_page_memcg(struct page *page)
+{
+   struct page *head = compound_head(page); /* rmap on tail pages */
+   struct mem_cgroup *memcg;
+
+   /*
+* The RCU lock is held throughout the transaction.  The fast
+* path can get away without acquiring the memcg->move_lock
+* because page moving starts with an RCU grace period.
+*
+* The RCU lock also protects the memcg from being freed when
+* the page state that is going to change is the only thing
+* preventing the page itself from being freed. E.g. writeback
+* doesn't hold a page reference and relies on PG_writeback to
+* keep off truncation, migration and so forth.
+ */
+   rcu_read_lock();
+
+   if (mem_cgroup_disabled())
+   return NULL;
+
+   memcg = page_memcg(head);
+   if (unlikely(!memcg))
+   return NULL;
+
+	if (likely(!atomic_read(&memcg->moving_account)))
+   return memcg;
+
+   return lock_page_memcg_slowpath(head, memcg);
+}
+
+static inline void __unlock_page_memcg(struct mem_cgroup *memcg)
+{
+   if (unlikely(memcg && memcg->move_lock_task == current))
+   unlock_page_memcg_slowpath(memcg);
+
+   rcu_read_unlock();
+}
+
+static inline void unlock_page_memcg(struct page *page)
+{
+   __unlock_page_memcg(page_memcg(compound_head(page)));
+}
  
  /*
   * idx can be of type enum memcg_stat_item or node_stat_item.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 69a2893a6455..9acc42388b86 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2084,49 +2084,19 @@ void mem_cgroup_print_oom_group(struct mem_cgroup 
*memcg)
pr_cont(" are going to be killed due to memory.oom.group set\n");
  }
  
-/**
- * lock_page_memcg - lock a page and memcg binding
- * @page: the page
- *
- * This function protects unlocked LRU pages from being moved to
- * another cgroup.
- *
- * It ensures lifetime of the returned memcg. Caller is responsible
- * for the lifetime of the page; __unlock_page_memcg() is available
- * when @page might get freed inside the locked section.
- */
-struct mem_cgroup *lock_page_memcg(struct page *page)
+struct 

Re: [LKP] Re: [mm] e6e88712e4: stress-ng.tmpfs.ops_per_sec -69.7% regression

2020-11-09 Thread Xing Zhengjun




On 11/7/2020 4:55 AM, Matthew Wilcox wrote:

On Mon, Nov 02, 2020 at 01:21:39PM +0800, Rong Chen wrote:

we compared the tmpfs.ops_per_sec: (363 / 103.02) between this commit and
parent commit.


Thanks!  I see about a 50% hit on my system, and this patch restores the
performance.  Can you verify this works for you?

diff --git a/mm/madvise.c b/mm/madvise.c
index 9b065d412e5f..e602333f8c0d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -225,7 +225,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
 		struct address_space *mapping)
 {
 	XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start));
-	pgoff_t end_index = end / PAGE_SIZE;
+	pgoff_t end_index = linear_page_index(vma, end + PAGE_SIZE - 1);
struct page *page;
  
  	rcu_read_lock();



I tested the patch; the regression has disappeared.

=
tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/testtime/class/cpufreq_governor/ucode:

lkp-csl-2sp3/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/1HDD/100s/memory/performance/0x42c

commit:
  f5df8635c5a3c912919c91be64aa198554b0f9ed
  e6e88712e43b7942df451508aafc2f083266f56b
  6bc25f0c5e0d55145f7ef087adea2693802a80f3 (this test patch)

f5df8635c5a3c912 e6e88712e43b7942df451508aaf 6bc25f0c5e0d55145f7ef087ade
---------------- --------------------------- ---------------------------
         %stddev      %change    %stddev       %change    %stddev
      1198 ±  4%       -69.7%     362.67         +3.3%       1238 ±  3%  stress-ng.tmpfs.ops
     11.62 ±  4%       -69.7%       3.52         +3.4%      12.02 ±  3%  stress-ng.tmpfs.ops_per_sec




--
Zhengjun Xing


Re: [LKP] Re: [mm/gup] a308c71bf1: stress-ng.vm-splice.ops_per_sec -95.6% regression

2020-11-05 Thread Xing Zhengjun




On 11/6/2020 2:37 AM, Linus Torvalds wrote:

On Thu, Nov 5, 2020 at 12:29 AM Xing Zhengjun
 wrote:



Rong - mind testing this? I don't think the zero-page _should_ be
something that real loads care about, but hey, maybe people do want to
do things like splice zeroes very efficiently..


I test the patch, the regression still existed.


Thanks.

So Jann's suspicion seems interesting but apparently not the reason
for this particular case.

For being such a _huge_ difference (20x improvement followed by a 20x
regression), it's surprising how little the numbers give a clue. The
big changes are things like
"interrupts.CPU19.CAL:Function_call_interrupts", but while those
change by hundreds of percent, most of the changes seem to just be
about them moving to different CPU's. IOW, we have things like

   5652 ± 59%  +387.9%   27579 ± 96%  interrupts.CPU13.CAL:Function_call_interrupts
  28249 ± 32%   -69.3%    8675 ± 50%  interrupts.CPU28.CAL:Function_call_interrupts

which isn't really much of a change at all despite the changes looking
very big - it's just the stats jumping from one CPU to another.

Maybe there's some actual change in there, but it's very well hidden if so.

Yes, some of the numbers get worse:

 868396 ±  3%   +20.9%  1050234 ± 14%  interrupts.RES:Rescheduling_interrupts

so that's a 20% increase in rescheduling interrupts,  But it's a 20%
increase, not a 500% one. So the fact that performance changes by 20x
is still very unclear to me.

We do have a lot of those numa-meminfo changes, but they could just
come from allocation patterns.

That said - another difference between the fast-cup code and the
regular gup code is that the fast-gup code does

 if (pte_protnone(pte))
 goto pte_unmap;

and the regular slow case does

 if ((flags & FOLL_NUMA) && pte_protnone(pte))
 goto no_page;

now, FOLL_NUMA is always set in the slow case if we don't have
FOLL_FORCE set, so this difference isn't "real", but it's one of those
cases where the zero-page might be marked for NUMA faulting, and doing
the forced COW might then cause it to be accessible.

Just out of curiosity, do the numbers change enormously if you just remove that

 if (pte_protnone(pte))
 goto pte_unmap;

test from the fast-cup case (top of the loop in gup_pte_range()) -
effectively making fast-gup basically act like FOLL_FORCE wrt numa
placement..


Based on the last debug patch, I removed the two lines of code at the top
of the loop in gup_pte_range() as you mentioned; the regression still
exists.


=
tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/testtime/class/cpufreq_governor/ucode:

lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/1HDD/30s/pipe/performance/0x5002f01

commit:
  1a0cf26323c80e2f1c58fc04f15686de61bfab0c
  a308c71bf1e6e19cc2e4ced31853ee0fc7cb439a
  da5ba9980aa2211c1e2a89fc814abab2fea6f69d (last debug patch)
  8803d304738b52f66f6b683be38c4f8b9cf4bff5 (to debug the odd performance numbers)


1a0cf26323c80e2f a308c71bf1e6e19cc2e4ced3185 da5ba9980aa2211c1e2a89fc814 8803d304738b52f66f6b683be38
---------------- --------------------------- --------------------------- ---------------------------
         %stddev      %change    %stddev       %change    %stddev          %change    %stddev
 3.406e+09             -95.6%   1.49e+08        -96.4%  1.213e+08           -96.5%  1.201e+08    stress-ng.vm-splice.ops
 1.135e+08             -95.6%    4965911        -96.4%    4041777           -96.5%    4002572    stress-ng.vm-splice.ops_per_sec





I'm not convinced that's a valid change in general, so this is just a
"to debug the odd performance numbers" issue.

Also out of curiosity: is the performance profile limited to just the
load, or is it a system profile (ie do you have "-a" on the perf
record line or not).



In our test , "-a" is enabled on the perf record line.


Linus



--
Zhengjun Xing


Re: [LKP] Re: [mm/gup] a308c71bf1: stress-ng.vm-splice.ops_per_sec -95.6% regression

2020-11-05 Thread Xing Zhengjun




On 11/5/2020 2:29 AM, Linus Torvalds wrote:

On Mon, Nov 2, 2020 at 1:15 AM kernel test robot  wrote:


Greeting,

FYI, we noticed a -95.6% regression of stress-ng.vm-splice.ops_per_sec due to 
commit:

commit: a308c71bf1e6e19cc2e4ced31853ee0fc7cb439a ("mm/gup: Remove enfornced COW 
mechanism")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


Note that this is just the reverse of the previous 2000% improvement
reported by the test robot here:

 https://lore.kernel.org/lkml/20200611040453.GK12456@shao2-debian/

and the explanation seems to remain the same:

 
https://lore.kernel.org/lkml/cag48ez1v1b4x5lgfya6nvi33-twwqna_dc5jgfvosqqhdn_...@mail.gmail.com/

IOW, this is testing a special case (zero page lookup) that the "force
COW" patches happened to turn into a regular case (COW creating a
regular page from the zero page).

The question is whether we should care about the zero page for gup_fast lookup.

If we do care, then the proper fix is likely simply to allow the zero
page in fast-gup, the same way we already do in slow-gup.

ENTIRELY UNTESTED PATCH ATTACHED.

Rong - mind testing this? I don't think the zero-page _should_ be
something that real loads care about, but hey, maybe people do want to
do things like splice zeroes very efficiently..


I tested the patch; the regression still exists.

=
tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/testtime/class/cpufreq_governor/ucode:

lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/1HDD/30s/pipe/performance/0x5002f01

commit:
  1a0cf26323c80e2f1c58fc04f15686de61bfab0c
  a308c71bf1e6e19cc2e4ced31853ee0fc7cb439a
  da5ba9980aa2211c1e2a89fc814abab2fea6f69d (debug patch)

1a0cf26323c80e2f a308c71bf1e6e19cc2e4ced3185 da5ba9980aa2211c1e2a89fc814
---------------- --------------------------- ---------------------------
         %stddev      %change    %stddev       %change    %stddev
 3.406e+09             -95.6%   1.49e+08        -96.4%  1.213e+08    stress-ng.vm-splice.ops
 1.135e+08             -95.6%    4965911        -96.4%    4041777    stress-ng.vm-splice.ops_per_sec




And note the "untested" part of the patch. It _looks_ fairly obvious,
but maybe I'm missing something.

 Linus





--
Zhengjun Xing


Re: [LKP] Re: [mm/memcg] bd0b230fe1: will-it-scale.per_process_ops -22.7% regression

2020-11-03 Thread Xing Zhengjun




On 11/2/2020 6:02 PM, Michal Hocko wrote:

On Mon 02-11-20 17:53:14, Rong Chen wrote:



On 11/2/20 5:27 PM, Michal Hocko wrote:

On Mon 02-11-20 17:15:43, kernel test robot wrote:

Greeting,

FYI, we noticed a -22.7% regression of will-it-scale.per_process_ops due to 
commit:


commit: bd0b230fe14554bfffbae54e19038716f96f5a41 ("mm/memcg: unify swap and memsw 
page counters")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

I really fail to see how this can be anything else than a data structure
layout change. There is one counter less.

btw. are cgroups configured at all? What would be the configuration?


Hi Michal,

We used the default configuration of cgroups; I'm not sure what
configuration you want, could you give me more details? And here is the
cgroup info of the will-it-scale process:

$ cat /proc/3042/cgroup
12:hugetlb:/
11:memory:/system.slice/lkp-bootstrap.service


OK, this means that the memory controller is enabled and in use. Btw. do you
get the original performance if you add one phony page_counter after the
union?

I added one phony page_counter after the union and re-tested; the
regression is reduced to -1.2%. It looks like the regression is caused by
the data structure layout change.
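
For reference, a tiny standalone illustration of the effect being probed
here: inserting one padding member after a union shifts the offsets (and
hence the cache-line placement) of the fields that follow. All struct and
field names below are invented for the example; the real layout lives in
struct mem_cgroup in include/linux/memcontrol.h.

#include <stdio.h>
#include <stddef.h>

struct page_counter_stub { long usage, min, low, high, max; };

struct memcg_unpadded {
	union { struct page_counter_stub swap, memsw; };
	long hot_field_a;		/* stands in for frequently used counters */
	long hot_field_b;
};

struct memcg_padded {
	union { struct page_counter_stub swap, memsw; };
	struct page_counter_stub phony;	/* the debug-only padding */
	long hot_field_a;
	long hot_field_b;
};

int main(void)
{
	/* Show how far the hot fields move, and whether they change cache line. */
	printf("hot_field_a offset: %zu -> %zu\n",
	       offsetof(struct memcg_unpadded, hot_field_a),
	       offsetof(struct memcg_padded, hot_field_a));
	printf("hot_field_a 64-byte cache line: %zu -> %zu\n",
	       offsetof(struct memcg_unpadded, hot_field_a) / 64,
	       offsetof(struct memcg_padded, hot_field_a) / 64);
	return 0;
}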


=
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode/debug-setup:

lkp-hsw-4ex1/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/50%/process/page_fault2/performance/0x16/test1

commit:
  8d387a5f172f26ff8c76096d5876b881dec6b7ce
  bd0b230fe14554bfffbae54e19038716f96f5a41
  b3233916ab0a883e1117397e28b723bd0e4ac1eb (debug patch: add one phony page_counter after the union)


8d387a5f172f26ff bd0b230fe14554bfffbae54e190 b3233916ab0a883e1117397e28b
---------------- --------------------------- ---------------------------
         %stddev      %change    %stddev       %change    %stddev
    187632             -22.8%     144931         -1.2%     185391    will-it-scale.per_process_ops
  13509525             -22.8%   10435073         -1.2%   13348181    will-it-scale.workload




--
Zhengjun Xing


Re: [LKP] Re: [btrfs] c75e839414: aim7.jobs-per-min -9.1% regression

2020-11-02 Thread Xing Zhengjun

Hi Josef,

 I re-tested it in v5.10-rc2, and the regression still exists. Do you 
have time to take a look at this? Thanks.


On 10/13/2020 2:30 PM, Xing Zhengjun wrote:

Hi Josef,

    I re-test in v5.9, the regression still existed. Do you have time to 
take a look at this? Thanks.


On 6/15/2020 11:21 AM, Xing Zhengjun wrote:

Hi Josef,

    Do you have time to take a look at this? Thanks.

On 6/12/2020 2:11 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -9.1% regression of aim7.jobs-per-min due to commit:


commit: c75e839414d3610e6487ae3145199c500d55f7f7 ("btrfs: kill the 
subvol_srcu")

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: aim7
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 
2.30GHz with 192G memory

with following parameters:

disk: 4BRD_12G
md: RAID0
fs: btrfs
test: disk_wrt
load: 1500
cpufreq_governor: performance
ucode: 0x52c

test-description: AIM7 is a traditional UNIX system level benchmark 
suite which is used to test and measure the performance of multiuser 
system.

test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
--> 




To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

= 

compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase/ucode: 

gcc-7/performance/4BRD_12G/btrfs/x86_64-rhel-7.6/1500/RAID0/debian-x86_64-20191114.cgz/lkp-csl-2ap2/disk_wrt/aim7/0x52c 



commit:
   efc3453494 ("btrfs: make btrfs_cleanup_fs_roots use the radix tree 
lock")

   c75e839414 ("btrfs: kill the subvol_srcu")

efc3453494af7818 c75e839414d3610e6487ae31451
 ---
    fail:runs  %reproduction    fail:runs
    | | |
   3:9  -33%    :8 
dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x

  %stddev %change %stddev
  \  |    \
  29509 ±  2%  -9.1%  26837 ±  2%  aim7.jobs-per-min
 305.28 ±  2% +10.0% 335.72 ±  2%  aim7.time.elapsed_time
 305.28 ±  2% +10.0% 335.72 ±  2%  
aim7.time.elapsed_time.max
    4883135 ± 10% +37.9%    6735464 ±  7% 
aim7.time.involuntary_context_switches

  56288 ±  2% +10.5%  62202 ±  2%  aim7.time.system_time
    2344783    +6.5%    2497364 ±  2% 
aim7.time.voluntary_context_switches

   62337721 ±  2%  +9.8%   68456490 ±  2%  turbostat.IRQ
 431.56 ±  6% +22.3% 527.88 ±  4%  vmstat.procs.r
  27340 ±  2% +11.2%  30397 ±  2%  vmstat.system.cs
 226804 ±  6% +21.7% 276057 ±  4%  meminfo.Active(file)
 221309 ±  6% +22.3% 270668 ±  4%  meminfo.Dirty
 720.89 ±111% +49.3%   1076 ± 73%  meminfo.Mlocked
  14278 ±  2%  -8.3%  13094 ±  2%  meminfo.max_used_kB
  57228 ±  6% +22.7%  70195 ±  5% 
numa-meminfo.node0.Active(file)

  55433 ±  6% +21.6%  67431 ±  4%  numa-meminfo.node0.Dirty
  56152 ±  6% +21.4%  68180 ±  5% 
numa-meminfo.node1.Active(file)

  55001 ±  6% +22.5%  67397 ±  4%  numa-meminfo.node1.Dirty
  56373 ±  6% +21.7%  68594 ±  4% 
numa-meminfo.node2.Active(file)

  55222 ±  7% +22.6%  67726 ±  4%  numa-meminfo.node2.Dirty
  56671 ±  6% +20.5%  68317 ±  3% 
numa-meminfo.node3.Active(file)

  55285 ±  6% +21.8%  67355 ±  4%  numa-meminfo.node3.Dirty
  56694 ±  6% +21.7%  69019 ±  4%  
proc-vmstat.nr_active_file

  55342 ±  6% +22.3%  67662 ±  4%  proc-vmstat.nr_dirty
 402316    +2.1% 410951    proc-vmstat.nr_file_pages
 180.22 ±111% +49.4% 269.25 ± 73%  proc-vmstat.nr_mlock
  56694 ±  6% +21.7%  69019 ±  4% 
proc-vmstat.nr_zone_active_file
  54680 ±  6% +22.8%  67168 ±  4% 
proc-vmstat.nr_zone_write_pending

    3144381 ±  2%  +6.1%    3335275    proc-vmstat.pgactivate
    1387558 ±  2%  +7.9%    1496754 ±  2%  proc-vmstat.pgfault
 983.33 ±  4%  +5.4%   1036 
proc-vmstat.unevictable_pgs_culled
  14331 ±  6% +22.6%  17566 ±  5% 
numa-vmstat.node0.nr_active_file
  13884 ±  6% +21.6%  16884 ±  4%  
numa-vmstat.node0.nr_dirty
  14330 ±  6% +22.6%  17566 ±  5% 
numa-vmstat.node0.nr_zone_active_file
  13714 ±  6% +22.2%  16755 ±  4% 
numa-vmstat.node0.nr_zone_write_pending
  14047 ±  6% +21.3%  17043 ±  4% 

Re: [LKP] Re: [sched] bdfcae1140: will-it-scale.per_thread_ops -37.0% regression

2020-10-22 Thread Xing Zhengjun




On 10/22/2020 9:19 PM, Mathieu Desnoyers wrote:

- On Oct 21, 2020, at 9:54 PM, Xing Zhengjun zhengjun.x...@linux.intel.com 
wrote:
[...]

In fact, 0-day just copies the will-it-scale benchmark from GitHub. If
you think the will-it-scale benchmark has some issues, you can
contribute your ideas and help improve it; later we will update the
will-it-scale benchmark to the new version.


This is why I CC'd the maintainer of the will-it-scale github project, Anton 
Blanchard.
My main intent is to report this issue to him, but I have not heard back from 
him yet.
Is this project maintained ? Let me try to add his ozlabs.org address in CC.


For this test case, if we bind the workload to a specific CPU, then it
will hide the scheduler balance issue. In the real world, we seldom bind
the CPU...


When you say that you bind the workload to a specific CPU, is that done
outside of the will-it-scale testsuite, thus limiting the entire testsuite
to a single CPU, or you expect that internally the will-it-scale context-switch1
test gets affined to a single specific CPU/core/hardware thread through use of
hwloc ?


The latter one.


Thanks,

Mathieu



--
Zhengjun Xing


Re: [LKP] Re: [sched] bdfcae1140: will-it-scale.per_thread_ops -37.0% regression

2020-10-21 Thread Xing Zhengjun




On 10/20/2020 9:14 PM, Mathieu Desnoyers wrote:

- On Oct 19, 2020, at 11:24 PM, Xing Zhengjun zhengjun.x...@linux.intel.com 
wrote:


On 10/7/2020 10:50 PM, Mathieu Desnoyers wrote:

- On Oct 2, 2020, at 4:33 AM, Rong Chen rong.a.c...@intel.com wrote:


Greeting,

FYI, we noticed a -37.0% regression of will-it-scale.per_thread_ops due to
commit:


commit: bdfcae11403e5099769a7c8dc3262e3c4193edef ("[RFC PATCH 2/3] sched:
membarrier: cover kthread_use_mm (v3)")
url:
https://github.com/0day-ci/linux/commits/Mathieu-Desnoyers/Membarrier-updates/20200925-012549
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git
848785df48835eefebe0c4eb5da7690690b0a8b7

in testcase: will-it-scale
on test machine: 104 threads Skylake with 192G memory
with following parameters:

nr_task: 50%
mode: thread
test: context_switch1
cpufreq_governor: performance
ucode: 0x2006906

test-description: Will It Scale takes a testcase and runs it from 1 through to n
parallel copies to see if the testcase will scale. It builds both a process and
threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



Hi,

I would like to report what I suspect is a random thread placement issue in the
context_switch1 test used by the 0day bot when running on a machine with
hyperthread
enabled.

AFAIU the test code uses hwloc for thread placement which should theoretically
ensure
that each thread is placed on same processing unit, core and numa node between
runs.

We can find the test code here:

https://github.com/antonblanchard/will-it-scale/blob/master/tests/context_switch1.c

And the main file containing thread setup is here:

https://github.com/antonblanchard/will-it-scale/blob/master/main.c

AFAIU, the test is started without the "-m" switch, which therefore affinitizes
tasks on cores rather than on processing units (SMT threads).

When testcase() creates the child thread with new_task(), it basically issues:

pthread_create([nr_threads++], NULL, func, arg);

passing a NULL pthread_attr_t, and not executing any pre_trampoline on the
child.
The pre_trampoline would have issued hwloc_set_thread_cpubind if it were
executed on
the child, but it's not. Therefore, we expect the cpu affinity mask of the
parent to
be copied on clone and used by the child.

A quick test on a machine with hyperthreading enabled shows that the cpu
affinity mask
for the parent and child has two bits set:

taskset -p 1868607
pid 1868607's current affinity mask: 10001
taskset -p 1868606
pid 1868606's current affinity mask: 10001

So AFAIU the placement of the parent and child will be random on either the same
processing unit, or on separate processing units within the same core.

I suspect this randomness can significantly affect the performance number
between
runs, and trigger unwarranted performance regression warnings.

Thanks,

Mathieu


Yes, the randomness may happen in some special cases.  But in 0-day, we
test multi times (>=3), the report is the average number.
For this case, we test 4 times, it is stable, the wave is ±  2%.
So I don't think the -37.0% regression is caused by the randomness.

0/stats.json:  "will-it-scale.per_thread_ops": 105228,
1/stats.json:  "will-it-scale.per_thread_ops": 100443,
2/stats.json:  "will-it-scale.per_thread_ops": 98786,
3/stats.json:  "will-it-scale.per_thread_ops": 102821,

c2daff748f0ea954 bdfcae11403e5099769a7c8dc32
 ---
  %stddev %change %stddev
  \  |\
 161714 ±  2% -37.0% 101819 ±  2%  will-it-scale.per_thread_ops


Arguing whether this specific instance of the test is indeed a performance
regression or not is not relevant to this discussion.

What I am pointing out here is that the test needs fixing because it generates
noise due to a random thread placement configuration. This issue is about 
whether
we can trust the results of those tests as kernel maintainers.

So on one hand, you can fix the test. This is simple to do: make sure the thread
affinity does not allow for this randomness on SMT.
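
For illustration, a minimal sketch of such explicit pinning, assuming the
GNU pthread_attr_setaffinity_np() extension; the helper name and CPU number
are arbitrary, and this is not the will-it-scale code itself:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the child to exactly one hardware thread before it is created, so
 * parent and child can no longer land on random SMT siblings. */
static pthread_t create_pinned_thread(void *(*fn)(void *), void *arg, int cpu)
{
	pthread_t tid;
	pthread_attr_t attr;
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	pthread_attr_init(&attr);
	pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
	pthread_create(&tid, &attr, fn, arg);	/* error handling omitted */
	pthread_attr_destroy(&attr);
	return tid;
}

The same effect can be obtained after creation with
pthread_setaffinity_np() on the returned thread id.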

But you seem to argue that the test does not need to be fixed, because the 0day
infrastructure in which it runs will cover for this randomness. I really doubt
about this.

If you indeed choose to argue that the test does not need fixing, then here is 
the
statistical analysis I am looking for:

- With the 4 runs, what are the odds that the average result for one class 
significantly
   differs from the other class due to this randomness. It may be small, but it 
is certainly
   not zero,


If 4 runs are not enough, how many runs do you think would be OK? In
fact, I have re-tested it more than 10 times, and the test result is
almost the same.


Re: [LKP] Re: Unreliable will-it-scale context_switch1 test on 0day bot

2020-10-19 Thread Xing Zhengjun




On 10/19/2020 11:24 PM, Philip Li wrote:

On Mon, Oct 19, 2020 at 09:27:32AM -0400, Mathieu Desnoyers wrote:

Hi,

I pointed out an issue with the will-it-scale context_switch1 test run by the 
0day bot on
October 7, 2020, and got no reply.

Thanks Mathieu for the feedback, we had added it to the TODO list but sorry for
not replying in time.

Zhengjun, can you help follow up this mail thread?



I have replied in the original mail.



Until this issue is solved, the results of those tests are basically pure noise 
when run on
SMT hardware:

https://lore.kernel.org/lkml/1183082664.11002.1602082242482.javamail.zim...@efficios.com/

Who is maintaining those tests and the 0day bot ?

will-it-scale itself is from the community at
https://github.com/antonblanchard/will-it-scale, and we will look for
support there if we don't have a quick solution. The 0day bot basically
wraps the test and analyzes the result to find which commit leads to the
change.



Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



--
Zhengjun Xing


Re: [LKP] Re: [sched] bdfcae1140: will-it-scale.per_thread_ops -37.0% regression

2020-10-19 Thread Xing Zhengjun




On 10/7/2020 10:50 PM, Mathieu Desnoyers wrote:

- On Oct 2, 2020, at 4:33 AM, Rong Chen rong.a.c...@intel.com wrote:


Greeting,

FYI, we noticed a -37.0% regression of will-it-scale.per_thread_ops due to
commit:


commit: bdfcae11403e5099769a7c8dc3262e3c4193edef ("[RFC PATCH 2/3] sched:
membarrier: cover kthread_use_mm (v3)")
url:
https://github.com/0day-ci/linux/commits/Mathieu-Desnoyers/Membarrier-updates/20200925-012549
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git
848785df48835eefebe0c4eb5da7690690b0a8b7

in testcase: will-it-scale
on test machine: 104 threads Skylake with 192G memory
with following parameters:

nr_task: 50%
mode: thread
test: context_switch1
cpufreq_governor: performance
ucode: 0x2006906

test-description: Will It Scale takes a testcase and runs it from 1 through to n
parallel copies to see if the testcase will scale. It builds both a process and
threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



Hi,

I would like to report what I suspect is a random thread placement issue in the
context_switch1 test used by the 0day bot when running on a machine with 
hyperthread
enabled.

AFAIU the test code uses hwloc for thread placement which should theoretically 
ensure
that each thread is placed on same processing unit, core and numa node between 
runs.

We can find the test code here:

https://github.com/antonblanchard/will-it-scale/blob/master/tests/context_switch1.c

And the main file containing thread setup is here:

https://github.com/antonblanchard/will-it-scale/blob/master/main.c

AFAIU, the test is started without the "-m" switch, which therefore affinitizes
tasks on cores rather than on processing units (SMT threads).

When testcase() creates the child thread with new_task(), it basically issues:

   pthread_create([nr_threads++], NULL, func, arg);

passing a NULL pthread_attr_t, and not executing any pre_trampoline on the 
child.
The pre_trampoline would have issued hwloc_set_thread_cpubind if it were 
executed on
the child, but it's not. Therefore, we expect the cpu affinity mask of the 
parent to
be copied on clone and used by the child.

A quick test on a machine with hyperthreading enabled shows that the cpu 
affinity mask
for the parent and child has two bits set:

taskset -p 1868607
pid 1868607's current affinity mask: 10001
taskset -p 1868606
pid 1868606's current affinity mask: 10001

So AFAIU the placement of the parent and child will be random on either the same
processing unit, or on separate processing units within the same core.

I suspect this randomness can significantly affect the performance number 
between
runs, and trigger unwarranted performance regression warnings.

Thanks,

Mathieu

Yes, the randomness may happen in some special cases.  But in 0-day we run
the test multiple times (>=3), and the report is the average number.

For this case, we ran the test 4 times; it is stable, with a variation of ± 2%.
So I don't think the -37.0% regression is caused by the randomness.

0/stats.json:  "will-it-scale.per_thread_ops": 105228,
1/stats.json:  "will-it-scale.per_thread_ops": 100443,
2/stats.json:  "will-it-scale.per_thread_ops": 98786,
3/stats.json:  "will-it-scale.per_thread_ops": 102821,

c2daff748f0ea954 bdfcae11403e5099769a7c8dc32
 ---
 %stddev %change %stddev
 \  |\
161714 ±  2% -37.0% 101819 ±  2%  will-it-scale.per_thread_ops


--
Zhengjun Xing


Re: [LKP] Re: [btrfs] c75e839414: aim7.jobs-per-min -9.1% regression

2020-10-13 Thread Xing Zhengjun

Hi Josef,

   I re-tested it in v5.9, and the regression still exists. Do you have time to 
take a look at this? Thanks.


On 6/15/2020 11:21 AM, Xing Zhengjun wrote:

Hi Josef,

    Do you have time to take a look at this? Thanks.

On 6/12/2020 2:11 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -9.1% regression of aim7.jobs-per-min due to commit:


commit: c75e839414d3610e6487ae3145199c500d55f7f7 ("btrfs: kill the 
subvol_srcu")

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: aim7
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 
2.30GHz with 192G memory

with following parameters:

disk: 4BRD_12G
md: RAID0
fs: btrfs
test: disk_wrt
load: 1500
cpufreq_governor: performance
ucode: 0x52c

test-description: AIM7 is a traditional UNIX system level benchmark 
suite which is used to test and measure the performance of multiuser 
system.

test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
--> 




To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

= 

compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase/ucode: 

   
gcc-7/performance/4BRD_12G/btrfs/x86_64-rhel-7.6/1500/RAID0/debian-x86_64-20191114.cgz/lkp-csl-2ap2/disk_wrt/aim7/0x52c 



commit:
   efc3453494 ("btrfs: make btrfs_cleanup_fs_roots use the radix tree 
lock")

   c75e839414 ("btrfs: kill the subvol_srcu")

efc3453494af7818 c75e839414d3610e6487ae31451
 ---
    fail:runs  %reproduction    fail:runs
    | | |
   3:9  -33%    :8 
dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x

  %stddev %change %stddev
  \  |    \
  29509 ±  2%  -9.1%  26837 ±  2%  aim7.jobs-per-min
 305.28 ±  2% +10.0% 335.72 ±  2%  aim7.time.elapsed_time
 305.28 ±  2% +10.0% 335.72 ±  2%  aim7.time.elapsed_time.max
    4883135 ± 10% +37.9%    6735464 ±  7%  
aim7.time.involuntary_context_switches

  56288 ±  2% +10.5%  62202 ±  2%  aim7.time.system_time
    2344783    +6.5%    2497364 ±  2%  
aim7.time.voluntary_context_switches

   62337721 ±  2%  +9.8%   68456490 ±  2%  turbostat.IRQ
 431.56 ±  6% +22.3% 527.88 ±  4%  vmstat.procs.r
  27340 ±  2% +11.2%  30397 ±  2%  vmstat.system.cs
 226804 ±  6% +21.7% 276057 ±  4%  meminfo.Active(file)
 221309 ±  6% +22.3% 270668 ±  4%  meminfo.Dirty
 720.89 ±111% +49.3%   1076 ± 73%  meminfo.Mlocked
  14278 ±  2%  -8.3%  13094 ±  2%  meminfo.max_used_kB
  57228 ±  6% +22.7%  70195 ±  5%  
numa-meminfo.node0.Active(file)

  55433 ±  6% +21.6%  67431 ±  4%  numa-meminfo.node0.Dirty
  56152 ±  6% +21.4%  68180 ±  5%  
numa-meminfo.node1.Active(file)

  55001 ±  6% +22.5%  67397 ±  4%  numa-meminfo.node1.Dirty
  56373 ±  6% +21.7%  68594 ±  4%  
numa-meminfo.node2.Active(file)

  55222 ±  7% +22.6%  67726 ±  4%  numa-meminfo.node2.Dirty
  56671 ±  6% +20.5%  68317 ±  3%  
numa-meminfo.node3.Active(file)

  55285 ±  6% +21.8%  67355 ±  4%  numa-meminfo.node3.Dirty
  56694 ±  6% +21.7%  69019 ±  4%  proc-vmstat.nr_active_file
  55342 ±  6% +22.3%  67662 ±  4%  proc-vmstat.nr_dirty
 402316    +2.1% 410951    proc-vmstat.nr_file_pages
 180.22 ±111% +49.4% 269.25 ± 73%  proc-vmstat.nr_mlock
  56694 ±  6% +21.7%  69019 ±  4%  
proc-vmstat.nr_zone_active_file
  54680 ±  6% +22.8%  67168 ±  4%  
proc-vmstat.nr_zone_write_pending

    3144381 ±  2%  +6.1%    3335275    proc-vmstat.pgactivate
    1387558 ±  2%  +7.9%    1496754 ±  2%  proc-vmstat.pgfault
 983.33 ±  4%  +5.4%   1036
proc-vmstat.unevictable_pgs_culled
  14331 ±  6% +22.6%  17566 ±  5%  
numa-vmstat.node0.nr_active_file

  13884 ±  6% +21.6%  16884 ±  4%  numa-vmstat.node0.nr_dirty
  14330 ±  6% +22.6%  17566 ±  5%  
numa-vmstat.node0.nr_zone_active_file
  13714 ±  6% +22.2%  16755 ±  4%  
numa-vmstat.node0.nr_zone_write_pending
  14047 ±  6% +21.3%  17043 ±  4%  
numa-vmstat.node1.nr_active_file

  13763 ±  6% +22.3%  16838 ±  4%  numa-vmstat.node1.nr_dirty
  14047 ±  6% +21.3%

Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

2020-10-13 Thread Xing Zhengjun




On 10/13/2020 11:01 AM, Mike Kravetz wrote:

On 10/12/20 6:59 PM, Xing Zhengjun wrote:



On 10/13/2020 1:40 AM, Mike Kravetz wrote:

On 10/11/20 10:29 PM, Xing Zhengjun wrote:

Hi Mike,

 I re-test it in v5.9-rc8, the regression still existed. It is almost the 
same as 34ae204f1851. Do you have time to look at it? Thanks.



Thank you for testing.

Just curious, did you apply the series in this thread or just test v5.9-rc8?
If just testing v5.9-rc8, no changes to this code were added after 34ae204f1851,
so results being the same are expected.



I just test v5.9-rc8. Where can I find the series patches you mentioned here? 
Or just wait for the next mainline release?



My apologies.  I missed that you were not cc'ed on this thred:
https://lore.kernel.org/linux-mm/20200706202615.32111-1-mike.krav...@oracle.com/

As mentioned, there will likely be another revision to the way locking is
handled.  The new scheme will try to consider performance as is done in
the above link.  I suggest you wait for next revision.  If you do not mind,
I will cc you when the new code is posted.



OK. I will wait for the next revision.

--
Zhengjun Xing


Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

2020-10-12 Thread Xing Zhengjun




On 10/13/2020 1:40 AM, Mike Kravetz wrote:

On 10/11/20 10:29 PM, Xing Zhengjun wrote:

Hi Mike,

I re-test it in v5.9-rc8, the regression still existed. It is almost the 
same as 34ae204f1851. Do you have time to look at it? Thanks.



Thank you for testing.

Just curious, did you apply the series in this thread or just test v5.9-rc8?
If just testing v5.9-rc8, no changes to this code were added after 34ae204f1851,
so results being the same are expected.



I just tested v5.9-rc8. Where can I find the series of patches you mentioned 
here? Or should I just wait for the next mainline release?




There are some functional issues with this new hugetlb locking model that
are currently being worked.  It is likely to result in significantly different
code.  The performance issues discovered here will be taken into account with
the new code.  However, as previously mentioned additional synchronization
is required for functional correctness.  As a result, there will be some
regression in this code.



--
Zhengjun Xing


Re: [LKP] [fs] b6509f6a8c: will-it-scale.per_thread_ops -12.6% regression

2020-10-12 Thread Xing Zhengjun




On 10/12/2020 4:18 PM, Mel Gorman wrote:

On Mon, Oct 12, 2020 at 02:20:26PM +0800, Xing Zhengjun wrote:

Hi Mel,

It is a revert commit caused the regression, Do you have a plan to fix
it? Thanks. I re-test it in v5.9-rc8, the regression still existed.



The revert caused a *performance* regression but the original
performance gain caused a functional failure. The overall performance
should be unchanged. I have not revisited the topic since.


Thanks for the explanation. We will stop tracking it.

--
Zhengjun Xing


Re: [LKP] [fs] b6509f6a8c: will-it-scale.per_thread_ops -12.6% regression

2020-10-12 Thread Xing Zhengjun

Hi Mel,

   It is a revert commit that caused the regression. Do you have a plan to 
fix it? Thanks. I re-tested it in v5.9-rc8, and the regression still exists.


=
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode:

lkp-csl-2ap4/will-it-scale/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-9/100%/thread/eventfd1/performance/0x5002f01

commit:
  v5.8-rc3
  b6509f6a8c4313c068c69785c001451415969e44
  v5.8
  v5.9-rc1
  v5.9-rc8

        v5.8-rc3 b6509f6a8c4313c068c69785c00        v5.8    v5.9-rc1    v5.9-rc8
---------------- --------------------------- ----------- ----------- -----------
         %stddev      %change    %stddev       %change %stddev  %change %stddev  %change %stddev
   1652352             -12.6%    1444002 ±  2%  -13.3%   1431865   -9.9%   1489323   -9.1%   1502580  will-it-scale.per_thread_ops
 3.173e+08             -12.6%  2.772e+08 ±  2%  -13.3% 2.749e+08   -9.9%  2.86e+08   -9.1% 2.885e+08  will-it-scale.workload





On 7/6/2020 9:20 AM, kernel test robot wrote:

Greeting,

FYI, we noticed a -12.6% regression of will-it-scale.per_thread_ops due to 
commit:


commit: b6509f6a8c4313c068c69785c001451415969e44 ("Revert "fs: Do not check if there is a 
fsnotify watcher on pseudo inodes"")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 
192G memory
with following parameters:

nr_task: 100%
mode: thread
test: eventfd1
cpufreq_governor: performance
ucode: 0x5002f01

test-description: Will It Scale takes a testcase and runs it from 1 through to 
n parallel copies to see if the testcase will scale. It builds both a process 
and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

In addition to that, the commit also has significant impact on the following 
tests:

+--+---+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -6.4% 
regression |
| test machine | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz 
with 192G memory |
| test parameters  | cpufreq_governor=performance   
   |
|  | mode=process   
   |
|  | nr_task=100%   
   |
|  | test=unix1 
   |
|  | ucode=0x5002f01
   |
+--+---+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops -2.3% 
regression  |
| test machine | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz 
with 192G memory |
| test parameters  | cpufreq_governor=performance   
   |
|  | mode=thread
   |
|  | nr_task=100%   
   |
|  | test=pipe1 
   |
|  | ucode=0x5002f01
   |
+--+---+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
   
gcc-9/performance/x86_64-rhel-7.6/thread/100%/debian-x86_64-20191114.cgz/lkp-csl-2ap4/eventfd1/will-it-scale/0x5002f01

commit:
   v5.8-rc3
   b6509f6a8c ("Revert "fs: Do not check if there is a fsnotify watcher on pseudo 
inodes"")

 v5.8-rc3 b6509f6a8c4313c068c69785c00
 ---
  %stddev %change %stddev
  \  |\

Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

2020-10-11 Thread Xing Zhengjun

Hi Mike,

   I re-tested it in v5.9-rc8, and the regression still exists. It is almost 
the same as with 34ae204f1851. Do you have time to look at it? Thanks.


=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/size/test/cpufreq_governor/ucode:

lkp-knm01/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/8T/anon-cow-seq-hugetlb/performance/0x11

commit:
  49aef7175cc6eb703a9280a7b830e675fe8f2704
  c0d0381ade79885c04a04c303284b040616b116e
  v5.8
  34ae204f18519f0920bd50a644abd6fefc8dbfcf
  v5.9-rc1
  v5.9-rc8

49aef7175cc6eb70 c0d0381ade79885c04a04c30328        v5.8 34ae204f18519f0920bd50a644a    v5.9-rc1    v5.9-rc8
---------------- --------------------------- ----------- --------------------------- ----------- -----------
         %stddev      %change    %stddev       %change %stddev       %change %stddev       %change %stddev      %change %stddev
     38043 ±  3%       -30.2%      26560 ±  4%  -29.5%   26815 ±  6%   -7.4%   35209 ±  2%   -7.4%   35244       -8.8%   34704      vm-scalability.median
      7.86 ± 19%         +9.7      17.54 ± 21%  +10.4    18.23 ± 34%   -3.1     4.75 ±  7%   -4.5     3.36 ±  7%  -4.0     3.82 ± 15% vm-scalability.median_stddev%
  12822071 ±  3%       -34.1%    8450822 ±  4%  -33.6% 8517252 ±  6%  -10.7% 11453675 ±  2%  -10.2% 11513595 ± 2% -11.6% 11331657     vm-scalability.throughput
 2.523e+09 ±  3%       -20.7%  2.001e+09 ±  5%  -19.9% 2.021e+09 ± 7%   +6.8% 2.694e+09 ± 2%   +7.3% 2.707e+09 ± 2% +5.4% 2.661e+09   vm-scalability.workload



On 8/22/2020 7:36 AM, Mike Kravetz wrote:

On 8/21/20 2:02 PM, Mike Kravetz wrote:

Would you be willing to test this series on top of 34ae204f1851?  I will need
to rebase the series to take the changes made by 34ae204f1851 into account.


Actually, the series in this thread will apply/run cleanly on top of
34ae204f1851.  No need to rebase or port.  If we decide to move forward more
work is required.  See a few FIXME's in the patches.



--
Zhengjun Xing


Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

2020-08-21 Thread Xing Zhengjun




On 6/26/2020 5:33 AM, Mike Kravetz wrote:

On 6/22/20 3:01 PM, Mike Kravetz wrote:

On 6/21/20 5:55 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit:


commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for 
more pmd sharing synchronization")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: vm-scalability
on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G 
memory
with following parameters:

runtime: 300s
size: 8T
test: anon-cow-seq-hugetlb
cpufreq_governor: performance
ucode: 0x11



Some performance regression is not surprising as the change includes acquiring
and holding the i_mmap_rwsem (in read mode) during hugetlb page faults.  33.4%
seems a bit high.  But, the test is primarily exercising the hugetlb page
fault path and little else.

The reason for taking the i_mmap_rwsem is to prevent PMD unsharing from
invalidating the pmd we are operating on.  This specific test case is operating
on anonymous private mappings.  So, PMD sharing is not possible and we can
eliminate acquiring the mutex in this case.  In fact, we should check all
mappings (even sharable) for the possibly of PMD sharing and only take the
mutex if necessary.  It will make the code a bit uglier, but will take care
of some of these regressions.  We still need to take the mutex in the case
of PMD sharing.  I'm afraid a regression is unavoidable in that case.

I'll put together a patch.


Not acquiring the mutex on faults when sharing is not possible is quite
straight forward.  We can even use the existing routine vma_shareable()
to easily check.  However, the next patch in the series 87bf91d39bb5
"hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race" depends
on always acquiring the mutex.  If we break this assumption, then the
code to back out hugetlb reservations needs to be written.  A high level
view of what needs to be done is in the commit message for 87bf91d39bb5.

I'm working on the code to back out reservations.



I find that 34ae204f18519f0920bd50a644abd6fefc8dbfcf ("hugetlbfs: remove 
call to huge_pte_alloc without i_mmap_rwsem") fixed this regression; when I 
test with that patch, the regression is reduced to 10.1%. Do you have a plan 
to continue to improve it? Thanks.


=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/size/test/cpufreq_governor/ucode:

lkp-knm01/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/8T/anon-cow-seq-hugetlb/performance/0x11

commit:
  49aef7175cc6eb703a9280a7b830e675fe8f2704
  c0d0381ade79885c04a04c303284b040616b116e
  v5.8
  34ae204f18519f0920bd50a644abd6fefc8dbfcf
  v5.9-rc1

49aef7175cc6eb70 c0d0381ade79885c04a04c30328        v5.8 34ae204f18519f0920bd50a644a    v5.9-rc1
---------------- --------------------------- ----------- --------------------------- -----------
         %stddev      %change    %stddev       %change %stddev       %change %stddev      %change %stddev
     38084             -31.1%      26231 ±  2%  -26.6%   27944 ±  5%   -7.0%   35405       -7.5%   35244      vm-scalability.median
      9.92 ±  9%        +12.0      21.95 ±  4%   +3.9    13.87 ± 30%   -5.3     4.66 ±  9%  -6.6     3.36 ±  7% vm-scalability.median_stddev%
  12827311             -35.0%    8340256 ±  2%  -30.9% 8865669 ±  5%  -10.1% 11532087      -10.2% 11513595 ± 2% vm-scalability.throughput
 2.507e+09             -22.7%  1.938e+09        -15.3% 2.122e+09 ± 6%   +8.0% 2.707e+09     +8.0% 2.707e+09 ± 2% vm-scalability.workload




--
Zhengjun Xing


Re: [LKP] Re: [ext4] d3b6f23f71: stress-ng.fiemap.ops_per_sec -60.5% regression

2020-08-18 Thread Xing Zhengjun




On 7/22/2020 2:17 PM, Xing Zhengjun wrote:



On 7/15/2020 7:04 PM, Ritesh Harjani wrote:

Hello Xing,

On 4/7/20 1:30 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -60.5% regression of stress-ng.fiemap.ops_per_sec 
due to commit:



commit: d3b6f23f71670007817a5d59f3fbafab2b794e8c ("ext4: move 
ext4_fiemap to use iomap framework")

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz 
with 192G memory

with following parameters:

nr_threads: 10%
disk: 1HDD
testtime: 1s
class: os
cpufreq_governor: performance
ucode: 0x52c
fs: ext4


I started looking into this issue. But with my unit testing, I didn't
find any perf issue with fiemap ioctl call. I haven't yet explored about
how stress-ng take fiemap performance numbers, it could be doing
something differently. But in my testing I just made sure to create a
file with large number of extents and used xfs_io -c "fiemap -v" cmd
to check how much time it takes to read all the entries in 1st
and subsequent iterations.


Setup comprised of qemu machine on x86_64 with latest linux branch.

1. created a file of 10G using fallocate. (this allocated unwritten
extents for this file).

2. Then I punched hole on every alternate block of file. This step took
a long time. And after sufficiently long time, I had to cancel it.
for i in $(seq 1 2 x); do echo $i; fallocate -p -o $(($i*4096)) -l 
4096; done


3. Then issued fiemap call via xfs_io and took the time measurement.
time xfs_io -c "fiemap -v" bigfile > /dev/null


Perf numbers on latest default kernel build for above cmd.

1st iteration
==
real    0m31.684s
user    0m1.593s
sys 0m24.174s

2nd and subsequent iteration

real    0m3.379s
user    0m1.300s
sys 0m2.080s


4. Then I reverted all the iomap_fiemap patches and re-tested this.
With this the older ext4_fiemap implementation will be tested:-


1st iteration
==
real    0m31.591s
user    0m1.400s
sys 0m24.243s


2nd and subsequent iteration (had to cancel it since it was taking 
more time then 15m)


^C^C
real    15m49.884s
user    0m0.032s
sys 15m49.722s

I guess the reason why 2nd iteration with older implementation is taking
too much time is since with previous implementation we never cached
extent entries in extent_status tree. And also in 1st iteration the page
cache may get filled with lot of buffer_head entries. So maybe page
reclaims are taking more time.

While with the latest implementation using iomap_fiemap(), the call to 
query extent blocks is done using ext4_map_blocks(). ext4_map_blocks()

by default will also cache the extent entries into extent_status tree.
Hence during 2nd iteration, we will directly read the entries from 
extent_status tree and will not do any disk I/O.


-ritesh
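
As a point of reference, the FIEMAP query that both xfs_io and the
stress-ng fiemap stressor ultimately issue boils down to a single ioctl;
a minimal standalone version (the file name here is just an example)
looks roughly like this:

#include <fcntl.h>
#include <stdio.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <sys/ioctl.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "bigfile";
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* With fm_extent_count == 0 the kernel only reports how many extents
	 * the file has, which is enough for timing repeated calls. */
	struct fiemap fm = { .fm_length = ~0ULL, .fm_extent_count = 0 };

	if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
		perror("FS_IOC_FIEMAP");
		return 1;
	}
	printf("%s: %u extents\n", path, fm.fm_mapped_extents);
	return 0;
}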


I re-tested it on v5.9-rc1, and the regression still exists. Have you 
tried the stress-ng test cases?




Could you try the stress-ng (https://kernel.ubuntu.com/~cking/stress-ng/) 
test cases?  The tarballs can be downloaded from 
https://kernel.ubuntu.com/~cking/tarballs/stress-ng/.
For this case you can try the command "stress-ng --timeout 1 --times 
--verify --metrics-brief --sequential 9 --class os --minimize --exclude 
spawn,exec,swap".

I re-tested it on v5.8-rc6, and the regression still exists.

= 

tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_threads/disk/testtime/fs/class/cpufreq_governor/ucode: 



lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/test/10%/1HDD/1s/ext4/os/performance/0x5002f01 



commit:
   b2c5764262edded1b1cfff5a6ca82c3d61bb4a4a
   d3b6f23f71670007817a5d59f3fbafab2b794e8c
   v5.8-rc6

b2c5764262edded1 d3b6f23f71670007817a5d59f3f    v5.8-rc6
---------------- --------------------------- -----------
         %stddev      %change    %stddev       %change    %stddev
     20419 ±  3%        -4.9%      19423 ±  4%   +27.1%      25959        stress-ng.af-alg.ops
     19655 ±  3%        -5.7%      18537 ±  4%   +27.8%      25111        stress-ng.af-alg.ops_per_sec
     64.67 ±  5%       -17.0%      53.67 ± 38%   +22.2%      79.00 ±  9%  stress-ng.chdir.ops
     55.34 ±  3%       -13.3%      47.99 ± 38%   +26.4%      69.96 ± 10%  stress-ng.chdir.ops_per_sec
     64652 ±  7%       -14.1%      55545 ± 11%   -13.6%      55842 ±  6%  stress-ng.chown.ops
     64683 ±  7%       -14.1%      55565 ± 11%   -13.6%      55858 ±  6%  stress-ng.chown.ops_per_sec
      2805 ±  2%        +0.6%       2820 ±  2%  +130.0%       6452        stress-ng.clone.ops
      2802 ±  2%        +0.6%       2818 ±  2%  +129.9%       6443        stress-ng.cl

Re: [LKP] [rcu] 276c410448: will-it-scale.per_thread_ops -12.3% regression

2020-08-18 Thread Xing Zhengjun




On 6/17/2020 12:28 AM, Paul E. McKenney wrote:

On Tue, Jun 16, 2020 at 10:02:24AM +0800, Xing Zhengjun wrote:

Hi Paul,

Do you have time to take a look at this? Thanks.


I do not see how this change could affect anything that isn't directly
using RCU Tasks Trace.  Yes, there is some addition to process creation,
but that isn't what is showing the increased overhead.

I see that the instruction count increased.  Is it possible that this
is due to changes in offsets within the task_struct structure?

Thanx, Paul



How about this regression? I tested the latest v5.9-rc1, and the regression 
still exists.


=
tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_task/mode/test/cpufreq_governor/ucode:

lkp-knm01/will-it-scale/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-9/test2/100%/thread/page_fault3/performance/0x11

commit:
  b0afa0f056676ffe0a7213818f09d2460adbcc16
  276c410448dbca357a2bc3539acfe04862e5f172
  v5.9-rc1

b0afa0f056676ffe 276c410448dbca357a2bc3539ac    v5.9-rc1
---------------- --------------------------- -----------
         %stddev      %change    %stddev       %change    %stddev
      1417             -13.2%       1230 ±  2%   -16.6%       1182    will-it-scale.per_thread_ops
    408456             -13.2%     354391 ±  2%   -16.6%     340519    will-it-scale.workload






On 6/15/2020 4:57 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -12.3% regression of will-it-scale.per_thread_ops due to 
commit:


commit: 276c410448dbca357a2bc3539acfe04862e5f172 ("rcu-tasks: Split 
->trc_reader_need_end")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G 
memory
with following parameters:

nr_task: 100%
mode: thread
test: page_fault3
cpufreq_governor: performance
ucode: 0x11

test-description: Will It Scale takes a testcase and runs it from 1 through to 
n parallel copies to see if the testcase will scale. It builds both a process 
and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml  # job file is attached in this email
  bin/lkp run job.yaml

=
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:

gcc-9/performance/x86_64-rhel-7.6/thread/100%/debian-x86_64-20191114.cgz/lkp-knm01/page_fault3/will-it-scale/0x11

commit:
b0afa0f056 ("rcu-tasks: Provide boot parameter to delay IPIs until late in grace 
period")
276c410448 ("rcu-tasks: Split ->trc_reader_need_end")

b0afa0f056676ffe 276c410448dbca357a2bc3539ac
 ---
 fail:runs  %reproductionfail:runs
 | | |
2:4  -50%:4 
dmesg.WARNING:at#for_ip_interrupt_entry/0x
 :4   28%   1:4 
perf-profile.calltrace.cycles-pp.error_entry
0:40%   0:4 
perf-profile.children.cycles-pp.error_exit
1:47%   2:4 
perf-profile.children.cycles-pp.error_entry
0:44%   1:4 
perf-profile.self.cycles-pp.error_entry
   %stddev %change %stddev
   \  |\
1414   -12.3%   1241 ±  2%  will-it-scale.per_thread_ops
  463.32+1.7% 470.99will-it-scale.time.elapsed_time
  463.32+1.7% 470.99
will-it-scale.time.elapsed_time.max
  407566   -12.3% 357573 ±  2%  will-it-scale.workload
   48.51-1.5%  47.77boot-time.boot
   7.203e+10   +20.0%   8.64e+10 ±  2%  cpuidle.C1.time
   2.162e+08 ±  2% +27.7%  2.761e+08 ±  2%  cpuidle.C1.usage
   60.50   +12.2   72.74 ±  2%  mpstat.cpu.all.idle%
   39.17   -12.2   26.97 ±  6%  mpstat.cpu.all.sys%
2334 ± 12% +18.8%   2772 ±  5%  
slabinfo.khugepaged_mm_slot.active_objs
2334 ± 12% +18.8%   2772 ±  5%  
slabinfo.khugepaged_mm_slot.num_objs
   60.25   +20.3%  72.50 ±  2%  vmstat.cpu.id
   92.75 ±  3% 

Re: [LKP] Re: [fsnotify] c738fbabb0: will-it-scale.per_process_ops -9.5% regression

2020-07-26 Thread Xing Zhengjun




On 7/24/2020 10:44 AM, Rong Chen wrote:



On 7/21/20 11:59 PM, Amir Goldstein wrote:
On Tue, Jul 21, 2020 at 3:15 AM kernel test robot 
 wrote:

Greeting,

FYI, we noticed a -9.5% regression of will-it-scale.per_process_ops 
due to commit:



commit: c738fbabb0ff62d0f9a9572e56e65d05a1b34c6a ("fsnotify: fold 
fsnotify() call into fsnotify_parent()")

Strange, that's a pretty dumb patch moving some inlined code from one
function to
another (assuming there are no fsnotify marks in this test).

Unless I am missing something the only thing that changes slightly is
an extra d_inode(file->f_path.dentry) deference.
I can get rid of it.

Is it possible to ask for a re-test with fix patch (attached)?




I applied the fix patch; the regression still exists.
=
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode:

lkp-csl-2ap2/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/16/process/open1/performance/0x5002f01

commit:
  71d734103edfa2b4c6657578a3082ee0e51d767e
  c738fbabb0ff62d0f9a9572e56e65d05a1b34c6a
  5c32fe90f2a57e7c4da06be51f705aec6affceb6 (the commit on which the fix 
patch is based)

  7f66797f773621d0ef6718df0ef2cf849814d114 (the fix patch)

71d734103edfa2b4 c738fbabb0ff62d0f9a9572e56e 5c32fe90f2a57e7c4da06be51f7 7f66797f773621d0ef6718df0ef
---------------- --------------------------- --------------------------- ---------------------------
         %stddev      %change    %stddev       %change    %stddev          %change    %stddev
    229940              -9.8%     207333        -13.0%         16           -11.7%     202927    will-it-scale.per_process_ops
   3679048              -9.8%    3317347        -13.0%    3199942           -11.7%    3246851    will-it-scale.workload





Hi Amir,

We failed to apply this patch, could you tell us the base commit or the 
base branch?


Best Regards,
Rong Chen


--
Zhengjun Xing


Re: [LKP] [x86, sched] 1567c3e346: vm-scalability.median -15.8% regression

2020-07-23 Thread Xing Zhengjun




On 7/9/2020 8:43 PM, Giovanni Gherdovich wrote:

On Tue, 2020-07-07 at 10:58 +0800, Xing Zhengjun wrote:


On 6/12/2020 4:11 PM, Xing Zhengjun wrote:

Hi Giovanni,

 I tested the regression, and it still exists in v5.7.  Do you have time
to take a look at this? Thanks.



Ping...



Hello,

I haven't sat down to reproduce this yet but I've read the benchmark code and
configuration, and this regression seems likely to be more of a benchmarking
artifact than an actual performance bug.

Likely a benchmarking artifact:

First off, the test used the "performance" governor from the "intel_pstate"
cpufreq driver, but points at the patch introducing the "frequency invariance
on x86" feature as the culprit. This is suspicious because "frequency
invariance on x86" influences frequency selection when the "schedutil" governor
is in use (not your case). It may also affect the scheduler load balancing but
here you have $NUM_CPUS processes so there isn't a lot of room for creativity
there, each CPU gets a process.

Some notes on this benchmark for my future reference:

The test in question is "anon-cow-seq" from "vm-scalability", which is based
on the "usemem" program originally written by Andrew Morton and exercises the
memory management subsystem. The invocation is:

 usemem --nproc $NUM_CPUS   \
   --prealloc  \
   --prefault  \
   $SIZE

What this does is to create an anonymous mmap()-ing of $SIZE bytes in the main
process, fork $NUM_CPUS distinct child processes and have all of them scan the
mapping sequentially from byte 0 to byte N, writing 0, 1, 2, ..., N on the
region as they scan it, all together at the same time. So we have the "anon"
part (the mapping isn't file-backed), the "cow" part (the parent process
allocates the region, then each children copy-on-write's to it) and the "seq"
part (memory accesses happen sequentially from low to high address). The test
measures how quick this happens; I believe the regression happens in the
median time it takes a process to finish (or the median throughput, but $SIZE
is fixed so it's equivalent).
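
In miniature (a toy sketch, not usemem itself; the size and process count
here are arbitrary), the access pattern looks like this:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	size_t size = 64 << 20;			/* 64M; the benchmark uses $SIZE */
	char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 1, size);			/* --prealloc/--prefault in the parent */

	for (int i = 0; i < 4; i++) {		/* --nproc would be $NUM_CPUS */
		if (fork() == 0) {
			/* Each child scans the region sequentially; every
			 * write copy-on-writes a page of the parent's mapping. */
			for (size_t off = 0; off < size; off++)
				buf[off] = (char)off;
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		;
	return 0;
}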

The $SIZE parameter is selected so that there is enough space for everybody:
each children plus the parent need a copy of the mapped region, so that makes
$NUM_CPUS+1 instances. The formula for $SIZE adds a factor 2 for good measure:

 SIZE = $MEM_SIZE / ($NUM_CPUS + 1) / 2

So we have a benchmark dominated by page allocation and copying, run with the
"performance" cpufreq governor, and your bisections points to a commit such as
1567c3e3467cddeb019a7b53ec632f834b6a9239 ("x86, sched: Add support for
frequency invariance") which:

* changes how frequency is selected by a governor you're not using
* doesn't touch the memory management subsystem or related functions

I'm not entirely dismissing your finding, just explaining why this analysis
hasn't been in my top priorities lately (plus, I've just returned from a 3
weeks vacation :). I'm curious too about what causes the test to go red, but
I'm not overly worried given the above context.


Thanks,
Giovanni Gherdovich



This regression only happened on the testbox "lkp-hsw-4ex1"; the machine 
hardware info is:

model: Haswell-EX
nr_node: 4
nr_cpu: 144
memory: 512G
brand: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz

We had reproduced it many times, but recently we upgraded both the 
software and the hardware of this machine, and now we cannot reproduce the 
regression on it; we also tried reverting the upgrade, and it still cannot 
be reproduced. We will continue to run the test case, and once the 
regression is reproduced we will let you know.



--
Zhengjun Xing


Re: [LKP] [xfs] a5949d3fae: aim7.jobs-per-min -33.6% regression

2020-07-22 Thread Xing Zhengjun




On 7/7/2020 2:30 AM, Darrick J. Wong wrote:

On Wed, Jul 01, 2020 at 03:49:52PM +0800, Xing Zhengjun wrote:



On 6/10/2020 11:07 AM, Xing Zhengjun wrote:

Hi Darrick,

     Do you have time to take a look at this? Thanks.




Ping...


Yes, that decrease is the expected end result of making the write path
take a longer route to avoid a file corruption vector.

--D



Thanks for the explanation. We will stop tracking it.





On 6/6/2020 11:48 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -33.6% regression of aim7.jobs-per-min due to commit:


commit: a5949d3faedf492fa7863b914da408047ab46eb0 ("xfs: force writes
to delalloc regions to unwritten")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: aim7
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @
2.70GHz with 64G memory
with following parameters:

 disk: 1BRD_48G
 fs: xfs
 test: sync_disk_rw
 load: 600
 cpufreq_governor: performance
 ucode: 0x42e

test-description: AIM7 is a traditional UNIX system level benchmark
suite which is used to test and measure the performance of multiuser
system.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->



To reproduce:

  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml  # job file is attached in this email
  bin/lkp run job.yaml

=

compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase/ucode:

 
gcc-9/performance/1BRD_48G/xfs/x86_64-rhel-7.6/600/debian-x86_64-20191114.cgz/lkp-ivb-2ep1/sync_disk_rw/aim7/0x42e


commit:
    590b16516e ("xfs: refactor xfs_iomap_prealloc_size")
    a5949d3fae ("xfs: force writes to delalloc regions to unwritten")

590b16516ef38e2e a5949d3faedf492fa7863b914da
 ---
     fail:runs  %reproduction    fail:runs
     | | |
     :4   50%   2:4
dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x
   %stddev %change %stddev
   \  |    \
   35272   -33.6%  23430    aim7.jobs-per-min
  102.13   +50.5% 153.75    aim7.time.elapsed_time
  102.13   +50.5% 153.75    aim7.time.elapsed_time.max
     1388038   +40.2%    1945838
aim7.time.involuntary_context_switches
   43420 ±  2% +13.4%  49255 ±  2%
aim7.time.minor_page_faults
    3123   +44.2%   4504 ±  2%  aim7.time.system_time
   59.31    +6.5%  63.18    aim7.time.user_time
    48595108   +58.6%   77064959
aim7.time.voluntary_context_switches
    1.44   -28.8%   1.02    iostat.cpu.user
    0.07 ±  6%  +0.4    0.44 ±  7%  mpstat.cpu.all.iowait%
    1.44    -0.4    1.02    mpstat.cpu.all.usr%
    8632 ± 50% +75.6%  15156 ± 34%
numa-meminfo.node0.KernelStack
    6583 ±136%    +106.0%  13562 ± 82%
numa-meminfo.node0.PageTables
   63325 ± 11% +14.3%  72352 ± 12%
numa-meminfo.node0.SUnreclaim
    8647 ± 50% +75.3%  15156 ± 34%
numa-vmstat.node0.nr_kernel_stack
    1656 ±136%    +104.6%   3389 ± 82%
numa-vmstat.node0.nr_page_table_pages
   15831 ± 11% +14.3%  18087 ± 12%
numa-vmstat.node0.nr_slab_unreclaimable
   93640 ±  3% +41.2% 132211 ±  2%  meminfo.AnonHugePages
   21641   +39.9%  30271 ±  4%  meminfo.KernelStack
  129269   +12.3% 145114    meminfo.SUnreclaim
   28000   -31.2%  19275    meminfo.max_used_kB
     1269307   -26.9% 927657    vmstat.io.bo
  149.75 ±  3% -17.4% 123.75 ±  4%  vmstat.procs.r
  718992   +13.3% 814567    vmstat.system.cs
  231397    -9.3% 209881 ±  2%  vmstat.system.in
   6.774e+08   +70.0%  1.152e+09    cpuidle.C1.time
    18203372   +60.4%   29198744    cpuidle.C1.usage
   2.569e+08 ± 18% +81.8%  4.672e+08 ±  5%  cpuidle.C1E.time
     2691402 ± 13% +98.7%    5346901 ±  3%  cpuidle.C1E.usage
  990350   +95.0%    1931226 ±  2%  cpuidle.POLL.time
  520061   +97.7%    1028004 ±  2%  cpuidle.POLL.usage
   77231    +1.8%  78602    proc-vmstat.nr_active_anon
   19868    +3.8%  20615    proc-vmstat.nr_dirty
  381302    +1.0% 384969    proc-vmstat.nr_file_pages
    4388    -2.7%   4270
proc-vmstat.nr_inactive_anon
   69865

Re: [ext4] d3b6f23f71: stress-ng.fiemap.ops_per_sec -60.5% regression

2020-07-22 Thread Xing Zhengjun




On 7/15/2020 7:04 PM, Ritesh Harjani wrote:

Hello Xing,

On 4/7/20 1:30 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -60.5% regression of stress-ng.fiemap.ops_per_sec 
due to commit:



commit: d3b6f23f71670007817a5d59f3fbafab2b794e8c ("ext4: move 
ext4_fiemap to use iomap framework")

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz 
with 192G memory

with following parameters:

nr_threads: 10%
disk: 1HDD
testtime: 1s
class: os
cpufreq_governor: performance
ucode: 0x52c
fs: ext4


I started looking into this issue, but with my unit testing I didn't
find any perf issue with the fiemap ioctl call. I haven't yet explored
how stress-ng takes its fiemap performance numbers; it could be doing
something differently. In my testing I just made sure to create a
file with a large number of extents and used the xfs_io -c "fiemap -v" cmd
to check how much time it takes to read all the entries in the 1st
and subsequent iterations.


Setup comprised of qemu machine on x86_64 with latest linux branch.

1. created a file of 10G using fallocate. (this allocated unwritten
extents for this file).

2. Then I punched a hole in every alternate block of the file. This step took
a long time, and after a sufficiently long time I had to cancel it.
for i in $(seq 1 2 x); do echo $i; fallocate -p -o $(($i*4096)) -l 4096 bigfile; done
(here x is the number of 4K blocks to punch, and the target is the file created in step 1)


3. Then issued fiemap call via xfs_io and took the time measurement.
time xfs_io -c "fiemap -v" bigfile > /dev/null


Perf numbers on latest default kernel build for above cmd.

1st iteration
==
real    0m31.684s
user    0m1.593s
sys 0m24.174s

2nd and subsequent iteration

real    0m3.379s
user    0m1.300s
sys 0m2.080s


4. Then I reverted all the iomap_fiemap patches and re-tested this.
With this the older ext4_fiemap implementation will be tested:-


1st iteration
==
real    0m31.591s
user    0m1.400s
sys 0m24.243s


2nd and subsequent iteration (had to cancel it since it was taking more 
time than 15m)


^C^C
real    15m49.884s
user    0m0.032s
sys 15m49.722s

I guess the reason why the 2nd iteration with the older implementation takes
so much time is that with the previous implementation we never cached
extent entries in the extent_status tree. Also, in the 1st iteration the page
cache may get filled with a lot of buffer_head entries, so page
reclaim may be taking more time.

With the latest implementation using iomap_fiemap(), the call to 
query extent blocks is done using ext4_map_blocks(). ext4_map_blocks()
by default will also cache the extent entries in the extent_status tree.
Hence during the 2nd iteration we will directly read the entries from the 
extent_status tree and will not do any disk I/O.
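
For reference, a rough C sketch of the same comparison done directly with the
FIEMAP ioctl instead of xfs_io (the default file name, the extent-buffer size
and the single-call simplification are my assumptions, not part of Ritesh's
test):

/* Sketch: time two consecutive FIEMAP calls on the same file. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

#define NR_EXTENTS 65536	/* arbitrary per-call extent buffer size */

static double fiemap_once(int fd)
{
	size_t sz = sizeof(struct fiemap) + NR_EXTENTS * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, sz);
	struct timespec t0, t1;

	/* one call over the whole file; a full walk of a very fragmented file
	 * would loop, restarting fm_start after the last returned extent */
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;
	fm->fm_extent_count = NR_EXTENTS;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0)
		perror("FS_IOC_FIEMAP");
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%u extents mapped\n", fm->fm_mapped_extents);
	free(fm);
	return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(int argc, char **argv)
{
	int fd = open(argc > 1 ? argv[1] : "bigfile", O_RDONLY);

	if (fd < 0)
		return 1;
	printf("1st call: %.3fs\n", fiemap_once(fd));	/* cold */
	printf("2nd call: %.3fs\n", fiemap_once(fd));	/* warm: extents may now be cached */
	close(fd);
	return 0;
}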


-ritesh


Could you try the stress-ng (https://kernel.ubuntu.com/~cking/stress-ng/) 
test cases? The tarballs can be downloaded from 
https://kernel.ubuntu.com/~cking/tarballs/stress-ng/.
For this case you can try the command "stress-ng --timeout 1 --times 
--verify --metrics-brief --sequential 9 --class os --minimize --exclude 
spawn,exec,swap"

I re-tested it on v5.8-rc6, and the regression still exists.

=
tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_threads/disk/testtime/fs/class/cpufreq_governor/ucode:

lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/test/10%/1HDD/1s/ext4/os/performance/0x5002f01

commit:
  b2c5764262edded1b1cfff5a6ca82c3d61bb4a4a
  d3b6f23f71670007817a5d59f3fbafab2b794e8c
  v5.8-rc6

b2c5764262edded1 d3b6f23f71670007817a5d59f3fv5.8-rc6
 --- ---
 %stddev %change %stddev %change %stddev
 \  |\  |\
 20419 ±  3%  -4.9%  19423 ±  4% +27.1%  25959 
  stress-ng.af-alg.ops
 19655 ±  3%  -5.7%  18537 ±  4% +27.8%  25111 
  stress-ng.af-alg.ops_per_sec
 64.67 ±  5% -17.0%  53.67 ± 38% +22.2%  79.00 ± 
9%  stress-ng.chdir.ops
 55.34 ±  3% -13.3%  47.99 ± 38% +26.4%  69.96 ± 
10%  stress-ng.chdir.ops_per_sec
 64652 ±  7% -14.1%  55545 ± 11% -13.6%  55842 ± 
6%  stress-ng.chown.ops
 64683 ±  7% -14.1%  55565 ± 11% -13.6%  55858 ± 
6%  stress-ng.chown.ops_per_sec
  2805 ±  2%  +0.6%   2820 ±  2%+130.0%   6452 
  stress-ng.clone.ops
  2802 ±  2%  +0.6%   2818 ±  2%+129.9%   6443 
  stress-ng.clone.ops_per_sec
 34.67+1.9%  35.33 ±  3%  -9.6%  31.33 ± 
3%  stress-ng.copy-file.ops
 22297 ± 23% +26.7%  28258 ±  2% +38.1%  30783 ± 
14%  stress-ng.dir.ops_per_sec
  

Re: [LKP] [x86, sched] 1567c3e346: vm-scalability.median -15.8% regression

2020-07-06 Thread Xing Zhengjun




On 6/12/2020 4:11 PM, Xing Zhengjun wrote:

Hi Giovanni,

    I tested the regression; it still exists in v5.7. Do you have time 
to take a look at this? Thanks.




Ping...

= 

tbox_group/testcase/rootfs/kconfig/compiler/runtime/debug-setup/size/test/cpufreq_governor/ucode: 



lkp-hsw-4ex1/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/test/8T/anon-cow-seq/performance/0x16 



commit:
   2a4b03ffc69f2dedc6388e9a6438b5f4c133a40d
   1567c3e3467cddeb019a7b53ec632f834b6a9239
   v5.7-rc1
   v5.7

2a4b03ffc69f2ded 1567c3e3467cddeb019a7b53ec6    v5.7-rc1 
    v5.7
 --- --- 
---
  %stddev %change %stddev %change %stddev 
%change %stddev
  \  |    \  |    \ 
     |    \
     211462   -16.0% 177702   -15.0% 179809  
-15.1% 179510    vm-scalability.median
   5.34 ±  9%  -3.1    2.23 ± 11%  -2.9    2.49 ± 
5%  -2.7    2.61 ± 11%  vm-scalability.median_stddev%
   30430671   -16.3%   25461360   -15.5%   25707029  
-15.5%   25701713    vm-scalability.throughput
  7.967e+09   -11.1%  7.082e+09   -11.1%  7.082e+09  
-11.1%  7.082e+09    vm-scalability.workload




On 4/16/2020 2:20 PM, Giovanni Gherdovich wrote:

On Thu, 2020-04-16 at 14:10 +0800, Xing Zhengjun wrote:

Hi Giovanni,

    1567c3e346("x86, sched: Add support for frequency invariance") has
been merged into Linux mainline v5.7-rc1 now. Do you have time to take a
look at this? Thanks.



Apologies, this slipped under my radar. I'm on it, thanks.


Giovanni Gherdovich





--
Zhengjun Xing


Re: [LKP] [xfs] a5949d3fae: aim7.jobs-per-min -33.6% regression

2020-07-01 Thread Xing Zhengjun




On 6/10/2020 11:07 AM, Xing Zhengjun wrote:

Hi Darrick,

    Do you have time to take a look at this? Thanks.




Ping...



On 6/6/2020 11:48 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -33.6% regression of aim7.jobs-per-min due to commit:


commit: a5949d3faedf492fa7863b914da408047ab46eb0 ("xfs: force writes 
to delalloc regions to unwritten")

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: aim7
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz 
with 64G memory

with following parameters:

disk: 1BRD_48G
fs: xfs
test: sync_disk_rw
load: 600
cpufreq_governor: performance
ucode: 0x42e

test-description: AIM7 is a traditional UNIX system level benchmark 
suite which is used to test and measure the performance of multiuser 
system.

test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
--> 




To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

= 

compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase/ucode: 

   
gcc-9/performance/1BRD_48G/xfs/x86_64-rhel-7.6/600/debian-x86_64-20191114.cgz/lkp-ivb-2ep1/sync_disk_rw/aim7/0x42e 



commit:
   590b16516e ("xfs: refactor xfs_iomap_prealloc_size")
   a5949d3fae ("xfs: force writes to delalloc regions to unwritten")

590b16516ef38e2e a5949d3faedf492fa7863b914da
 ---
    fail:runs  %reproduction    fail:runs
    | | |
    :4   50%   2:4 
dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x

  %stddev %change %stddev
  \  |    \
  35272   -33.6%  23430    aim7.jobs-per-min
 102.13   +50.5% 153.75    aim7.time.elapsed_time
 102.13   +50.5% 153.75    aim7.time.elapsed_time.max
    1388038   +40.2%    1945838
aim7.time.involuntary_context_switches
  43420 ±  2% +13.4%  49255 ±  2%  
aim7.time.minor_page_faults

   3123   +44.2%   4504 ±  2%  aim7.time.system_time
  59.31    +6.5%  63.18    aim7.time.user_time
   48595108   +58.6%   77064959
aim7.time.voluntary_context_switches

   1.44   -28.8%   1.02    iostat.cpu.user
   0.07 ±  6%  +0.4    0.44 ±  7%  mpstat.cpu.all.iowait%
   1.44    -0.4    1.02    mpstat.cpu.all.usr%
   8632 ± 50% +75.6%  15156 ± 34%  
numa-meminfo.node0.KernelStack
   6583 ±136%    +106.0%  13562 ± 82%  
numa-meminfo.node0.PageTables
  63325 ± 11% +14.3%  72352 ± 12%  
numa-meminfo.node0.SUnreclaim
   8647 ± 50% +75.3%  15156 ± 34%  
numa-vmstat.node0.nr_kernel_stack
   1656 ±136%    +104.6%   3389 ± 82%  
numa-vmstat.node0.nr_page_table_pages
  15831 ± 11% +14.3%  18087 ± 12%  
numa-vmstat.node0.nr_slab_unreclaimable

  93640 ±  3% +41.2% 132211 ±  2%  meminfo.AnonHugePages
  21641   +39.9%  30271 ±  4%  meminfo.KernelStack
 129269   +12.3% 145114    meminfo.SUnreclaim
  28000   -31.2%  19275    meminfo.max_used_kB
    1269307   -26.9% 927657    vmstat.io.bo
 149.75 ±  3% -17.4% 123.75 ±  4%  vmstat.procs.r
 718992   +13.3% 814567    vmstat.system.cs
 231397    -9.3% 209881 ±  2%  vmstat.system.in
  6.774e+08   +70.0%  1.152e+09    cpuidle.C1.time
   18203372   +60.4%   29198744    cpuidle.C1.usage
  2.569e+08 ± 18% +81.8%  4.672e+08 ±  5%  cpuidle.C1E.time
    2691402 ± 13% +98.7%    5346901 ±  3%  cpuidle.C1E.usage
 990350   +95.0%    1931226 ±  2%  cpuidle.POLL.time
 520061   +97.7%    1028004 ±  2%  cpuidle.POLL.usage
  77231    +1.8%  78602    proc-vmstat.nr_active_anon
  19868    +3.8%  20615    proc-vmstat.nr_dirty
 381302    +1.0% 384969    proc-vmstat.nr_file_pages
   4388    -2.7%   4270
proc-vmstat.nr_inactive_anon
  69865    +4.7%  73155
proc-vmstat.nr_inactive_file
  21615   +40.0%  30251 ±  4%  
proc-vmstat.nr_kernel_stack

   7363    -3.2%   7127    proc-vmstat.nr_mapped
  12595 ±  3%  +5.2%  13255 ±  4%  proc-vmstat.nr_shmem
  19619

Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression

2020-06-18 Thread Xing Zhengjun




On 6/18/2020 4:24 PM, Hillf Danton wrote:


On Thu, 18 Jun 2020 10:45:01 +0800 Xing Zhengjun wrote:

On 6/18/2020 12:25 AM, Vincent Guittot wrote:

On Wednesday 17 June 2020 at 16:57:25 (+0200), Vincent Guittot wrote:

On Wednesday 17 June 2020 at 08:30:21 (+0800), Xing Zhengjun wrote:



On 6/16/2020 2:54 PM, Vincent Guittot wrote:


Hi Xing,

On Tuesday 16 June 2020 at 11:17:16 (+0800), Xing Zhengjun wrote:



On 6/15/2020 4:10 PM, Vincent Guittot wrote:

Hi Xing,

On Monday 15 June 2020 at 15:26:59 (+0800), Xing Zhengjun wrote:



On 6/12/2020 7:06 PM, Hillf Danton wrote:


On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote:




...


...


I applied the patch on top of v5.7; the test result is as follows:

=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
9f68395333ad7f5bfe2f83473fed363d4229f11c
070f5e860ee2bf588c99ef7b4c202451faa48236
v5.7
63a5d0fbb5ec62f5148c251c01e709b8358cd0ee (the test patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7
63a5d0fbb5ec62f5148c251c01e
 --- ---
---
   %stddev %change %stddev %change %stddev %change
%stddev
   \  |\  |\
|\
0.69   -10.3%   0.62-9.1%   0.62
+1.0%   0.69reaim.child_systime
0.62-1.0%   0.61+0.5%   0.62
-0.1%   0.62reaim.child_utime
   66870   -10.0%  60187-7.6%  61787
+1.1%  67636reaim.jobs_per_min
   16717   -10.0%  15046-7.6%  15446
+1.1%  16909reaim.jobs_per_min_child


OK. So the regression disappears when the conditions on runnable_avg are 
removed.

In the meantime, I have been able to understand more deeply what was happening
for this bench and how it is impacted by
commit: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify 
group")

This bench forks a new thread for each and every new step. But newly forked
threads start with a load_avg and a runnable_avg set to max, whereas the threads
run only shortly before exiting. This makes the CPU be classified as overloaded in
some cases when it isn't.
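
To illustrate the workload shape being described, a minimal C sketch of a
fork-per-step job (my own illustration, not reaim itself); each freshly forked
child starts with load_avg/runnable_avg initialized to the maximum even though
it runs only briefly before exiting:

/* Sketch of a reaim-like "fresh batch of short-lived children per step" workload. */
#include <sys/wait.h>
#include <unistd.h>

static void tiny_job(void)
{
	volatile unsigned long x = 0;
	unsigned long i;

	for (i = 0; i < 1000000; i++)	/* a few milliseconds of work */
		x += i;
}

int main(void)
{
	long nproc = sysconf(_SC_NPROCESSORS_ONLN);
	int step;
	long i;

	for (step = 0; step < 1000; step++) {
		/* every step forks brand-new tasks; each starts with load_avg and
		 * runnable_avg at max although it exits almost immediately */
		for (i = 0; i < nproc; i++) {
			if (fork() == 0) {
				tiny_job();
				_exit(0);
			}
		}
		while (wait(NULL) > 0)
			;
	}
	return 0;
}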

Could you try the patch below ?
It fixes the problem on my setup (I have finally been able to reproduce the 
problem)

---
   kernel/sched/fair.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..b33a4a9e1491 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -807,7 +807,7 @@ void post_init_entity_util_avg(struct task_struct *p)
}
}
   
-	sa->runnable_avg = cpu_scale;

+   sa->runnable_avg = sa->util_avg;
   
   	if (p->sched_class != &fair_sched_class) {

/*
--
2.17.1



The patch above tries to put the group back in the same classification as
before, but this could harm other benchmarks.

There is another way to fix this, by easing the migration of tasks in the case
of a migrate_util imbalance.

Could you also try the patch below instead of the one above?

---
   kernel/sched/fair.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..fcaf66c4d086 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7753,7 +7753,8 @@ static int detach_tasks(struct lb_env *env)
case migrate_util:
util = task_util_est(p);

-   if (util > env->imbalance)
+   if (util/2 > env->imbalance &&
+   env->sd->nr_balance_failed <= 
env->sd->cache_nice_tries)
goto next;


Hmm... this sheds a shaft of light on computing imbalance for
migrate_util, see below.



env->imbalance -= util;
--
2.17.1




I applied the patch on top of v5.7; the test result is as follows:


Thanks.



=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
  
lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21


commit:
9f68395333ad7f5bfe2f83473fed363d4229f11c
070f5e860ee2bf588c99ef7b4c202451faa48236
v5.7
69c81543653bf5f2c7105086502889fa019c15cb  (the test patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7

Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression

2020-06-18 Thread Xing Zhengjun




On 6/18/2020 8:35 PM, Vincent Guittot wrote:

On Thu, 18 Jun 2020 at 04:45, Xing Zhengjun
 wrote:








This bench forks a new thread for each and every new step. But newly forked
threads start with a load_avg and a runnable_avg set to max, whereas the threads
run only shortly before exiting. This makes the CPU be classified as overloaded in
some cases when it isn't.

Could you try the patch below ?
It fixes the problem on my setup (I have finally been able to reproduce the 
problem)

---
   kernel/sched/fair.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..b33a4a9e1491 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -807,7 +807,7 @@ void post_init_entity_util_avg(struct task_struct *p)
  }
  }

-sa->runnable_avg = cpu_scale;
+sa->runnable_avg = sa->util_avg;

  if (p->sched_class != &fair_sched_class) {
  /*
--
2.17.1



The patch above tries to put the group back in the same classification as
before, but this could harm other benchmarks.

There is another way to fix this, by easing the migration of tasks in the case
of a migrate_util imbalance.

Could you also try the patch below instead of the one above?

---
   kernel/sched/fair.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..fcaf66c4d086 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7753,7 +7753,8 @@ static int detach_tasks(struct lb_env *env)
   case migrate_util:
   util = task_util_est(p);

- if (util > env->imbalance)
+ if (util/2 > env->imbalance &&
+ env->sd->nr_balance_failed <= 
env->sd->cache_nice_tries)
   goto next;

   env->imbalance -= util;
--
2.17.1




I applied the patch on top of v5.7; the test result is as follows:

=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
9f68395333ad7f5bfe2f83473fed363d4229f11c
070f5e860ee2bf588c99ef7b4c202451faa48236
v5.7
69c81543653bf5f2c7105086502889fa019c15cb  (the test patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7
69c81543653bf5f2c7105086502
 --- ---
---
   %stddev %change %stddev %change
%stddev %change %stddev
   \  |\  |\
  |\
0.69   -10.3%   0.62-9.1%   0.62
-7.6%   0.63reaim.child_systime
0.62-1.0%   0.61+0.5%   0.62
+1.9%   0.63reaim.child_utime
   66870   -10.0%  60187-7.6%  61787
-5.9%  62947reaim.jobs_per_min


There is an improvement but not at the same level as on my setup.
I'm not sure which patch you tested here. Is it the last one that
modifies detach_tasks() or the previous one that modifies
post_init_entity_util_avg()?


It is the last one, the one that modifies detach_tasks().


Could you also try the other one? Both patches were improving results
on my setup, but the behavior doesn't seem to be the same on your setup.



The test result for the other one has been sent in another mail.




   16717   -10.0%  15046-7.6%  15446
-5.9%  15736reaim.jobs_per_min_child
   97.84-1.1%  96.75-0.4%  97.43
-0.4%  97.47reaim.jti
   72000   -10.8%  64216-8.3%  66000
-5.7%  67885reaim.max_jobs_per_min
0.36   +10.6%   0.40+7.8%   0.39
+6.0%   0.38reaim.parent_time
1.58 ±  2% +71.0%   2.70 ±  2% +26.9%   2.01 ±
2% +23.6%   1.95 ±  3%  reaim.std_dev_percent
0.00 ±  5%+110.4%   0.01 ±  3% +48.8%   0.01 ±
7% +43.2%   0.01 ±  5%  reaim.std_dev_time
   50800-2.4%  49600-1.6%  5
-0.8%  50400    reaim.workload






...


--
Zhengjun Xing


--
Zhengjun Xing


--
Zhengjun Xing


--
Zhengjun Xing


Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression

2020-06-18 Thread Xing Zhengjun




On 6/17/2020 10:57 PM, Vincent Guittot wrote:

On Wednesday 17 June 2020 at 08:30:21 (+0800), Xing Zhengjun wrote:



On 6/16/2020 2:54 PM, Vincent Guittot wrote:


Hi Xing,

On Tuesday 16 June 2020 at 11:17:16 (+0800), Xing Zhengjun wrote:



On 6/15/2020 4:10 PM, Vincent Guittot wrote:

Hi Xing,

On Monday 15 June 2020 at 15:26:59 (+0800), Xing Zhengjun wrote:



On 6/12/2020 7:06 PM, Hillf Danton wrote:


On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote:




...





I applied the patch on top of v5.7; the test result is as follows:


TBH, I didn't expect that the results would still be bad, so I wonder if the
thresholds are the root problem.

Could you run tests with the patch below, which removes the conditions on
runnable_avg? I just want to make sure that those 2 conditions are the root cause.


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da3e5b54715b..f5774d0af059 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8210,10 +8210,6 @@ group_has_capacity(unsigned int imbalance_pct, struct 
sg_lb_stats *sgs)
  if (sgs->sum_nr_running < sgs->group_weight)
  return true;

-   if ((sgs->group_capacity * imbalance_pct) <
-   (sgs->group_runnable * 100))
-   return false;
-
  if ((sgs->group_capacity * 100) >
  (sgs->group_util * imbalance_pct))
  return true;
@@ -8239,10 +8235,6 @@ group_is_overloaded(unsigned int imbalance_pct, struct 
sg_lb_stats *sgs)
  (sgs->group_util * imbalance_pct))
  return true;

-   if ((sgs->group_capacity * imbalance_pct) <
-   (sgs->group_runnable * 100))
-   return true;
-
  return false;
   }



Thanks.
Vincent




I applied the patch on top of v5.7; the test result is as follows:

=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
   9f68395333ad7f5bfe2f83473fed363d4229f11c
   070f5e860ee2bf588c99ef7b4c202451faa48236
   v5.7
   63a5d0fbb5ec62f5148c251c01e709b8358cd0ee (the test patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7
63a5d0fbb5ec62f5148c251c01e
 --- ---
---
  %stddev %change %stddev %change %stddev %change
%stddev
  \  |\  |\
|\
   0.69   -10.3%   0.62-9.1%   0.62
+1.0%   0.69reaim.child_systime
   0.62-1.0%   0.61+0.5%   0.62
-0.1%   0.62reaim.child_utime
  66870   -10.0%  60187-7.6%  61787
+1.1%  67636reaim.jobs_per_min
  16717   -10.0%  15046-7.6%  15446
+1.1%  16909reaim.jobs_per_min_child


OK. So the regression disappears when the conditions on runnable_avg are 
removed.

In the meantime, I have been able to understand more deeply what was happening
for this bench and how it is impacted by
   commit: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify 
group")

This bench forks a new thread for each and every new step. But newly forked
threads start with a load_avg and a runnable_avg set to max, whereas the threads
run only shortly before exiting. This makes the CPU be classified as overloaded in
some cases when it isn't.

Could you try the patch below ?
It fixes the problem on my setup (I have finally been able to reproduce the 
problem)

---
  kernel/sched/fair.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..b33a4a9e1491 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -807,7 +807,7 @@ void post_init_entity_util_avg(struct task_struct *p)
}
}
  
-	sa->runnable_avg = cpu_scale;

+   sa->runnable_avg = sa->util_avg;
  
  	if (p->sched_class != &fair_sched_class) {

/*



I applied the patch above on top of v5.7; the test result is as follows:

=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7
  cbb4d668e7431479a7978fa79d64c2271adefab0 ( the test patch which modify
post_init_entity_util_avg())

9f68395333a

Re: [LKP] Re: [mm] 1431d4d11a: vm-scalability.throughput -11.5% regression

2020-06-18 Thread Xing Zhengjun




On 6/16/2020 10:45 PM, Johannes Weiner wrote:

On Tue, Jun 16, 2020 at 03:57:50PM +0800, kernel test robot wrote:

Greeting,

FYI, we noticed a -11.5% regression of vm-scalability.throughput due to commit:


commit: 1431d4d11abb265e79cd44bed2f5ea93f1bcc57b ("mm: base LRU balancing on an 
explicit cost model")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


That's a really curious result.

A bit of an increase in system time is expected from the series as a
whole:

1. When thrashing is detected in the file cache, it intentionally
makes more effort to find reclaimable anon pages than before - in an
effort to converge on a stable state that has *neither* page reclaim
nor refault IO.

2. There are a couple of XXX about unrealized lock batching
opportunities. Those weren't/aren't expected to have too much of a
practical impact, and require a bit more infrastructure work that
would have interfered with other ongoing work in the area.

However, this patch in particular doesn't add any locked sections (it
adds a function call to an existing one, I guess?), and the workload
is doing streaming mmapped IO and shouldn't experience any thrashing.

In addition, we shouldn't even scan anon pages - from below:


swap_partitions:
rootfs_partition: "/dev/disk/by-id/wwn-0x5000c50067b47753-part1"


Does that mean that no swap space (not even a file) is configured?



In this case, swap is disabled (if it were enabled, you would find 
"swap:" in the job file); "swap_partitions:" is just a description of 
the hardware.



in testcase: vm-scalability
on test machine: 160 threads Intel(R) Xeon(R) CPU E7-8890 v4 @ 2.20GHz with 
256G memory
with following parameters:

runtime: 300s
test: lru-file-mmap-read
cpufreq_governor: performance
ucode: 0xb38

test-description: The motivation behind this suite is to exercise functions and 
regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase/ucode:
   
gcc-9/performance/x86_64-rhel-7.6/debian-x86_64-20191114.cgz/300s/lkp-bdw-ex2/lru-file-mmap-read/vm-scalability/0xb38

commit:
   a4fe1631f3 ("mm: vmscan: drop unnecessary div0 avoidance rounding in 
get_scan_count()")
   1431d4d11a ("mm: base LRU balancing on an explicit cost model")

a4fe1631f313f75c 1431d4d11abb265e79cd44bed2f
 ---
  %stddev %change %stddev
  \  |\
   0.23 ±  2% +11.7%   0.26vm-scalability.free_time


What's free_time?


The average of the time to free memory (unit: seconds); you can see it in the 
output of the vm-scalability 
(https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/) 
benchmark log as "xxx usecs to free memory".



 103991   -11.6%  91935vm-scalability.median
   20717269   -11.5%   18336098vm-scalability.throughput
 376.47+8.3% 407.78vm-scalability.time.elapsed_time
 376.47+8.3% 407.78
vm-scalability.time.elapsed_time.max
 392226+7.2% 420612
vm-scalability.time.involuntary_context_switches
  11731+4.4%  12247
vm-scalability.time.percent_of_cpu_this_job_got
  41005   +14.5%  46936vm-scalability.time.system_time
   3156-4.8%   3005vm-scalability.time.user_time
   52662860 ±  5% -14.0%   45266760 ±  5%  meminfo.DirectMap2M
   4.43-0.53.90 ±  2%  mpstat.cpu.all.usr%
   1442 ±  5% -14.9%   1227 ± 10%  
slabinfo.kmalloc-rcl-96.active_objs
   1442 ±  5% -14.9%   1227 ± 10%  slabinfo.kmalloc-rcl-96.num_objs
  37.50 ±  2%  -7.3%  34.75vmstat.cpu.id
  57.25+5.2%  60.25vmstat.cpu.sy
  54428 ± 60% -96.5%   1895 ±173%  numa-meminfo.node1.AnonHugePages
 116516 ± 48% -88.2%  13709 ± 26%  numa-meminfo.node1.AnonPages
 132303 ± 84% -88.9%  14731 ± 61%  numa-meminfo.node3.Inactive(anon)


These counters capture present state, not history. Are these averages
or snapshots? If snapshots, when are they taken during the test?


These are averages.




 311.75 ±  8% +16.0% 361.75 ±  2%  
numa-vmstat.node0.nr_isolated_file
  29136 ± 48% 

Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression

2020-06-17 Thread Xing Zhengjun




On 6/18/2020 12:25 AM, Vincent Guittot wrote:

On Wednesday 17 June 2020 at 16:57:25 (+0200), Vincent Guittot wrote:

On Wednesday 17 June 2020 at 08:30:21 (+0800), Xing Zhengjun wrote:



On 6/16/2020 2:54 PM, Vincent Guittot wrote:


Hi Xing,

On Tuesday 16 June 2020 at 11:17:16 (+0800), Xing Zhengjun wrote:



On 6/15/2020 4:10 PM, Vincent Guittot wrote:

Hi Xing,

On Monday 15 June 2020 at 15:26:59 (+0800), Xing Zhengjun wrote:



On 6/12/2020 7:06 PM, Hillf Danton wrote:


On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote:




...


...


I applied the patch on top of v5.7; the test result is as follows:

=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
   9f68395333ad7f5bfe2f83473fed363d4229f11c
   070f5e860ee2bf588c99ef7b4c202451faa48236
   v5.7
   63a5d0fbb5ec62f5148c251c01e709b8358cd0ee (the test patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7
63a5d0fbb5ec62f5148c251c01e
 --- ---
---
  %stddev %change %stddev %change %stddev %change
%stddev
  \  |\  |\
|\
   0.69   -10.3%   0.62-9.1%   0.62
+1.0%   0.69reaim.child_systime
   0.62-1.0%   0.61+0.5%   0.62
-0.1%   0.62reaim.child_utime
  66870   -10.0%  60187-7.6%  61787
+1.1%  67636reaim.jobs_per_min
  16717   -10.0%  15046-7.6%  15446
+1.1%  16909reaim.jobs_per_min_child


OK. So the regression disappears when the conditions on runnable_avg are 
removed.

In the meantime, I have been able to understand more deeply what was happening
for this bench and how it is impacted by
   commit: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify 
group")

This bench forks a new thread for each and every new step. But newly forked
threads start with a load_avg and a runnable_avg set to max, whereas the threads
run only shortly before exiting. This makes the CPU be classified as overloaded in
some cases when it isn't.

Could you try the patch below ?
It fixes the problem on my setup (I have finally been able to reproduce the 
problem)

---
  kernel/sched/fair.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..b33a4a9e1491 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -807,7 +807,7 @@ void post_init_entity_util_avg(struct task_struct *p)
}
}
  
-	sa->runnable_avg = cpu_scale;

+   sa->runnable_avg = sa->util_avg;
  
   	if (p->sched_class != &fair_sched_class) {

/*
--
2.17.1



The patch above tries to put the group back in the same classification as
before, but this could harm other benchmarks.

There is another way to fix this, by easing the migration of tasks in the case
of a migrate_util imbalance.

Could you also try the patch below instead of the one above?

---
  kernel/sched/fair.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..fcaf66c4d086 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7753,7 +7753,8 @@ static int detach_tasks(struct lb_env *env)
case migrate_util:
util = task_util_est(p);

-   if (util > env->imbalance)
+   if (util/2 > env->imbalance &&
+   env->sd->nr_balance_failed <= 
env->sd->cache_nice_tries)
goto next;

env->imbalance -= util;
--
2.17.1




I applied the patch on top of v5.7; the test result is as follows:

=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7
  69c81543653bf5f2c7105086502889fa019c15cb  (the test patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7 
69c81543653bf5f2c7105086502
 --- --- 
---
 %stddev %change 

Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression

2020-06-16 Thread Xing Zhengjun




On 6/16/2020 2:54 PM, Vincent Guittot wrote:


Hi Xing,

On Tuesday 16 June 2020 at 11:17:16 (+0800), Xing Zhengjun wrote:



On 6/15/2020 4:10 PM, Vincent Guittot wrote:

Hi Xing,

On Monday 15 June 2020 at 15:26:59 (+0800), Xing Zhengjun wrote:



On 6/12/2020 7:06 PM, Hillf Danton wrote:


On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote:




...





I applied the patch on top of v5.7; the test result is as follows:


TBH, I didn't expect that the results would still be bad, so I wonder if the
thresholds are the root problem.

Could you run tests with the patch below, which removes the conditions on
runnable_avg? I just want to make sure that those 2 conditions are the root cause.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da3e5b54715b..f5774d0af059 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8210,10 +8210,6 @@ group_has_capacity(unsigned int imbalance_pct, struct 
sg_lb_stats *sgs)
 if (sgs->sum_nr_running < sgs->group_weight)
 return true;

-   if ((sgs->group_capacity * imbalance_pct) <
-   (sgs->group_runnable * 100))
-   return false;
-
 if ((sgs->group_capacity * 100) >
 (sgs->group_util * imbalance_pct))
 return true;
@@ -8239,10 +8235,6 @@ group_is_overloaded(unsigned int imbalance_pct, struct 
sg_lb_stats *sgs)
 (sgs->group_util * imbalance_pct))
 return true;

-   if ((sgs->group_capacity * imbalance_pct) <
-   (sgs->group_runnable * 100))
-   return true;
-
 return false;
  }



Thanks.
Vincent




I apply the patch based on v5.7, the test result is as the following:

=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7
  63a5d0fbb5ec62f5148c251c01e709b8358cd0ee (the test patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7 
63a5d0fbb5ec62f5148c251c01e
 --- --- 
---
 %stddev %change %stddev %change 
%stddev %change %stddev
 \  |\  |\ 
|\
  0.69   -10.3%   0.62-9.1%   0.62 
  +1.0%   0.69reaim.child_systime
  0.62-1.0%   0.61+0.5%   0.62 
  -0.1%   0.62reaim.child_utime
 66870   -10.0%  60187-7.6%  61787 
  +1.1%  67636reaim.jobs_per_min
 16717   -10.0%  15046-7.6%  15446 
  +1.1%  16909reaim.jobs_per_min_child
 97.84-1.1%  96.75-0.4%  97.43 
  +0.3%  98.09reaim.jti
 72000   -10.8%  64216-8.3%  66000 
  +0.0%  72000reaim.max_jobs_per_min
  0.36   +10.6%   0.40+7.8%   0.39 
  -1.1%   0.36reaim.parent_time
  1.58 ±  2% +71.0%   2.70 ±  2% +26.9%   2.01 ± 
2% -11.9%   1.39 ±  4%  reaim.std_dev_percent
  0.00 ±  5%+110.4%   0.01 ±  3% +48.8%   0.01 ± 
7% -27.3%   0.00 ± 15%  reaim.std_dev_time
 50800-2.4%  49600-1.6%  5 
  +0.0%  50800reaim.workload





=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
   9f68395333ad7f5bfe2f83473fed363d4229f11c
   070f5e860ee2bf588c99ef7b4c202451faa48236
   v5.7
   3e1643da53f3fc7414cfa3ad2a16ab2a164b7f4d (the test patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7
3e1643da53f3fc7414cfa3ad2a1
 --- ---
---
  %stddev %change %stddev %change %stddev %change
%stddev
  \  |\  |\
|\
   0.69   -10.3%   0.62-9.1%   0.62
-7.1%   0.64reaim.child_systime
   0.62-1.0%   0.61+0.5%   0.62
+1.3%   0.63reaim.child_utime
  66870   -10.0%  60187-7.6%  61787
-6.1%  62807reaim.j

Re: [LKP] [ext4] d3b6f23f71: stress-ng.fiemap.ops_per_sec -60.5% regression

2020-06-16 Thread Xing Zhengjun

Hi Ritesh,

   I tested, and the regression still exists in v5.8-rc1. Do you have time 
to take a look at it? Thanks.


On 4/14/2020 1:49 PM, Xing Zhengjun wrote:
Thanks for your quick response; if you need any more test information 
about the regression, please let me know.


On 4/13/2020 6:56 PM, Ritesh Harjani wrote:



On 4/13/20 2:07 PM, Xing Zhengjun wrote:

Hi Harjani,

    Do you have time to take a look at this? Thanks.


Hello Xing,

I do want to look into this. But as of now I am stuck with another
mballoc failure issue. I will get back to this once I have some handle
on that one.

BTW, are you planning to take a look at this?

-ritesh




On 4/7/2020 4:00 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -60.5% regression of stress-ng.fiemap.ops_per_sec 
due to commit:



commit: d3b6f23f71670007817a5d59f3fbafab2b794e8c ("ext4: move 
ext4_fiemap to use iomap framework")

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz 
with 192G memory

with following parameters:

nr_threads: 10%
disk: 1HDD
testtime: 1s
class: os
cpufreq_governor: performance
ucode: 0x52c
fs: ext4






Details are as below:
--> 




To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

= 

class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode: 

os/gcc-7/performance/1HDD/ext4/x86_64-rhel-7.6/10%/debian-x86_64-20191114.cgz/lkp-csl-2sp5/stress-ng/1s/0x52c 



commit:
   b2c5764262 ("ext4: make ext4_ind_map_blocks work with fiemap")
   d3b6f23f71 ("ext4: move ext4_fiemap to use iomap framework")

b2c5764262edded1 d3b6f23f71670007817a5d59f3f
 ---
    fail:runs  %reproduction    fail:runs
    | | |
    :4   25%   1:4 
dmesg.WARNING:at#for_ip_interrupt_entry/0x
   2:4    5%   2:4 
perf-profile.calltrace.cycles-pp.sync_regs.error_entry
   2:4    6%   3:4 
perf-profile.calltrace.cycles-pp.error_entry
   3:4    9%   3:4 
perf-profile.children.cycles-pp.error_entry
   0:4    1%   0:4 
perf-profile.self.cycles-pp.error_entry

  %stddev %change %stddev
  \  |    \
  28623   +28.2%  36703 ± 12%  stress-ng.daemon.ops
  28632   +28.2%  36704 ± 12% 
stress-ng.daemon.ops_per_sec

 566.00 ± 22% -53.2% 265.00 ± 53%  stress-ng.dev.ops
 278.81 ± 22% -53.0% 131.00 ± 54%  
stress-ng.dev.ops_per_sec

  73160   -60.6%  28849 ±  3%  stress-ng.fiemap.ops
  72471   -60.5%  28612 ±  3% 
stress-ng.fiemap.ops_per_sec

  23421 ± 12% +21.2%  28388 ±  6%  stress-ng.filename.ops
  22638 ± 12% +20.3%  27241 ±  6% 
stress-ng.filename.ops_per_sec

  21.25 ±  7% -10.6%  19.00 ±  3%  stress-ng.iomix.ops
  38.75 ± 49% -47.7%  20.25 ± 96%  stress-ng.memhotplug.ops
  34.45 ± 52% -51.8%  16.62 ±106% 
stress-ng.memhotplug.ops_per_sec

   1734 ± 10% +31.4%   2278 ± 10%  stress-ng.resources.ops
 807.56 ±  5% +35.2%   1091 ±  8% 
stress-ng.resources.ops_per_sec

    1007356 ±  3% -16.5% 840642 ±  9%  stress-ng.revio.ops
    1007692 ±  3% -16.6% 840711 ±  9% 
stress-ng.revio.ops_per_sec

  21812 ±  3% +16.0%  25294 ±  5%  stress-ng.sysbadaddr.ops
  21821 ±  3% +15.9%  25294 ±  5% 
stress-ng.sysbadaddr.ops_per_sec

 440.75 ±  4% +21.9% 537.25 ±  9%  stress-ng.sysfs.ops
 440.53 ±  4% +21.9% 536.86 ±  9% 
stress-ng.sysfs.ops_per_sec
   13286582   -11.1%   11805520 ±  6% 
stress-ng.time.file_system_outputs
   68253896    +2.4%   69860122 
stress-ng.time.minor_page_faults

 197.00 ±  4% -15.9% 165.75 ± 12%  stress-ng.xattr.ops
 192.45 ±  5% -16.1% 161.46 ± 11% 
stress-ng.xattr.ops_per_sec

  15310   +62.5%  24875 ± 22%  stress-ng.zombie.ops
  15310   +62.5%  24874 ± 22% 
stress-ng.zombie.ops_per_sec

 203.50 ± 12% -47.3% 107.25 ± 49%  vmstat.io.bi
 861318 ± 18% -29.7% 605884 ±  5%  meminfo.AnonHugePages
    1062742 ± 14% -20.2% 847853 ±  3%  meminfo.AnonPages
  31093 ±  6%  +9.6%  34090 ±  3%  meminfo.KernelStack
   7151 ± 34% +55.8%  11145 ±  9%  meminfo.Mlocked
  1.082e+08 ±  5% -40.2%   64705429 ± 31% 
numa-numa

Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression

2020-06-15 Thread Xing Zhengjun




On 6/15/2020 11:10 PM, Hillf Danton wrote:


On Mon, 15 Jun 2020 10:10:41 +0200 Vincent Guittot wrote:

On Monday 15 June 2020 at 15:26:59 (+0800), Xing Zhengjun wrote:


On 6/12/2020 7:06 PM, Hillf Danton wrote:


On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote:


...


I applied the patch on top of v5.7, and the regression still exists.


Thanks for the test.


Thanks.


I don't know if it's relevant or not but the results seem a bit
better with the patch and I'd like to check that it's only a matter of 
threshold to
fix the problem.

Could you try the patch below which is quite aggressive but will help to 
confirm this ?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 28be1c984a42..3c51d557547b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8322,10 +8322,13 @@ static inline int sg_imbalanced(struct sched_group 
*group)
  static inline bool
  group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
  {
+   unsigned long imb;
+
 if (sgs->sum_nr_running < sgs->group_weight)
 return true;

-   if ((sgs->group_capacity * imbalance_pct) <
+   imb = sgs->sum_nr_running * 100;
+   if ((sgs->group_capacity * imb) <
 (sgs->group_runnable * 100))
 return false;

@@ -8347,6 +8350,8 @@ group_has_capacity(unsigned int imbalance_pct, struct 
sg_lb_stats *sgs)
  static inline bool
  group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
  {
+   unsigned long imb;
+
 if (sgs->sum_nr_running <= sgs->group_weight)
 return false;

@@ -8354,7 +8359,8 @@ group_is_overloaded(unsigned int imbalance_pct, struct 
sg_lb_stats *sgs)
 (sgs->group_util * imbalance_pct))
 return true;

-   if ((sgs->group_capacity * imbalance_pct) <
+   imb = sgs->sum_nr_running * 100;
+   if ((sgs->group_capacity * imb) <
 (sgs->group_runnable * 100))
 return true;




=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
   9f68395333ad7f5bfe2f83473fed363d4229f11c
   070f5e860ee2bf588c99ef7b4c202451faa48236
   v5.7
   6b33257768b8dd3982054885ea310871be2cfe0b (Hillf's patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7
6b33257768b8dd3982054885ea3
 --- ---
---
  %stddev %change %stddev %change %stddev %change
%stddev
  \  |\  |\
|\
   0.69   -10.3%   0.62-9.1%   0.62
-10.1%   0.62reaim.child_systime
   0.62-1.0%   0.61+0.5%   0.62
+0.3%   0.62reaim.child_utime
  66870   -10.0%  60187-7.6%  61787
-8.3%  61305reaim.jobs_per_min
  16717   -10.0%  15046-7.6%  15446
-8.3%  15326reaim.jobs_per_min_child
  97.84-1.1%  96.75-0.4%  97.43
-0.5%  97.37reaim.jti
  72000   -10.8%  64216-8.3%  66000
-8.3%  66000reaim.max_jobs_per_min
   0.36   +10.6%   0.40+7.8%   0.39
+9.4%   0.39reaim.parent_time
   1.58   2% +71.0%   2.70   2% +26.9%   2.01  2%
+33.2%   2.11reaim.std_dev_percent
   0.00   5%+110.4%   0.01   3% +48.8%   0.01  7%
+65.3%   0.01   3%  reaim.std_dev_time
  50800-2.4%  49600-1.6%  5
-1.8%  49866reaim.workload



Following the introduction of runnable_avg there came a gap between it
and util, and it can supposedly be filled by determining the pivot
point using the imb percent. The upside is that no heuristic is added.

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8215,15 +8215,8 @@ group_has_capacity(unsigned int imbalanc
if (sgs->sum_nr_running < sgs->group_weight)
return true;
  
-	if ((sgs->group_capacity * imbalance_pct) <

-   (sgs->group_runnable * 100))
-   return false;
-
-   if ((sgs->group_capacity * 100) >
-   (sgs->group_util * imbalance_pct))
-   return true;
-
-   return false;
+   return sgs->group_capacity * imbalance_pct >
+   (sgs->group_util + sgs->group_runnable) *50;
  }
  
  /*

@@ -8240,15 +8233,8 @@ group_is_overloaded(unsigned int imbalan
if (sgs->sum

Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression

2020-06-15 Thread Xing Zhengjun




On 6/15/2020 4:10 PM, Vincent Guittot wrote:

Hi Xing,

On Monday 15 June 2020 at 15:26:59 (+0800), Xing Zhengjun wrote:



On 6/12/2020 7:06 PM, Hillf Danton wrote:


On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote:


...


--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8215,12 +8215,8 @@ group_has_capacity(unsigned int imbalanc
if (sgs->sum_nr_running < sgs->group_weight)
return true;
-   if ((sgs->group_capacity * imbalance_pct) <
-   (sgs->group_runnable * 100))
-   return false;
-
-   if ((sgs->group_capacity * 100) >
-   (sgs->group_util * imbalance_pct))
+   if ((sgs->group_capacity * 100) > (sgs->group_util * imbalance_pct) &&
+   (sgs->group_capacity * 100) > (sgs->group_runnable * imbalance_pct))
return true;
return false;
@@ -8240,12 +8236,8 @@ group_is_overloaded(unsigned int imbalan
if (sgs->sum_nr_running <= sgs->group_weight)
return false;
-   if ((sgs->group_capacity * 100) <
-   (sgs->group_util * imbalance_pct))
-   return true;
-
-   if ((sgs->group_capacity * imbalance_pct) <
-   (sgs->group_runnable * 100))
+   if ((sgs->group_capacity * 100) < (sgs->group_util * imbalance_pct) ||
+   (sgs->group_capacity * 100) < (sgs->group_runnable * imbalance_pct))
return true;
return false;



I applied the patch on top of v5.7, and the regression still exists.


Thanks for the test. I don't know if it's relevant or not but the results seem 
a bit
better with the patch and I'd like to check that it's only a matter of 
threshold to
fix the problem.

Could you try the patch below which is quite aggressive but will help to 
confirm this ?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 28be1c984a42..3c51d557547b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8322,10 +8322,13 @@ static inline int sg_imbalanced(struct sched_group 
*group)
  static inline bool
  group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
  {
+   unsigned long imb;
+
 if (sgs->sum_nr_running < sgs->group_weight)
 return true;

-   if ((sgs->group_capacity * imbalance_pct) <
+   imb = sgs->sum_nr_running * 100;
+   if ((sgs->group_capacity * imb) <
 (sgs->group_runnable * 100))
 return false;

@@ -8347,6 +8350,8 @@ group_has_capacity(unsigned int imbalance_pct, struct 
sg_lb_stats *sgs)
  static inline bool
  group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
  {
+   unsigned long imb;
+
 if (sgs->sum_nr_running <= sgs->group_weight)
 return false;

@@ -8354,7 +8359,8 @@ group_is_overloaded(unsigned int imbalance_pct, struct 
sg_lb_stats *sgs)
 (sgs->group_util * imbalance_pct))
 return true;

-   if ((sgs->group_capacity * imbalance_pct) <
+   imb = sgs->sum_nr_running * 100;
+   if ((sgs->group_capacity * imb) <
 (sgs->group_runnable * 100))
 return true;




I applied the patch on top of v5.7; the test result is as follows:

=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7
  3e1643da53f3fc7414cfa3ad2a16ab2a164b7f4d (the test patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7 
3e1643da53f3fc7414cfa3ad2a1
 --- --- 
---
 %stddev %change %stddev %change 
%stddev %change %stddev
 \  |\  |\ 
|\
  0.69   -10.3%   0.62-9.1%   0.62 
  -7.1%   0.64reaim.child_systime
  0.62-1.0%   0.61+0.5%   0.62 
  +1.3%   0.63reaim.child_utime
 66870   -10.0%  60187-7.6%  61787 
  -6.1%  62807reaim.jobs_per_min
 16717   -10.0%  15046-7.6%  15446 
  -6.1%  15701reaim.jobs_per_min_child
 97.84-1.1%  96.75-0.4%  97.43 
  -0.5%  97.34reaim.jti
 72000   -10.8%  64216-8.3%  66000 
  -5.7%  67885reaim.max_

Re: [LKP] [rcu] 276c410448: will-it-scale.per_thread_ops -12.3% regression

2020-06-15 Thread Xing Zhengjun

Hi Paul,

   Do you have time to take a look at this? Thanks.

On 6/15/2020 4:57 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -12.3% regression of will-it-scale.per_thread_ops due to 
commit:


commit: 276c410448dbca357a2bc3539acfe04862e5f172 ("rcu-tasks: Split 
->trc_reader_need_end")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G 
memory
with following parameters:

nr_task: 100%
mode: thread
test: page_fault3
cpufreq_governor: performance
ucode: 0x11

test-description: Will It Scale takes a testcase and runs it from 1 through to 
n parallel copies to see if the testcase will scale. It builds both a process 
and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
   
gcc-9/performance/x86_64-rhel-7.6/thread/100%/debian-x86_64-20191114.cgz/lkp-knm01/page_fault3/will-it-scale/0x11

commit:
   b0afa0f056 ("rcu-tasks: Provide boot parameter to delay IPIs until late in grace 
period")
   276c410448 ("rcu-tasks: Split ->trc_reader_need_end")

b0afa0f056676ffe 276c410448dbca357a2bc3539ac
 ---
fail:runs  %reproductionfail:runs
| | |
   2:4  -50%:4 
dmesg.WARNING:at#for_ip_interrupt_entry/0x
:4   28%   1:4 
perf-profile.calltrace.cycles-pp.error_entry
   0:40%   0:4 
perf-profile.children.cycles-pp.error_exit
   1:47%   2:4 
perf-profile.children.cycles-pp.error_entry
   0:44%   1:4 
perf-profile.self.cycles-pp.error_entry
  %stddev %change %stddev
  \  |\
   1414   -12.3%   1241 ±  2%  will-it-scale.per_thread_ops
 463.32+1.7% 470.99will-it-scale.time.elapsed_time
 463.32+1.7% 470.99
will-it-scale.time.elapsed_time.max
 407566   -12.3% 357573 ±  2%  will-it-scale.workload
  48.51-1.5%  47.77boot-time.boot
  7.203e+10   +20.0%   8.64e+10 ±  2%  cpuidle.C1.time
  2.162e+08 ±  2% +27.7%  2.761e+08 ±  2%  cpuidle.C1.usage
  60.50   +12.2   72.74 ±  2%  mpstat.cpu.all.idle%
  39.17   -12.2   26.97 ±  6%  mpstat.cpu.all.sys%
   2334 ± 12% +18.8%   2772 ±  5%  
slabinfo.khugepaged_mm_slot.active_objs
   2334 ± 12% +18.8%   2772 ±  5%  
slabinfo.khugepaged_mm_slot.num_objs
  60.25   +20.3%  72.50 ±  2%  vmstat.cpu.id
  92.75 ±  3% -21.6%  72.75 ±  5%  vmstat.procs.r
 223709   +41.8% 317250 ±  3%  vmstat.system.cs
 641687 ±  3%  +8.0% 693245 ±  2%  proc-vmstat.nr_inactive_anon
 641688 ±  3%  +8.0% 693245 ±  2%  proc-vmstat.nr_zone_inactive_anon
 166782-3.7% 160632proc-vmstat.numa_hint_faults
 166782-3.7% 160632
proc-vmstat.numa_hint_faults_local
 984.25   -14.2% 844.75 ±  2%  proc-vmstat.numa_huge_pte_updates
 710979   -11.2% 631134proc-vmstat.numa_pte_updates
  1.967e+08   -10.9%  1.752e+08proc-vmstat.pgfault
  58.18+3.4%  60.17perf-stat.i.MPKI
  1.173e+09   +10.5%  1.296e+09perf-stat.i.branch-instructions
   6.74-0.16.68perf-stat.i.branch-miss-rate%
   72495831   +10.7%   80219684perf-stat.i.branch-misses
  14.68-0.6   14.06perf-stat.i.cache-miss-rate%
   43014696   +10.5%   47551690perf-stat.i.cache-misses
  2.936e+08   +15.6%  3.393e+08perf-stat.i.cache-references
 227441   +42.0% 323034 ±  3%  perf-stat.i.context-switches
  37.22   -29.0%  26.44 ±  5%  perf-stat.i.cpi
  1.828e+11   -22.3%  1.421e+11 ±  3%  perf-stat.i.cpu-cycles
 513.71   +13.6% 583.63perf-stat.i.cpu-migrations
   4303   -27.8%   3107 ±  5%  
perf-stat.i.cycles-between-cache-misses
   1.78-0.01.74

Re: [LKP] [sched/fair] 6c8116c914: stress-ng.mmapfork.ops_per_sec -38.0% regression

2020-06-15 Thread Xing Zhengjun




On 6/15/2020 1:18 PM, Tao Zhou wrote:

Hi,

On Fri, Jun 12, 2020 at 03:59:31PM +0800, Xing Zhengjun wrote:

Hi,

I tested the regression; it still exists in v5.7. If you have any fix
for it, please send it to me and I can verify it. Thanks.


When the busiest group is group_fully_busy and the local group is
<= group_fully_busy, the metric used is:

   local group        busiest group      use metric
   group_fully_busy   group_fully_busy   avg load
   group_has_spare    group_fully_busy   idle cpu / task num

In find_busiest_group(), about this condition:

 'if (busiest->group_type != group_overloaded) {'

in this case the busiest type is group_fully_busy and the local type is
<= group_fully_busy. In this branch it checks idle cpus and task numbers and
can go to out_balanced. That is to say, it ignores the group_fully_busy case
as opposed to the group_has_spare case (that case is handled in
calculate_imbalance()).

When the local group and the busiest group are both group_fully_busy, we need
to use avg load as the metric (in calculate_imbalance()). So I suggest the
change below:


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cbcb2f71599b..0afbea39dd5a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9219,24 +9219,26 @@ static struct sched_group *find_busiest_group(struct 
lb_env *env)
  */
 goto out_balanced;
  
-   if (busiest->group_weight > 1 &&

-   local->idle_cpus <= (busiest->idle_cpus + 1))
-   /*
-* If the busiest group is not overloaded
-* and there is no imbalance between this and busiest
-* group wrt idle CPUs, it is balanced. The imbalance
-* becomes significant if the diff is greater than 1
-* otherwise we might end up to just move the imbalance
-* on another group. Of course this applies only if
-* there is more than 1 CPU per group.
-*/
-   goto out_balanced;
+   if (local->group_type == group_has_spare) {
+   if (busiest->group_weight > 1 &&
+   local->idle_cpus <= (busiest->idle_cpus + 1))
+   /*
+* If the busiest group is not overloaded
+* and there is no imbalance between this and 
busiest
+* group wrt idle CPUs, it is balanced. The 
imbalance
+* becomes significant if the diff is greater 
than 1
+* otherwise we might end up to just move the 
imbalance
+* on another group. Of course this applies 
only if
+* there is more than 1 CPU per group.
+*/
+   goto out_balanced;
  
-   if (busiest->sum_h_nr_running == 1)

-   /*
-* busiest doesn't have any tasks waiting to run
-*/
-   goto out_balanced;
+   if (busiest->sum_h_nr_running == 1)
+   /*
+* busiest doesn't have any tasks waiting to run
+*/
+   goto out_balanced;
+   }
 }
  
  force_balance:


In fact, I don't know whether this change helps or not, or whether it is even
correct. No test, no compile. If it is wrong, just ignore it.

Thanks


I applied the patch on top of v5.7, and the regression still exists.
=
tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/sc_pid_max/testtime/class/cpufreq_governor/ucode:

lkp-bdw-ep6/stress-ng/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/100%/1HDD/4194304/1s/scheduler/performance/0xb38

commit:
  e94f80f6c49020008e6fa0f3d4b806b8595d17d8
  6c8116c914b65be5e4d6f66d69c8142eb0648c22
  v5.7
  c7e6d37f60da32f808140b1b7dabcc3cde73c4cc  (Tao's patch)

e94f80f6c4902000 6c8116c914b65be5e4d6f66d69cv5.7 
c7e6d37f60da32f808140b1b7da
 --- --- 
---
 %stddev %change %stddev %change 
%stddev %change %stddev
 \  |\  |\ 
|\
819250 ±  5% -10.1% 736616 ±  8% +41.2%1156877 ± 
3% +43.6%1176246 ±  5%  stress-ng.futex.ops
818985 ±  5% -10.1% 736460 ±  8% +41.2%1156215 ± 
3% +43.6%1176055 ±  5%  st

Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression

2020-06-15 Thread Xing Zhengjun




On 6/12/2020 11:19 PM, Vincent Guittot wrote:

On Friday, 12 June 2020 at 14:36:49 (+0800), Xing Zhengjun wrote:

Hi Vincent,

   We tested this and the regression still exists in v5.7; do you have time to look at
it? Thanks.




Commit 070f5e860ee2 moves some cases from the "group has spare capacity" state
to the "group is overloaded" state, typically when util_avg decreases significantly
after a migration but the group is in fact still overloaded.
The current rule uses a fixed threshold, but it has the disadvantage of possibly
including some cases with spare capacity but a high runnable_avg (because of tasks
running simultaneously, for example).
It looks like this benchmark is impacted by moving such cases from has_spare_capacity
to is_overloaded. I have a patch in my backlog that tries to fix the problem, but I
never sent it because I failed to find a benchmark that would benefit from it.

This patch moves some cases back from the overloaded state to the "has spare capacity"
state.

Could you give it a try?

---
  kernel/sched/fair.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ed04d2a8959..c24f85969591 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8213,10 +8213,14 @@ static inline int sg_imbalanced(struct sched_group *group)
 static inline bool
 group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 {
+	unsigned long imb;
+
 	if (sgs->sum_nr_running < sgs->group_weight)
 		return true;
 
-	if ((sgs->group_capacity * imbalance_pct) <
+	imb = imbalance_pct-100;
+	imb = sgs->sum_nr_running * imb + 100;
+	if ((sgs->group_capacity * imb) <
 	    (sgs->group_runnable * 100))
 		return false;
 
@@ -8238,6 +8242,8 @@ group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 static inline bool
 group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 {
+	unsigned long imb;
+
 	if (sgs->sum_nr_running <= sgs->group_weight)
 		return false;
 
@@ -8245,7 +8251,9 @@ group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 	    (sgs->group_util * imbalance_pct))
 		return true;
 
-	if ((sgs->group_capacity * imbalance_pct) <
+	imb = imbalance_pct-100;
+	imb = sgs->sum_nr_running * imb + 100;
+	if ((sgs->group_capacity * imb) <
 	    (sgs->group_runnable * 100))
 		return true;

--
2.17.1
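
For what it's worth, here is a standalone user-space sketch (not kernel code; the
capacity, runnable and imbalance_pct numbers below are assumed examples, not values
taken from this report) of how the scaled margin in the patch changes the
runnable-vs-capacity check in group_is_overloaded():

#include <stdbool.h>
#include <stdio.h>

/* Old rule: fixed margin, independent of the number of running tasks. */
static bool overloaded_old(unsigned long capacity, unsigned long runnable,
			   unsigned int imbalance_pct)
{
	return capacity * imbalance_pct < runnable * 100;
}

/* Proposed rule: the margin grows with sum_nr_running, as in the patch above. */
static bool overloaded_new(unsigned long capacity, unsigned long runnable,
			   unsigned int imbalance_pct,
			   unsigned int sum_nr_running)
{
	unsigned long imb = sum_nr_running * (imbalance_pct - 100) + 100;

	return capacity * imb < runnable * 100;
}

int main(void)
{
	/* Assumed example: 4 running tasks, capacity 1024, runnable 1300. */
	unsigned long capacity = 1024, runnable = 1300;
	unsigned int imbalance_pct = 117, nr = 4;

	printf("old rule says overloaded: %d\n",
	       overloaded_old(capacity, runnable, imbalance_pct));
	printf("new rule says overloaded: %d\n",
	       overloaded_new(capacity, runnable, imbalance_pct, nr));
	return 0;
}

With these assumed numbers the fixed threshold classifies the group as overloaded,
while the per-task scaled margin does not; that is the direction the patch intends,
keeping such cases in the "has spare capacity" state.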





I applied the patch on top of v5.7; the regression still exists.

=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7
  068638639cdfa15dbff137a0e3ef4a4cc6730ff4 (Vincent's patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7 
068638639cdfa15dbff137a0e3e
 --- --- 
---
 %stddev %change %stddev %change 
%stddev %change %stddev
 \  |\  |\ 
|\
  0.69   -10.3%   0.62-9.1%   0.62 
  -8.9%   0.63reaim.child_systime
  0.62-1.0%   0.61+0.5%   0.62 
  +0.6%   0.62reaim.child_utime
 66870   -10.0%  60187-7.6%  61787 
  -7.7%  61714reaim.jobs_per_min
 16717   -10.0%  15046-7.6%  15446 
  -7.7%  15428reaim.jobs_per_min_child
 97.84-1.1%  96.75-0.4%  97.43 
  -0.6%  97.25reaim.jti
 72000   -10.8%  64216-8.3%  66000 
  -8.3%  66000reaim.max_jobs_per_min
  0.36   +10.6%   0.40+7.8%   0.39 
  +8.2%   0.39reaim.parent_time
  1.58 ±  2% +71.0%   2.70 ±  2% +26.9%   2.01 ± 
2% +38.4%   2.19 ±  6%  reaim.std_dev_percent
  0.00 ±  5%+110.4%   0.01 ±  3% +48.8%   0.01 ± 
7% +67.1%   0.01 ±  9%  reaim.std_dev_time
 50800-2.4%  49600-1.6%  5 
  -1.6%  5reaim.workload






=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/de

Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression

2020-06-15 Thread Xing Zhengjun




On 6/12/2020 7:06 PM, Hillf Danton wrote:


On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote:

Hi Vincent,

We tested this and the regression still exists in v5.7; do you have time to
look at it? Thanks.

  
=

tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
  
lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21


commit:
9f68395333ad7f5bfe2f83473fed363d4229f11c
070f5e860ee2bf588c99ef7b4c202451faa48236
v5.7

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7
 --- ---
   %stddev %change %stddev %change %stddev
   \  |\  |\
0.69   -10.3%   0.62-9.1%   0.62 
reaim.child_systime
0.62-1.0%   0.61+0.5%   0.62 
reaim.child_utime
   66870   -10.0%  60187-7.6%  61787 
reaim.jobs_per_min
   16717   -10.0%  15046-7.6%  15446 
reaim.jobs_per_min_child
   97.84-1.1%  96.75-0.4%  97.43 
reaim.jti
   72000   -10.8%  64216-8.3%  66000 
reaim.max_jobs_per_min
0.36   +10.6%   0.40+7.8%   0.39 
reaim.parent_time
  1.58 ±  2% +71.0%   2.70 ±  2% +26.9%   2.01 ±  2%  
reaim.std_dev_percent
  0.00 ±  5%+110.4%   0.01 ±  3% +48.8%   0.01 ±  7%  
reaim.std_dev_time
   50800-2.4%  49600-1.6%  5 
reaim.workload


On 3/19/2020 10:38 AM, kernel test robot wrote:

Greeting,

FYI, we noticed a -10.5% regression of reaim.jobs_per_min due to commit:


commit: 070f5e860ee2bf588c99ef7b4c202451faa48236 ("sched/fair: Take into account 
runnable_avg to classify group")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: reaim
on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G 
memory
with following parameters:

runtime: 300s
nr_task: 100%
test: five_sec
cpufreq_governor: performance
ucode: 0x21

test-description: REAIM is an updated and improved version of AIM 7 benchmark.
test-url: https://sourceforge.net/projects/re-aim-7/


Hi Xing

After 070f5e860ee2, let's treat runnable the same way as util when comparing
against capacity, on the assumption that (125 + 110 + 117) / 3 = 117 accounts
for 105 within the margin of error, before any other proposal with more reasons.

thanks
Hillf
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8215,12 +8215,8 @@ group_has_capacity(unsigned int imbalanc
 	if (sgs->sum_nr_running < sgs->group_weight)
 		return true;
 
-	if ((sgs->group_capacity * imbalance_pct) <
-	    (sgs->group_runnable * 100))
-		return false;
-
-	if ((sgs->group_capacity * 100) >
-	    (sgs->group_util * imbalance_pct))
+	if ((sgs->group_capacity * 100) > (sgs->group_util * imbalance_pct) &&
+	    (sgs->group_capacity * 100) > (sgs->group_runnable * imbalance_pct))
 		return true;
 
 	return false;
 
@@ -8240,12 +8236,8 @@ group_is_overloaded(unsigned int imbalan
 	if (sgs->sum_nr_running <= sgs->group_weight)
 		return false;
 
-	if ((sgs->group_capacity * 100) <
-	    (sgs->group_util * imbalance_pct))
-		return true;
-
-	if ((sgs->group_capacity * imbalance_pct) <
-	    (sgs->group_runnable * 100))
+	if ((sgs->group_capacity * 100) < (sgs->group_util * imbalance_pct) ||
+	    (sgs->group_capacity * 100) < (sgs->group_runnable * imbalance_pct))
 		return true;
 
 	return false;




I applied the patch on top of v5.7; the regression still exists.

=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7
  6b33257768b8dd3982054885ea310871be2cfe0b (Hillf's patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7 
6b33257768b8dd3982054885ea3
 --- --- 
---
 %stddev %change %stddev %change 
%stddev %change %stddev
 

Re: [LKP] [btrfs] c75e839414: aim7.jobs-per-min -9.1% regression

2020-06-14 Thread Xing Zhengjun

Hi Josef,

   Do you have time to take a look at this? Thanks.

On 6/12/2020 2:11 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -9.1% regression of aim7.jobs-per-min due to commit:


commit: c75e839414d3610e6487ae3145199c500d55f7f7 ("btrfs: kill the subvol_srcu")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: aim7
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 
192G memory
with following parameters:

disk: 4BRD_12G
md: RAID0
fs: btrfs
test: disk_wrt
load: 1500
cpufreq_governor: performance
ucode: 0x52c

test-description: AIM7 is a traditional UNIX system level benchmark suite which 
is used to test and measure the performance of multiuser system.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase/ucode:
   
gcc-7/performance/4BRD_12G/btrfs/x86_64-rhel-7.6/1500/RAID0/debian-x86_64-20191114.cgz/lkp-csl-2ap2/disk_wrt/aim7/0x52c

commit:
   efc3453494 ("btrfs: make btrfs_cleanup_fs_roots use the radix tree lock")
   c75e839414 ("btrfs: kill the subvol_srcu")

efc3453494af7818 c75e839414d3610e6487ae31451
 ---
fail:runs  %reproductionfail:runs
| | |
   3:9  -33%:8 
dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x
  %stddev %change %stddev
  \  |\
  29509 ±  2%  -9.1%  26837 ±  2%  aim7.jobs-per-min
 305.28 ±  2% +10.0% 335.72 ±  2%  aim7.time.elapsed_time
 305.28 ±  2% +10.0% 335.72 ±  2%  aim7.time.elapsed_time.max
4883135 ± 10% +37.9%6735464 ±  7%  
aim7.time.involuntary_context_switches
  56288 ±  2% +10.5%  62202 ±  2%  aim7.time.system_time
2344783+6.5%2497364 ±  2%  
aim7.time.voluntary_context_switches
   62337721 ±  2%  +9.8%   68456490 ±  2%  turbostat.IRQ
 431.56 ±  6% +22.3% 527.88 ±  4%  vmstat.procs.r
  27340 ±  2% +11.2%  30397 ±  2%  vmstat.system.cs
 226804 ±  6% +21.7% 276057 ±  4%  meminfo.Active(file)
 221309 ±  6% +22.3% 270668 ±  4%  meminfo.Dirty
 720.89 ±111% +49.3%   1076 ± 73%  meminfo.Mlocked
  14278 ±  2%  -8.3%  13094 ±  2%  meminfo.max_used_kB
  57228 ±  6% +22.7%  70195 ±  5%  numa-meminfo.node0.Active(file)
  55433 ±  6% +21.6%  67431 ±  4%  numa-meminfo.node0.Dirty
  56152 ±  6% +21.4%  68180 ±  5%  numa-meminfo.node1.Active(file)
  55001 ±  6% +22.5%  67397 ±  4%  numa-meminfo.node1.Dirty
  56373 ±  6% +21.7%  68594 ±  4%  numa-meminfo.node2.Active(file)
  55222 ±  7% +22.6%  67726 ±  4%  numa-meminfo.node2.Dirty
  56671 ±  6% +20.5%  68317 ±  3%  numa-meminfo.node3.Active(file)
  55285 ±  6% +21.8%  67355 ±  4%  numa-meminfo.node3.Dirty
  56694 ±  6% +21.7%  69019 ±  4%  proc-vmstat.nr_active_file
  55342 ±  6% +22.3%  67662 ±  4%  proc-vmstat.nr_dirty
 402316+2.1% 410951proc-vmstat.nr_file_pages
 180.22 ±111% +49.4% 269.25 ± 73%  proc-vmstat.nr_mlock
  56694 ±  6% +21.7%  69019 ±  4%  proc-vmstat.nr_zone_active_file
  54680 ±  6% +22.8%  67168 ±  4%  proc-vmstat.nr_zone_write_pending
3144381 ±  2%  +6.1%3335275proc-vmstat.pgactivate
1387558 ±  2%  +7.9%1496754 ±  2%  proc-vmstat.pgfault
 983.33 ±  4%  +5.4%   1036
proc-vmstat.unevictable_pgs_culled
  14331 ±  6% +22.6%  17566 ±  5%  numa-vmstat.node0.nr_active_file
  13884 ±  6% +21.6%  16884 ±  4%  numa-vmstat.node0.nr_dirty
  14330 ±  6% +22.6%  17566 ±  5%  
numa-vmstat.node0.nr_zone_active_file
  13714 ±  6% +22.2%  16755 ±  4%  
numa-vmstat.node0.nr_zone_write_pending
  14047 ±  6% +21.3%  17043 ±  4%  numa-vmstat.node1.nr_active_file
  13763 ±  6% +22.3%  16838 ±  4%  numa-vmstat.node1.nr_dirty
  14047 ±  6% +21.3%  17043 ±  4%  
numa-vmstat.node1.nr_zone_active_file
  13599 ±  6% +23.0%  16726 ±  4%  
numa-vmstat.node1.nr_zone_write_pending
  14074 ±  5% +21.7%  17130 ±  4%  

Re: [LKP] [x86, sched] 1567c3e346: vm-scalability.median -15.8% regression

2020-06-12 Thread Xing Zhengjun

Hi Giovanni,

   I tested the regression; it still exists in v5.7. Do you have time
to take a look at this? Thanks.


=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/debug-setup/size/test/cpufreq_governor/ucode:

lkp-hsw-4ex1/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/test/8T/anon-cow-seq/performance/0x16

commit:
  2a4b03ffc69f2dedc6388e9a6438b5f4c133a40d
  1567c3e3467cddeb019a7b53ec632f834b6a9239
  v5.7-rc1
  v5.7

2a4b03ffc69f2ded 1567c3e3467cddeb019a7b53ec6v5.7-rc1 
   v5.7
 --- --- 
---
 %stddev %change %stddev %change 
%stddev %change %stddev
 \  |\  |\ 
|\
211462   -16.0% 177702   -15.0% 179809 
 -15.1% 179510vm-scalability.median
  5.34 ±  9%  -3.12.23 ± 11%  -2.92.49 ± 
5%  -2.72.61 ± 11%  vm-scalability.median_stddev%
  30430671   -16.3%   25461360   -15.5%   25707029 
 -15.5%   25701713vm-scalability.throughput
 7.967e+09   -11.1%  7.082e+09   -11.1%  7.082e+09 
 -11.1%  7.082e+09vm-scalability.workload




On 4/16/2020 2:20 PM, Giovanni Gherdovich wrote:

On Thu, 2020-04-16 at 14:10 +0800, Xing Zhengjun wrote:

Hi Giovanni,

1567c3e346 ("x86, sched: Add support for frequency invariance") has
been merged into Linux mainline v5.7-rc1 now. Do you have time to take a
look at this? Thanks.



Apologies, this slipped under my radar. I'm on it, thanks.


Giovanni Gherdovich



--
Zhengjun Xing


Re: [LKP] [sched/fair] 6c8116c914: stress-ng.mmapfork.ops_per_sec -38.0% regression

2020-06-12 Thread Xing Zhengjun

Hi,

  I tested the regression; it still exists in v5.7. If you have any fix
for it, please send it to me and I can verify it. Thanks.


=
tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/sc_pid_max/testtime/class/cpufreq_governor/ucode:

lkp-bdw-ep6/stress-ng/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/100%/1HDD/4194304/1s/scheduler/performance/0xb38

commit:
  e94f80f6c49020008e6fa0f3d4b806b8595d17d8
  6c8116c914b65be5e4d6f66d69c8142eb0648c22
  v5.7-rc3
  v5.7

e94f80f6c4902000 6c8116c914b65be5e4d6f66d69cv5.7-rc3 
   v5.7
 --- --- 
---
 %stddev %change %stddev %change 
%stddev %change %stddev
 \  |\  |\ 
|\
 21398 ±  7%  +6.5%  22781 ±  2% -14.5%  18287 ± 
4%  -5.5%  20231 ± 14%  stress-ng.clone.ops
819250 ±  5% -10.1% 736616 ±  8% +34.2%1099410 ± 
5% +41.2%1156877 ±  3%  stress-ng.futex.ops
818985 ±  5% -10.1% 736460 ±  8% +34.2%1099487 ± 
5% +41.2%1156215 ±  3%  stress-ng.futex.ops_per_sec
  1551 ±  3%  -3.4%   1498 ±  5%  -9.5%   1404 ± 
2%  -4.6%   1480 ±  5%  stress-ng.inotify.ops
  1547 ±  3%  -3.5%   1492 ±  5%  -9.5%   1400 ± 
2%  -4.8%   1472 ±  5%  stress-ng.inotify.ops_per_sec
 11292 ±  8%  -2.8%  10974 ±  8%  +1.9%  11505 ± 
13%  -9.4%  10225 ±  6%  stress-ng.kill.ops
 28.20 ±  4% -35.4%  18.22   -33.5%  18.75 
 -33.4%  18.77stress-ng.mmapfork.ops_per_sec
   1932318+1.5%1961688 ±  2% -22.8%1492231 ± 
2%  +4.0%2010509 ±  3%  stress-ng.softlockup.ops
   1931679 ±  2%  +1.5%1961143 ±  2% -22.8%1491939 ± 
2%  +4.0%2009585 ±  3%  stress-ng.softlockup.ops_per_sec
  18607406 ±  6% -12.9%   16210450 ± 21% -12.7%   16238693 ± 
14%  -8.0%   17120880 ± 13%  stress-ng.switch.ops
  18604406 ±  6% -12.9%   16208270 ± 21% -12.7%   16237956 ± 
14%  -8.0%   17115273 ± 13%  stress-ng.switch.ops_per_sec
   2999012 ± 21% -10.1%2696954 ± 22%  -9.1%2725653 ± 
21% -88.5% 37 ± 11%  stress-ng.tee.ops_per_sec
  7882 ±  3%  -5.4%   7458 ±  4%  -4.0%   7566 ± 
4%  -2.0%   7724 ±  3%  stress-ng.vforkmany.ops
  7804 ±  3%  -5.2%   7400 ±  4%  -3.8%   7504 ± 
4%  -2.0%   7647 ±  3%  stress-ng.vforkmany.ops_per_sec
  46745421 ±  3%  -8.1%   42938569 ±  3%  -7.8%   43078233 ± 
3%  -5.2%   44312072 ±  4%  stress-ng.yield.ops
  46734472 ±  3%  -8.1%   42926316 ±  3%  -7.8%   43067447 ± 
3%  -5.2%   44290338 ±  4%  stress-ng.yield.ops_per_sec



On 4/27/2020 8:46 PM, Vincent Guittot wrote:

On Mon, 27 Apr 2020 at 13:35, Hillf Danton  wrote:



On Mon, 27 Apr 2020 11:03:58 +0200 Vincent Guittot wrote: 

On Sun, 26 Apr 2020 at 14:42, Hillf Danton wrote:


On 4/21/2020 8:47 AM, kernel test robot wrote:


Greeting,

FYI, we noticed a 56.4% improvement of stress-ng.fifo.ops_per_sec due to commit:


commit: 6c8116c914b65be5e4d6f66d69c8142eb0648c22 ("sched/fair: Fix condition of 
avg_load calculation")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G 
memory
with following parameters:

 nr_threads: 100%
 disk: 1HDD
 testtime: 1s
 class: scheduler
 cpufreq_governor: performance
 ucode: 0xb38
 sc_pid_max: 4194304



We need to handle group_fully_busy differently from group_overloaded,
as pushing tasks does not help load balancing in the former case.


Have you tested this patch for the UC above? Do you have figures?


No, I am looking for a box with 88 threads; I will likely get access to one
in three weeks at the earliest.


--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8744,30 +8744,20 @@ find_idlest_group(struct sched_domain *s

 switch (local_sgs.group_type) {
 case group_overloaded:
-   case group_fully_busy:
-   /*
-* When comparing groups across NUMA domains, it's possible for
-* the local domain to be very lightly loaded relative to the
-* remote domains but "imbalance" skews the comparison making
-* remote CPUs look much more favourable. When considering
-* cross-domain, add imbalance to the load on the remote node
-* and consider staying local.
-*/
-
-   if ((sd->flags & SD_NUMA) &&
-   ((idlest_sgs.avg_load + imbalance) >= local_sgs.avg_load))
+   

Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression

2020-06-12 Thread Xing Zhengjun

Hi Vincent,

  We tested this and the regression still exists in v5.7; do you have time to
look at it? Thanks.



=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:

lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7
 --- ---
 %stddev %change %stddev %change %stddev
 \  |\  |\
  0.69   -10.3%   0.62-9.1%   0.62 
  reaim.child_systime
  0.62-1.0%   0.61+0.5%   0.62 
  reaim.child_utime
 66870   -10.0%  60187-7.6%  61787 
  reaim.jobs_per_min
 16717   -10.0%  15046-7.6%  15446 
  reaim.jobs_per_min_child
 97.84-1.1%  96.75-0.4%  97.43 
  reaim.jti
 72000   -10.8%  64216-8.3%  66000 
  reaim.max_jobs_per_min
  0.36   +10.6%   0.40+7.8%   0.39 
  reaim.parent_time
  1.58 ±  2% +71.0%   2.70 ±  2% +26.9%   2.01 ± 
2%  reaim.std_dev_percent
  0.00 ±  5%+110.4%   0.01 ±  3% +48.8%   0.01 ± 
7%  reaim.std_dev_time
 50800-2.4%  49600-1.6%  5 
  reaim.workload



On 3/19/2020 10:38 AM, kernel test robot wrote:

Greeting,

FYI, we noticed a -10.5% regression of reaim.jobs_per_min due to commit:


commit: 070f5e860ee2bf588c99ef7b4c202451faa48236 ("sched/fair: Take into account 
runnable_avg to classify group")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: reaim
on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G 
memory
with following parameters:

runtime: 300s
nr_task: 100%
test: five_sec
cpufreq_governor: performance
ucode: 0x21

test-description: REAIM is an updated and improved version of AIM 7 benchmark.
test-url: https://sourceforge.net/projects/re-aim-7/



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
   
gcc-7/performance/x86_64-rhel-7.6/100%/debian-x86_64-20191114.cgz/300s/lkp-ivb-d04/five_sec/reaim/0x21

commit:
   9f68395333 ("sched/pelt: Add a new runnable average signal")
   070f5e860e ("sched/fair: Take into account runnable_avg to classify group")

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2
 ---
fail:runs  %reproductionfail:runs
| | |
   4:4  -18%   3:4 
perf-profile.children.cycles-pp.error_entry
   3:4  -12%   3:4 
perf-profile.self.cycles-pp.error_entry
  %stddev %change %stddev
  \  |\
   0.68   -10.4%   0.61reaim.child_systime
  67235   -10.5%  60195reaim.jobs_per_min
  16808   -10.5%  15048reaim.jobs_per_min_child
  97.90-1.2%  96.70reaim.jti
  72000   -10.8%  64216reaim.max_jobs_per_min
   0.36   +11.3%   0.40reaim.parent_time
   1.56 ±  3% +79.1%   2.80 ±  6%  reaim.std_dev_percent
   0.00 ±  7%+145.9%   0.01 ±  9%  reaim.std_dev_time
 104276   -16.0%  87616
reaim.time.involuntary_context_switches
   15511157-2.4%   15144312reaim.time.minor_page_faults
  55.00-7.3%  51.00
reaim.time.percent_of_cpu_this_job_got
  88.01   -12.4%  77.12reaim.time.system_time
  79.97-3.2%  77.38reaim.time.user_time
 216380-3.4% 208924
reaim.time.voluntary_context_switches
  50800-2.4%  49600reaim.workload
  30.40 ±  2%  -4.7%  28.97 ±  2%  boot-time.boot
   9.38-0.78.66 ±  3%  mpstat.cpu.all.sys%
   7452+7.5%   8014vmstat.system.cs
1457802 ± 16% +49.3%  

Re: [LKP] [ima] 8eb613c0b8: stress-ng.icache.ops_per_sec -84.2% regression

2020-06-11 Thread Xing Zhengjun




On 6/11/2020 6:53 PM, Mimi Zohar wrote:

On Thu, 2020-06-11 at 15:10 +0800, Xing Zhengjun wrote:

On 6/10/2020 9:53 PM, Mimi Zohar wrote:
ucode: 0x52c


Does the following change resolve it?

diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index c44414a7f82e..78e1dfc8a3f2 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -426,7 +426,8 @@ int ima_file_mprotect(struct vm_area_struct *vma, unsigned 
long prot)
int pcr;
   
   	/* Is mprotect making an mmap'ed file executable? */

-   if (!vma->vm_file || !(prot & PROT_EXEC) || (vma->vm_flags & VM_EXEC))
+   if (!(ima_policy_flag & IMA_APPRAISE) || !vma->vm_file ||
+   !(prot & PROT_EXEC) || (vma->vm_flags & VM_EXEC))
return 0;
   
	security_task_getsecid(current, &secid);



Thanks. I tested the change; it resolves the regression.


Thanks!  Can I get your "Tested-by" tag?

Mimi



Sure.

--
Zhengjun Xing


Re: [LKP] [ima] 8eb613c0b8: stress-ng.icache.ops_per_sec -84.2% regression

2020-06-11 Thread Xing Zhengjun




On 6/10/2020 9:53 PM, Mimi Zohar wrote:

Hi Xing,

On Wed, 2020-06-10 at 11:21 +0800, Xing Zhengjun wrote:

Hi Mimi,

  Do you have time to take a look at this? We noticed a 3.7%
regression of boot-time.dhcp and an 84.2% regression of
stress-ng.icache.ops_per_sec. Thanks.

On 6/3/2020 5:11 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a 3.7% regression of boot-time.dhcp due to commit:


commit: 8eb613c0b8f19627ba1846dcf78bb2c85edbe8dd ("ima: verify mprotect change is 
consistent with mmap policy")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G 
memory
with following parameters:

nr_threads: 100%
disk: 1HDD
testtime: 30s
class: cpu-cache
cpufreq_governor: performance
ucode: 0x52c


Does the following change resolve it?

diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index c44414a7f82e..78e1dfc8a3f2 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -426,7 +426,8 @@ int ima_file_mprotect(struct vm_area_struct *vma, unsigned 
long prot)
int pcr;
  
  	/* Is mprotect making an mmap'ed file executable? */

-   if (!vma->vm_file || !(prot & PROT_EXEC) || (vma->vm_flags & VM_EXEC))
+   if (!(ima_policy_flag & IMA_APPRAISE) || !vma->vm_file ||
+   !(prot & PROT_EXEC) || (vma->vm_flags & VM_EXEC))
return 0;
  
	security_task_getsecid(current, &secid);



Thanks. I tested the change; it resolves the regression.
=
tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_threads/disk/testtime/class/cpufreq_governor/ucode:

lkp-csl-2sp5/stress-ng/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-9/test/100%/1HDD/30s/cpu-cache/performance/0x52c

commit:
  0c4395fb2aa77341269ea619c5419ea48171883f
  8eb613c0b8f19627ba1846dcf78bb2c85edbe8dd
  8745d6eb3a493b1d324eeb9edefec5d23c16cba9 (fix for the regression)

0c4395fb2aa77341 8eb613c0b8f19627ba1846dcf78 8745d6eb3a493b1d324eeb9edef
 --- ---
 %stddev %change %stddev %change %stddev
 \  |\  |\
884.33 ±  4%  +4.6% 924.67   +45.1%   1283 ± 
3%  stress-ng.cache.ops
 29.47 ±  4%  +4.6%  30.82   +45.1%  42.76 ± 
3%  stress-ng.cache.ops_per_sec
   1245720   -84.3% 195648-0.8%1235416 
  stress-ng.icache.ops
 41522   -84.3%   6520-0.8%  41179 
  stress-ng.icache.ops_per_sec
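
For reference, a minimal user-space sketch (not the stress-ng source; the file path
and loop count are assumptions for illustration) of the pattern the hook intercepts,
i.e. mprotect() repeatedly making a file-backed mapping executable. With the change
above, these calls return early from ima_file_mprotect() when IMA appraisal is not
enabled in the policy:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* Any file-backed mapping will do; /bin/true is only an example. */
	int fd = open("/bin/true", O_RDONLY);
	if (fd < 0)
		return 1;

	void *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	for (int i = 0; i < 100000; i++) {
		/* Making the mapping executable is what enters the IMA hook. */
		mprotect(p, 4096, PROT_READ | PROT_EXEC);
		mprotect(p, 4096, PROT_READ);
	}

	munmap(p, 4096);
	close(fd);
	return 0;
}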




--
Zhengjun Xing


Re: [LKP] [xfs] a5949d3fae: aim7.jobs-per-min -33.6% regression

2020-06-09 Thread Xing Zhengjun

Hi Darrick,

   Do you have time to take a look at this? Thanks.

On 6/6/2020 11:48 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -33.6% regression of aim7.jobs-per-min due to commit:


commit: a5949d3faedf492fa7863b914da408047ab46eb0 ("xfs: force writes to delalloc 
regions to unwritten")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: aim7
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G 
memory
with following parameters:

disk: 1BRD_48G
fs: xfs
test: sync_disk_rw
load: 600
cpufreq_governor: performance
ucode: 0x42e

test-description: AIM7 is a traditional UNIX system level benchmark suite which 
is used to test and measure the performance of multiuser system.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase/ucode:
   
gcc-9/performance/1BRD_48G/xfs/x86_64-rhel-7.6/600/debian-x86_64-20191114.cgz/lkp-ivb-2ep1/sync_disk_rw/aim7/0x42e

commit:
   590b16516e ("xfs: refactor xfs_iomap_prealloc_size")
   a5949d3fae ("xfs: force writes to delalloc regions to unwritten")

590b16516ef38e2e a5949d3faedf492fa7863b914da
 ---
fail:runs  %reproductionfail:runs
| | |
:4   50%   2:4 
dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x
  %stddev %change %stddev
  \  |\
  35272   -33.6%  23430aim7.jobs-per-min
 102.13   +50.5% 153.75aim7.time.elapsed_time
 102.13   +50.5% 153.75aim7.time.elapsed_time.max
1388038   +40.2%1945838
aim7.time.involuntary_context_switches
  43420 ±  2% +13.4%  49255 ±  2%  aim7.time.minor_page_faults
   3123   +44.2%   4504 ±  2%  aim7.time.system_time
  59.31+6.5%  63.18aim7.time.user_time
   48595108   +58.6%   77064959
aim7.time.voluntary_context_switches
   1.44   -28.8%   1.02iostat.cpu.user
   0.07 ±  6%  +0.40.44 ±  7%  mpstat.cpu.all.iowait%
   1.44-0.41.02mpstat.cpu.all.usr%
   8632 ± 50% +75.6%  15156 ± 34%  numa-meminfo.node0.KernelStack
   6583 ±136%+106.0%  13562 ± 82%  numa-meminfo.node0.PageTables
  63325 ± 11% +14.3%  72352 ± 12%  numa-meminfo.node0.SUnreclaim
   8647 ± 50% +75.3%  15156 ± 34%  numa-vmstat.node0.nr_kernel_stack
   1656 ±136%+104.6%   3389 ± 82%  
numa-vmstat.node0.nr_page_table_pages
  15831 ± 11% +14.3%  18087 ± 12%  
numa-vmstat.node0.nr_slab_unreclaimable
  93640 ±  3% +41.2% 132211 ±  2%  meminfo.AnonHugePages
  21641   +39.9%  30271 ±  4%  meminfo.KernelStack
 129269   +12.3% 145114meminfo.SUnreclaim
  28000   -31.2%  19275meminfo.max_used_kB
1269307   -26.9% 927657vmstat.io.bo
 149.75 ±  3% -17.4% 123.75 ±  4%  vmstat.procs.r
 718992   +13.3% 814567vmstat.system.cs
 231397-9.3% 209881 ±  2%  vmstat.system.in
  6.774e+08   +70.0%  1.152e+09cpuidle.C1.time
   18203372   +60.4%   29198744cpuidle.C1.usage
  2.569e+08 ± 18% +81.8%  4.672e+08 ±  5%  cpuidle.C1E.time
2691402 ± 13% +98.7%5346901 ±  3%  cpuidle.C1E.usage
 990350   +95.0%1931226 ±  2%  cpuidle.POLL.time
 520061   +97.7%1028004 ±  2%  cpuidle.POLL.usage
  77231+1.8%  78602proc-vmstat.nr_active_anon
  19868+3.8%  20615proc-vmstat.nr_dirty
 381302+1.0% 384969proc-vmstat.nr_file_pages
   4388-2.7%   4270proc-vmstat.nr_inactive_anon
  69865+4.7%  73155proc-vmstat.nr_inactive_file
  21615   +40.0%  30251 ±  4%  proc-vmstat.nr_kernel_stack
   7363-3.2%   7127proc-vmstat.nr_mapped
  12595 ±  3%  +5.2%  13255 ±  4%  proc-vmstat.nr_shmem
  19619+3.2%  20247proc-vmstat.nr_slab_reclaimable
  32316   +12.3%  36280

Re: [LKP] [ima] 8eb613c0b8: stress-ng.icache.ops_per_sec -84.2% regression

2020-06-09 Thread Xing Zhengjun

Hi Mimi,

Do you have time to take a look at this? We noticed a 3.7%
regression of boot-time.dhcp and an 84.2% regression of
stress-ng.icache.ops_per_sec. Thanks.


On 6/3/2020 5:11 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a 3.7% regression of boot-time.dhcp due to commit:


commit: 8eb613c0b8f19627ba1846dcf78bb2c85edbe8dd ("ima: verify mprotect change is 
consistent with mmap policy")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G 
memory
with following parameters:

nr_threads: 100%
disk: 1HDD
testtime: 30s
class: cpu-cache
cpufreq_governor: performance
ucode: 0x52c




If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode:
   
cpu-cache/gcc-9/performance/1HDD/x86_64-rhel-7.6/100%/debian-x86_64-20191114.cgz/lkp-csl-2sp5/stress-ng/30s/0x52c

commit:
   0c4395fb2a ("evm: Fix possible memory leak in evm_calc_hmac_or_hash()")
   8eb613c0b8 ("ima: verify mprotect change is consistent with mmap policy")

0c4395fb2aa77341 8eb613c0b8f19627ba1846dcf78
 ---
fail:runs  %reproductionfail:runs
| | |
:4   25%   1:4 
dmesg.WARNING:at#for_ip_interrupt_entry/0x
   0:43%   0:4 
perf-profile.children.cycles-pp.error_entry
  %stddev %change %stddev
  \  |\
1245570   -84.2% 197151stress-ng.icache.ops
  41517   -84.2%   6570stress-ng.icache.ops_per_sec
  1.306e+09   -82.1%  2.338e+08stress-ng.time.minor_page_faults
   2985   +13.5%   3387stress-ng.time.system_time
   4.28   +13.1%   4.85iostat.cpu.system
   4.18+0.64.73mpstat.cpu.all.sys%
  10121+9.6%  11096 ±  3%  softirqs.CPU67.SCHED
 203299-4.2% 194854 ±  5%  vmstat.system.in
  26.91+2.8%  27.67 ±  3%  boot-time.boot
  16.34+3.7%  16.94 ±  2%  boot-time.dhcp
   2183 ±  3%  +3.7%   2263boot-time.idle
1042938 ± 80%   +8208.2%   86649242 ±156%  cpuidle.C1.time
  48428 ±114%   +1842.4% 940677 ±151%  cpuidle.C1.usage
  15748 ± 28%+301.0%  63144 ± 79%  cpuidle.POLL.usage
  61300 ±  4% +82.8% 112033 ± 11%  numa-vmstat.node1.nr_active_anon
  47060 ±  3%+106.8%  97323 ± 12%  numa-vmstat.node1.nr_anon_pages
  42.67 ±  2%+217.0% 135.25 ± 14%  
numa-vmstat.node1.nr_anon_transparent_hugepages
  61301 ±  4% +82.8% 112032 ± 11%  
numa-vmstat.node1.nr_zone_active_anon
   3816 ±  2%  +3.0%   3931proc-vmstat.nr_page_table_pages
   35216541+2.9%   36244047proc-vmstat.pgalloc_normal
  1.308e+09   -82.0%  2.356e+08proc-vmstat.pgfault
   35173363+2.8%   36173843proc-vmstat.pgfree
 248171 ±  5% +82.5% 452893 ± 11%  numa-meminfo.node1.Active
 244812 ±  4% +83.5% 449116 ± 11%  numa-meminfo.node1.Active(anon)
  88290 ±  3%+214.4% 277591 ± 15%  numa-meminfo.node1.AnonHugePages
 187940 ±  3%+107.8% 390486 ± 12%  numa-meminfo.node1.AnonPages
1366813 ±  3% +12.0%1530428 ±  6%  numa-meminfo.node1.MemUsed
 571.00 ±  8% +10.4% 630.50 ±  8%  slabinfo.UDP.active_objs
 571.00 ±  8% +10.4% 630.50 ±  8%  slabinfo.UDP.num_objs
 300.00 ±  5% +20.0% 360.00 ± 10%  slabinfo.kmem_cache.active_objs
 300.00 ±  5% +20.0% 360.00 ± 10%  slabinfo.kmem_cache.num_objs
 606.33 ±  4% +17.6% 713.00 ±  8%  
slabinfo.kmem_cache_node.active_objs
 661.33 ±  4% +16.1% 768.00 ±  8%  slabinfo.kmem_cache_node.num_objs
 114561 ± 23% -34.3%  75239 ±  7%  sched_debug.cfs_rq:/.load.max
  14869 ± 22% -36.6%   9424 ±  8%  sched_debug.cfs_rq:/.load.stddev
4040842 ±  5% +18.0%4767515 ± 13%  sched_debug.cpu.avg_idle.max
2019061 ±  8% +25.5%2534134 ± 14%  
sched_debug.cpu.max_idle_balance_cost.max
 378044 ±  3% +22.5% 463135 ±  8%  
sched_debug.cpu.max_idle_balance_cost.stddev
  41605   +12.6%  46852 ±  2%  

Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-09-25 Thread Xing Zhengjun




On 8/30/2019 8:43 AM, Xing Zhengjun wrote:



On 8/7/2019 3:56 PM, Xing Zhengjun wrote:



On 7/24/2019 1:17 PM, Xing Zhengjun wrote:



On 7/12/2019 2:42 PM, Xing Zhengjun wrote:

Hi Trond,

  I attached the big perf-profile changes; I hope they are useful for
analyzing the issue.


Ping...


ping...


ping...


ping...






In testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 
3.00GHz with 384G memory

with following parameters:

 iterations: 20x
 nr_threads: 64t
 disk: 1BRD_48G
 fs: xfs
 fs2: nfsv4
 filesize: 4M
 test_size: 80G
 sync_method: fsyncBeforeClose
 cpufreq_governor: performance

test-description: The fsmark is a file system benchmark to test 
synchronous write workloads, for example, mail servers workload.

test-url: https://sourceforge.net/projects/fsmark/

commit:
   e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
   0472e47660 ("SUNRPC: Convert socket page send code to use 
iov_iter()")


e791f8e9380d945e 0472e476604998c127f3c80d291
 ---
  %stddev %change %stddev
  \  |    \
 527.29   -22.6% 407.96    fsmark.files_per_sec
   1.97 ± 11%  +0.9    2.88 ±  4% 
perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry 

   0.00    +0.9    0.93 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_write_xmit.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages 

   2.11 ± 10%  +0.9    3.05 ±  4% 
perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary 

   5.29 ±  2%  +1.2    6.46 ±  7% 
perf-profile.calltrace.cycles-pp.svc_recv.nfsd.kthread.ret_from_fork
   9.61 ±  5%  +3.1   12.70 ±  2% 
perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
   9.27 ±  5%  +3.1   12.40 ±  2% 
perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork 

  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.calltrace.cycles-pp.ret_from_fork
  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
   0.00    +3.4    3.41 ±  4% 
perf-profile.calltrace.cycles-pp.memcpy_erms.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg 

   0.00    +3.4    3.44 ±  4% 
perf-profile.calltrace.cycles-pp.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg 

   0.00    +3.5    3.54 ±  4% 
perf-profile.calltrace.cycles-pp._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages 

   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread 

   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork 

   1.81 ±  4%  +3.8    5.59 ±  4% 
perf-profile.calltrace.cycles-pp.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread 

   1.80 ±  3%  +3.8    5.59 ±  3% 
perf-profile.calltrace.cycles-pp.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work 

   1.73 ±  4%  +3.8    5.54 ±  4% 
perf-profile.calltrace.cycles-pp.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule 

   1.72 ±  4%  +3.8    5.54 ±  4% 
perf-profile.calltrace.cycles-pp.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute 

   0.00    +5.4    5.42 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request 

   0.00    +5.5    5.52 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit 

   0.00    +5.5    5.53 ±  4% 
perf-profile.calltrace.cycles-pp.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit 

   9.61 ±  5%  +3.1   12.70 ±  2% 
perf-profile.children.cycles-pp.worker_thread
   9.27 ±  5%  +3.1   12.40 ±  2% 
perf-profile.children.cycles-pp.process_one_work
   6.19    +3.2    9.40 ±  4% 
perf-profile.children.cycles-pp.memcpy_erms
  34.53 ±  4%  +3.3   37.78 ±  2% 
perf-profile.children.cycles-pp.ret_from_fork
  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.children.cycles-pp.kthread
   0.00    +3.5    3.46 ±  4% 
perf-profile.children.cycles-pp.memcpy_from_page
   0.00    +3.6    3.56 ±  4% 
perf-profile.children.cycles-pp._copy_from_iter_full
   2.47 ±  4%  +3.7    6.18 ±  3% 
perf-profil

Re: [PATCH v3] trace:Add "gfp_t" support in synthetic_events

2019-09-03 Thread Xing Zhengjun

 Hi Steve,

On 8/13/2019 11:04 AM, Steven Rostedt wrote:

On Tue, 13 Aug 2019 09:04:28 +0800
Xing Zhengjun  wrote:


Hi Steve,

 Could you help to review? Thanks.


Thanks for the ping. Yes, I'll take a look at it. I'll be pulling in a
lot of patches that have queued up.

-- Steve


Could you help to review? Thanks.






On 7/13/2019 12:05 AM, Tom Zanussi wrote:

Hi Zhengjun,

On Fri, 2019-07-12 at 09:53 +0800, Zhengjun Xing wrote:

Add "gfp_t" support in synthetic_events, then the "gfp_t" type
parameter in some functions can be traced.

Prints the gfp flags as hex in addition to the human-readable flag
string.  Example output:

whoopsie-630 [000] ...1 78.969452: testevent: bar=b20
(GFP_ATOMIC|__GFP_ZERO)
  rcuc/0-11  [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC)
  rcuc/0-11  [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC)

Signed-off-by: Tom Zanussi 
Signed-off-by: Zhengjun Xing 


Looks good to me, thanks!

Tom
   

---
   kernel/trace/trace_events_hist.c | 19 +++
   1 file changed, 19 insertions(+)

diff --git a/kernel/trace/trace_events_hist.c
b/kernel/trace/trace_events_hist.c
index ca6b0dff60c5..30f0f32aca62 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -13,6 +13,10 @@
   #include 
   #include 
   
+/* for gfp flag names */

+#include 
+#include 
+
   #include "tracing_map.h"
   #include "trace.h"
   #include "trace_dynevent.h"
@@ -752,6 +756,8 @@ static int synth_field_size(char *type)
size = sizeof(unsigned long);
else if (strcmp(type, "pid_t") == 0)
size = sizeof(pid_t);
+   else if (strcmp(type, "gfp_t") == 0)
+   size = sizeof(gfp_t);
else if (synth_field_is_string(type))
size = synth_field_string_size(type);
   
@@ -792,6 +798,8 @@ static const char *synth_field_fmt(char *type)

fmt = "%lu";
else if (strcmp(type, "pid_t") == 0)
fmt = "%d";
+   else if (strcmp(type, "gfp_t") == 0)
+   fmt = "%x";
else if (synth_field_is_string(type))
fmt = "%s";
   
@@ -834,9 +842,20 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
 					 i == se->n_fields - 1 ? "" : " ");
 			n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
 		} else {
+			struct trace_print_flags __flags[] = {
+				__def_gfpflag_names, {-1, NULL} };
+
 			trace_seq_printf(s, print_fmt, se->fields[i]->name,
 					 entry->fields[n_u64],
 					 i == se->n_fields - 1 ? "" : " ");
+
+			if (strcmp(se->fields[i]->type, "gfp_t") == 0) {
+				trace_seq_puts(s, " (");
+				trace_print_flags_seq(s, "|",
+						      entry->fields[n_u64],
+						      __flags);
+				trace_seq_putc(s, ')');
+			}
 			n_u64++;
 		}
 	}






--
Zhengjun Xing
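
As a usage note, synthetic events are created from user space by writing their
definition into tracefs, so with this patch a gfp_t field can be declared directly.
A hedged sketch in C (the tracefs path, the event name "testevent" and the field
name "bar" are assumptions for illustration, not taken from the patch itself):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Assumes tracefs is mounted at /sys/kernel/tracing. */
	const char *path = "/sys/kernel/tracing/synthetic_events";
	const char *def = "testevent gfp_t bar\n";	/* gfp_t needs this patch */
	int fd = open(path, O_WRONLY | O_APPEND);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, def, strlen(def)) < 0)
		perror("write");
	close(fd);
	return 0;
}

The same definition can be written with a plain shell echo; the point is only that
the field type string "gfp_t" becomes valid once synth_field_size() and
synth_field_fmt() know about it.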


Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-08-29 Thread Xing Zhengjun




On 8/7/2019 3:56 PM, Xing Zhengjun wrote:



On 7/24/2019 1:17 PM, Xing Zhengjun wrote:



On 7/12/2019 2:42 PM, Xing Zhengjun wrote:

Hi Trond,

 I attached the big perf-profile changes; I hope they are useful for
analyzing the issue.


Ping...


ping...


ping...





In testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 
with 384G memory

with following parameters:

 iterations: 20x
 nr_threads: 64t
 disk: 1BRD_48G
 fs: xfs
 fs2: nfsv4
 filesize: 4M
 test_size: 80G
 sync_method: fsyncBeforeClose
 cpufreq_governor: performance

test-description: The fsmark is a file system benchmark to test 
synchronous write workloads, for example, mail servers workload.

test-url: https://sourceforge.net/projects/fsmark/

commit:
   e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
   0472e47660 ("SUNRPC: Convert socket page send code to use 
iov_iter()")


e791f8e9380d945e 0472e476604998c127f3c80d291
 ---
  %stddev %change %stddev
  \  |    \
 527.29   -22.6% 407.96    fsmark.files_per_sec
   1.97 ± 11%  +0.9    2.88 ±  4% 
perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry 

   0.00    +0.9    0.93 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_write_xmit.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages 

   2.11 ± 10%  +0.9    3.05 ±  4% 
perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary 

   5.29 ±  2%  +1.2    6.46 ±  7% 
perf-profile.calltrace.cycles-pp.svc_recv.nfsd.kthread.ret_from_fork
   9.61 ±  5%  +3.1   12.70 ±  2% 
perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
   9.27 ±  5%  +3.1   12.40 ±  2% 
perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork 

  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.calltrace.cycles-pp.ret_from_fork
  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
   0.00    +3.4    3.41 ±  4% 
perf-profile.calltrace.cycles-pp.memcpy_erms.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg 

   0.00    +3.4    3.44 ±  4% 
perf-profile.calltrace.cycles-pp.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg 

   0.00    +3.5    3.54 ±  4% 
perf-profile.calltrace.cycles-pp._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages 

   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread 

   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork 

   1.81 ±  4%  +3.8    5.59 ±  4% 
perf-profile.calltrace.cycles-pp.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread 

   1.80 ±  3%  +3.8    5.59 ±  3% 
perf-profile.calltrace.cycles-pp.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work 

   1.73 ±  4%  +3.8    5.54 ±  4% 
perf-profile.calltrace.cycles-pp.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule 

   1.72 ±  4%  +3.8    5.54 ±  4% 
perf-profile.calltrace.cycles-pp.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute 

   0.00    +5.4    5.42 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request 

   0.00    +5.5    5.52 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit 

   0.00    +5.5    5.53 ±  4% 
perf-profile.calltrace.cycles-pp.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit 

   9.61 ±  5%  +3.1   12.70 ±  2% 
perf-profile.children.cycles-pp.worker_thread
   9.27 ±  5%  +3.1   12.40 ±  2% 
perf-profile.children.cycles-pp.process_one_work
   6.19    +3.2    9.40 ±  4% 
perf-profile.children.cycles-pp.memcpy_erms
  34.53 ±  4%  +3.3   37.78 ±  2% 
perf-profile.children.cycles-pp.ret_from_fork
  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.children.cycles-pp.kthread
   0.00    +3.5    3.46 ±  4% 
perf-profile.children.cycles-pp.memcpy_from_page
   0.00    +3.6    3.56 ±  4% 
perf-profile.children.cycles-pp._copy_from_iter_full
   2.47 ±  4%  +3.7    6.18 ±  3% 
perf-profile.children.cycles-pp.__rpc_execute
   2.30 ±  5%  +3.7

Re: [PATCH v3] trace:Add "gfp_t" support in synthetic_events

2019-08-12 Thread Xing Zhengjun

Hi Steve,

   Could you help to review? Thanks.

On 7/13/2019 12:05 AM, Tom Zanussi wrote:

Hi Zhengjun,

On Fri, 2019-07-12 at 09:53 +0800, Zhengjun Xing wrote:

Add "gfp_t" support in synthetic_events, then the "gfp_t" type
parameter in some functions can be traced.

Prints the gfp flags as hex in addition to the human-readable flag
string.  Example output:

   whoopsie-630 [000] ...1 78.969452: testevent: bar=b20
(GFP_ATOMIC|__GFP_ZERO)
 rcuc/0-11  [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC)
 rcuc/0-11  [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC)

Signed-off-by: Tom Zanussi 
Signed-off-by: Zhengjun Xing 


Looks good to me, thanks!

Tom


---
  kernel/trace/trace_events_hist.c | 19 +++
  1 file changed, 19 insertions(+)

diff --git a/kernel/trace/trace_events_hist.c
b/kernel/trace/trace_events_hist.c
index ca6b0dff60c5..30f0f32aca62 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -13,6 +13,10 @@
  #include 
  #include 
  
+/* for gfp flag names */

+#include 
+#include 
+
  #include "tracing_map.h"
  #include "trace.h"
  #include "trace_dynevent.h"
@@ -752,6 +756,8 @@ static int synth_field_size(char *type)
size = sizeof(unsigned long);
else if (strcmp(type, "pid_t") == 0)
size = sizeof(pid_t);
+   else if (strcmp(type, "gfp_t") == 0)
+   size = sizeof(gfp_t);
else if (synth_field_is_string(type))
size = synth_field_string_size(type);
  
@@ -792,6 +798,8 @@ static const char *synth_field_fmt(char *type)

fmt = "%lu";
else if (strcmp(type, "pid_t") == 0)
fmt = "%d";
+   else if (strcmp(type, "gfp_t") == 0)
+   fmt = "%x";
else if (synth_field_is_string(type))
fmt = "%s";
  
@@ -834,9 +842,20 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
 					 i == se->n_fields - 1 ? "" : " ");
 			n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
 		} else {
+			struct trace_print_flags __flags[] = {
+				__def_gfpflag_names, {-1, NULL} };
+
 			trace_seq_printf(s, print_fmt, se->fields[i]->name,
 					 entry->fields[n_u64],
 					 i == se->n_fields - 1 ? "" : " ");
+
+			if (strcmp(se->fields[i]->type, "gfp_t") == 0) {
+				trace_seq_puts(s, " (");
+				trace_print_flags_seq(s, "|",
+						      entry->fields[n_u64],
+						      __flags);
+				trace_seq_putc(s, ')');
+			}
 			n_u64++;
 		}
 	}


--
Zhengjun Xing


Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-08-07 Thread Xing Zhengjun




On 7/24/2019 1:17 PM, Xing Zhengjun wrote:



On 7/12/2019 2:42 PM, Xing Zhengjun wrote:

Hi Trond,

 I attached the big perf-profile changes; I hope they are useful for
analyzing the issue.


Ping...


ping...






In testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 
with 384G memory

with following parameters:

 iterations: 20x
 nr_threads: 64t
 disk: 1BRD_48G
 fs: xfs
 fs2: nfsv4
 filesize: 4M
 test_size: 80G
 sync_method: fsyncBeforeClose
 cpufreq_governor: performance

test-description: The fsmark is a file system benchmark to test 
synchronous write workloads, for example, mail servers workload.

test-url: https://sourceforge.net/projects/fsmark/

commit:
   e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
   0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

e791f8e9380d945e 0472e476604998c127f3c80d291
 ---
  %stddev %change %stddev
  \  |    \
 527.29   -22.6% 407.96    fsmark.files_per_sec
   1.97 ± 11%  +0.9    2.88 ±  4% 
perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry 

   0.00    +0.9    0.93 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_write_xmit.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages 

   2.11 ± 10%  +0.9    3.05 ±  4% 
perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary 

   5.29 ±  2%  +1.2    6.46 ±  7% 
perf-profile.calltrace.cycles-pp.svc_recv.nfsd.kthread.ret_from_fork
   9.61 ±  5%  +3.1   12.70 ±  2% 
perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
   9.27 ±  5%  +3.1   12.40 ±  2% 
perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork 

  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.calltrace.cycles-pp.ret_from_fork
  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
   0.00    +3.4    3.41 ±  4% 
perf-profile.calltrace.cycles-pp.memcpy_erms.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg 

   0.00    +3.4    3.44 ±  4% 
perf-profile.calltrace.cycles-pp.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg 

   0.00    +3.5    3.54 ±  4% 
perf-profile.calltrace.cycles-pp._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages 

   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread 

   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork 

   1.81 ±  4%  +3.8    5.59 ±  4% 
perf-profile.calltrace.cycles-pp.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread 

   1.80 ±  3%  +3.8    5.59 ±  3% 
perf-profile.calltrace.cycles-pp.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work 

   1.73 ±  4%  +3.8    5.54 ±  4% 
perf-profile.calltrace.cycles-pp.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule 

   1.72 ±  4%  +3.8    5.54 ±  4% 
perf-profile.calltrace.cycles-pp.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute 

   0.00    +5.4    5.42 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request 

   0.00    +5.5    5.52 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit 

   0.00    +5.5    5.53 ±  4% 
perf-profile.calltrace.cycles-pp.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit 

   9.61 ±  5%  +3.1   12.70 ±  2% 
perf-profile.children.cycles-pp.worker_thread
   9.27 ±  5%  +3.1   12.40 ±  2% 
perf-profile.children.cycles-pp.process_one_work
   6.19    +3.2    9.40 ±  4% 
perf-profile.children.cycles-pp.memcpy_erms
  34.53 ±  4%  +3.3   37.78 ±  2% 
perf-profile.children.cycles-pp.ret_from_fork
  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.children.cycles-pp.kthread
   0.00    +3.5    3.46 ±  4% 
perf-profile.children.cycles-pp.memcpy_from_page
   0.00    +3.6    3.56 ±  4% 
perf-profile.children.cycles-pp._copy_from_iter_full
   2.47 ±  4%  +3.7    6.18 ±  3% 
perf-profile.children.cycles-pp.__rpc_execute
   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.children.cycles-p

Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-07-23 Thread Xing Zhengjun




On 7/12/2019 2:42 PM, Xing Zhengjun wrote:

Hi Trond,

     I attached the big perf-profile changes; I hope they are useful for
analyzing the issue.


Ping...




In testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 
with 384G memory

with following parameters:

     iterations: 20x
     nr_threads: 64t
     disk: 1BRD_48G
     fs: xfs
     fs2: nfsv4
     filesize: 4M
     test_size: 80G
     sync_method: fsyncBeforeClose
     cpufreq_governor: performance

test-description: The fsmark is a file system benchmark to test 
synchronous write workloads, for example, mail servers workload.

test-url: https://sourceforge.net/projects/fsmark/

commit:
   e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
   0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

e791f8e9380d945e 0472e476604998c127f3c80d291
 ---
  %stddev %change %stddev
  \  |    \
     527.29   -22.6% 407.96    fsmark.files_per_sec
   1.97 ± 11%  +0.9    2.88 ±  4% 
perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry 

   0.00    +0.9    0.93 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_write_xmit.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages 

   2.11 ± 10%  +0.9    3.05 ±  4% 
perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary 

   5.29 ±  2%  +1.2    6.46 ±  7% 
perf-profile.calltrace.cycles-pp.svc_recv.nfsd.kthread.ret_from_fork
   9.61 ±  5%  +3.1   12.70 ±  2% 
perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
   9.27 ±  5%  +3.1   12.40 ±  2% 
perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork 

  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.calltrace.cycles-pp.ret_from_fork
  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
   0.00    +3.4    3.41 ±  4% 
perf-profile.calltrace.cycles-pp.memcpy_erms.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg 

   0.00    +3.4    3.44 ±  4% 
perf-profile.calltrace.cycles-pp.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg 

   0.00    +3.5    3.54 ±  4% 
perf-profile.calltrace.cycles-pp._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages 

   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread 

   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork 

   1.81 ±  4%  +3.8    5.59 ±  4% 
perf-profile.calltrace.cycles-pp.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread 

   1.80 ±  3%  +3.8    5.59 ±  3% 
perf-profile.calltrace.cycles-pp.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work 

   1.73 ±  4%  +3.8    5.54 ±  4% 
perf-profile.calltrace.cycles-pp.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule 

   1.72 ±  4%  +3.8    5.54 ±  4% 
perf-profile.calltrace.cycles-pp.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute 

   0.00    +5.4    5.42 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request 

   0.00    +5.5    5.52 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit 

   0.00    +5.5    5.53 ±  4% 
perf-profile.calltrace.cycles-pp.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit 

   9.61 ±  5%  +3.1   12.70 ±  2% 
perf-profile.children.cycles-pp.worker_thread
   9.27 ±  5%  +3.1   12.40 ±  2% 
perf-profile.children.cycles-pp.process_one_work
   6.19    +3.2    9.40 ±  4% 
perf-profile.children.cycles-pp.memcpy_erms
  34.53 ±  4%  +3.3   37.78 ±  2% 
perf-profile.children.cycles-pp.ret_from_fork
  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.children.cycles-pp.kthread
   0.00    +3.5    3.46 ±  4% 
perf-profile.children.cycles-pp.memcpy_from_page
   0.00    +3.6    3.56 ±  4% 
perf-profile.children.cycles-pp._copy_from_iter_full
   2.47 ±  4%  +3.7    6.18 ±  3% 
perf-profile.children.cycles-pp.__rpc_execute
   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.children.cycles-pp.rpc_async_schedule
   1.90 ±  4%  +3.8

Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-07-12 Thread Xing Zhengjun
smit
  1.82 ±  4%  +3.85.62 ±  3% 
perf-profile.children.cycles-pp.xs_tcp_send_request
  1.81 ±  4%  +3.85.62 ±  3% 
perf-profile.children.cycles-pp.xs_sendpages
  0.21 ± 17%  +5.35.48 ±  4% 
perf-profile.children.cycles-pp.tcp_sendmsg_locked
  0.25 ± 18%  +5.35.59 ±  3% 
perf-profile.children.cycles-pp.tcp_sendmsg
  0.26 ± 16%  +5.35.60 ±  3% 
perf-profile.children.cycles-pp.sock_sendmsg
  1.19 ±  5%  +0.51.68 ±  3% 
perf-profile.self.cycles-pp.get_page_from_freelist
  6.10+3.29.27 ±  4% 
perf-profile.self.cycles-pp.memcpy_erms



On 7/9/2019 10:39 AM, Xing Zhengjun wrote:

Hi Trond,

On 7/8/2019 7:44 PM, Trond Myklebust wrote:
I've asked several times now about how to interpret your results. As 
far as I can tell from your numbers, the overhead appears to be 
entirely contained in the NUMA section of your results.
IOW: it would appear to be a scheduling overhead due to NUMA. I've 
been asking whether or not that is a correct interpretation of the 
numbers you published.
Thanks for your feedback. I used the same hardware and the same test 
parameters to test the two commits:

    e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
    0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

If it is caused by NUMA, why is throughput decreased only for commit 
0472e47660? The filesystem we test is NFS, and commit 0472e47660 is 
network-related; could you help check whether there are any other clues to 
the regression? Thanks.




--
Zhengjun Xing


Re: [PATCH v2] tracing: Add verbose gfp_flag printing to synthetic events

2019-07-11 Thread Xing Zhengjun

Hi Tom,

On 7/11/2019 11:42 PM, Tom Zanussi wrote:

Hi Zhengjun,

The patch itself looks fine to me, but could you please create a v3
with a couple changes to the commit message?  I noticed you dropped
your original commit message - please add it back and combine with part
of mine, as below.  Also, please keep your original Subject line
('[PATCH] trace:add "gfp_t" support in synthetic_events') (but the
first word after trace:, 'add', should be capitalized.)


Thanks. I will send the v3 patch soon.


On Thu, 2019-07-11 at 16:46 +0800, Zhengjun Xing wrote:

Add on top of 'trace:add "gfp_t" support in synthetic_events'.


Please remove this part but keep the part below.



Prints the gfp flags as hex in addition to the human-readable flag
string.  Example output:

   whoopsie-630 [000] ...1 78.969452: testevent: bar=b20 (GFP_ATOMIC|__GFP_ZERO)
 rcuc/0-11  [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC)
 rcuc/0-11  [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC)



So basically, something like this:

[PATCH] trace: Add "gfp_t" support in synthetic_events

Add "gfp_t" support in synthetic_events, then the "gfp_t" type
parameter in some functions can be traced.

Print the gfp flags as hex in addition to the human-readable flag
string.  Example output:

   whoopsie-630 [000] ...1 78.969452: testevent: bar=b20 (GFP_ATOMIC|__GFP_ZERO)
 rcuc/0-11  [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC)
 rcuc/0-11  [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC)


Signed-off-by: Tom Zanussi 
Signed-off-by: Zhengjun Xing 



Thanks,

Tom


---
  kernel/trace/trace_events_hist.c | 19 +++
  1 file changed, 19 insertions(+)

diff --git a/kernel/trace/trace_events_hist.c
b/kernel/trace/trace_events_hist.c
index ca6b0dff60c5..938ef3f54c5c 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -13,6 +13,10 @@
  #include 
  #include 
  
+/* for gfp flag names */

+#include 
+#include 
+
  #include "tracing_map.h"
  #include "trace.h"
  #include "trace_dynevent.h"
@@ -752,6 +756,8 @@ static int synth_field_size(char *type)
size = sizeof(unsigned long);
else if (strcmp(type, "pid_t") == 0)
size = sizeof(pid_t);
+   else if (strcmp(type, "gfp_t") == 0)
+   size = sizeof(gfp_t);
else if (synth_field_is_string(type))
size = synth_field_string_size(type);
  
@@ -792,6 +798,8 @@ static const char *synth_field_fmt(char *type)

fmt = "%lu";
else if (strcmp(type, "pid_t") == 0)
fmt = "%d";
+   else if (strcmp(type, "gfp_t") == 0)
+   fmt = "%x";
else if (synth_field_is_string(type))
fmt = "%s";
  
@@ -834,9 +838,20 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
 					 i == se->n_fields - 1 ? "" : " ");
 			n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
 		} else {
+			struct trace_print_flags __flags[] = {
+				__def_gfpflag_names, {-1, NULL} };
+
 			trace_seq_printf(s, print_fmt, se->fields[i]->name,
 					 entry->fields[n_u64],
 					 i == se->n_fields - 1 ? "" : " ");
+
+			if (strcmp(se->fields[i]->type, "gfp_t") == 0) {
+				trace_seq_puts(s, " (");
+				trace_print_flags_seq(s, "|",
+						      entry->fields[n_u64],
+						      __flags);
+				trace_seq_putc(s, ')');
+			}
 			n_u64++;
 		}
 	}


--
Zhengjun Xing


Re: [PATCH] trace:add "gfp_t" support in synthetic_events

2019-07-11 Thread Xing Zhengjun

Hi Tom,

On 7/11/2019 3:51 AM, Tom Zanussi wrote:

Hi Zhengjun,

On Thu, 2019-07-04 at 10:55 +0800, Zhengjun Xing wrote:

Add "gfp_t" support in synthetic_events, then the "gfp_t" type
parameter in some functions can be traced.

Signed-off-by: Zhengjun Xing 
---
  kernel/trace/trace_events_hist.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/kernel/trace/trace_events_hist.c
b/kernel/trace/trace_events_hist.c
index ca6b0dff60c5..0d3ab01b7cb5 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -752,6 +752,8 @@ static int synth_field_size(char *type)
size = sizeof(unsigned long);
else if (strcmp(type, "pid_t") == 0)
size = sizeof(pid_t);
+   else if (strcmp(type, "gfp_t") == 0)
+   size = sizeof(gfp_t);
else if (synth_field_is_string(type))
size = synth_field_string_size(type);
  
@@ -792,6 +794,8 @@ static const char *synth_field_fmt(char *type)

fmt = "%lu";
else if (strcmp(type, "pid_t") == 0)
fmt = "%d";
+   else if (strcmp(type, "gfp_t") == 0)
+   fmt = "%u";
else if (synth_field_is_string(type))
fmt = "%s";
  


This will work, but I think it would be better to display as hex, and
also show the flags in human-readable form.

How about adding something like this on top of your patch?:


Thanks, I will add it to the v2 patch.


[PATCH] tracing: Add verbose gfp_flag printing to synthetic events

Add on top of 'trace:add "gfp_t" support in synthetic_events'.

Prints the gfp flags as hex in addition to the human-readable flag
string.  Example output:

   whoopsie-630 [000] ...1 78.969452: testevent: bar=b20 (GFP_ATOMIC|__GFP_ZERO)
 rcuc/0-11  [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC)
 rcuc/0-11  [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC)

Signed-off-by: Tom Zanussi 
---
  kernel/trace/trace_events_hist.c | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0d3ab01..aeb4449 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -13,6 +13,10 @@
  #include 
  #include 
  
+/* for gfp flag names */

+#include 
+#include 
+
  #include "tracing_map.h"
  #include "trace.h"
  #include "trace_dynevent.h"
@@ -795,7 +799,7 @@ static const char *synth_field_fmt(char *type)
else if (strcmp(type, "pid_t") == 0)
fmt = "%d";
else if (strcmp(type, "gfp_t") == 0)
-   fmt = "%u";
+   fmt = "%x";
else if (synth_field_is_string(type))
fmt = "%s";
  
@@ -838,9 +842,20 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,

 i == se->n_fields - 1 ? "" : " ");
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
+   struct trace_print_flags __flags[] =
+   { __def_gfpflag_names, { -1, NULL }};
+
trace_seq_printf(s, print_fmt, se->fields[i]->name,
 entry->fields[n_u64],
 i == se->n_fields - 1 ? "" : " ");
+
+   if (strcmp(se->fields[i]->type, "gfp_t") == 0) {
+   trace_seq_puts(s, " (");
+   trace_print_flags_seq(s, "|",
+ entry->fields[n_u64],
+ __flags);
+   trace_seq_putc(s, ')');
+   }
n_u64++;
}
}
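
For anyone who wants to try the formatting outside the kernel, here is a
minimal userspace sketch of the same "hex plus decoded names" output. The
flag table and its values are purely illustrative; the actual patch uses the
kernel's __def_gfpflag_names table and trace_print_flags_seq() instead.

#include <stdio.h>

/* Illustrative flag table; composite names come first, as in the kernel. */
struct flag_name {
	unsigned long mask;
	const char *name;
};

static const struct flag_name gfp_names[] = {
	{ 0xa20ul, "GFP_ATOMIC" },	/* illustrative values only */
	{ 0x100ul, "__GFP_ZERO" },
	{ 0x20ul,  "__GFP_HIGH" },
	{ 0, NULL }
};

/* Print "bar=<hex> (NAME|NAME)" like the synthetic-event example output. */
static void print_flags(unsigned long val)
{
	unsigned long rest = val;
	int first = 1;

	printf("bar=%lx (", val);
	for (const struct flag_name *f = gfp_names; f->name; f++) {
		if ((rest & f->mask) == f->mask) {
			printf("%s%s", first ? "" : "|", f->name);
			first = 0;
			rest &= ~f->mask;
		}
	}
	printf(")\n");
}

int main(void)
{
	print_flags(0xb20);	/* prints: bar=b20 (GFP_ATOMIC|__GFP_ZERO) */
	print_flags(0xa20);	/* prints: bar=a20 (GFP_ATOMIC) */
	return 0;
}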



--
Zhengjun Xing


Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-07-08 Thread Xing Zhengjun

Hi Trond,

On 7/8/2019 7:44 PM, Trond Myklebust wrote:
I've asked several times now about how to interpret your results. As far 
as I can tell from your numbers, the overhead appears to be entirely 
contained in the NUMA section of your results.
IOW: it would appear to be a scheduling overhead due to NUMA. I've been 
asking whether or not that is a correct interpretation of the numbers 
you published.
Thanks for your feedback. I used the same hardware and the same test 
parameters to test the two commits:

   e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
   0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

If it is caused by NUMA, why is throughput decreased only for commit 
0472e47660? The filesystem we test is NFS, and commit 0472e47660 is 
network-related; could you help check whether there are any other clues to 
the regression? Thanks.


--
Zhengjun Xing


Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-07-08 Thread Xing Zhengjun

Hi Trond,

   I retested, and it can still be reproduced. I tested with the following 
parameters, changing only "nr_threads"; the results are listed below. From 
the results, the more threads used in the test, the larger the regression. 
Could you help check? Thanks.



In testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 
with 384G memory

with following parameters:

iterations: 20x
nr_threads: 1t
disk: 1BRD_48G
fs: xfs
fs2: nfsv4
filesize: 4M
test_size: 80G
sync_method: fsyncBeforeClose
cpufreq_governor: performance

test-description: The fsmark is a file system benchmark to test 
synchronous write workloads, for example, mail servers workload.

test-url: https://sourceforge.net/projects/fsmark/

commit:
  e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
  0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

e791f8e9380d945e 0472e476604998c127f3c80d291
---------------- ---------------------------
     %stddev      %change        %stddev
         \            |              \
     59.74          -0.7%          59.32    fsmark.files_per_sec (nr_threads= 1)
    114.06          -8.1%         104.83    fsmark.files_per_sec (nr_threads= 2)
    184.53         -13.1%         160.29    fsmark.files_per_sec (nr_threads= 4)
    257.05         -15.5%         217.22    fsmark.files_per_sec (nr_threads= 8)
    306.08         -15.5%         258.68    fsmark.files_per_sec (nr_threads=16)
    498.34         -22.7%         385.33    fsmark.files_per_sec (nr_threads=32)
    527.29         -22.6%         407.96    fsmark.files_per_sec (nr_threads=64)




On 5/31/2019 11:27 AM, Xing Zhengjun wrote:



On 5/31/2019 3:10 AM, Trond Myklebust wrote:

On Thu, 2019-05-30 at 15:20 +0800, Xing Zhengjun wrote:


On 5/30/2019 10:00 AM, Trond Myklebust wrote:

Hi Xing,

On Thu, 2019-05-30 at 09:35 +0800, Xing Zhengjun wrote:

Hi Trond,

On 5/20/2019 1:54 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a 16.0% improvement of fsmark.app_overhead due
to
commit:


commit: 0472e476604998c127f3c80d291113e77c5676ac ("SUNRPC:
Convert
socket page send code to use iov_iter()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
master

in testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @
3.00GHz with 384G memory
with following parameters:

iterations: 1x
nr_threads: 64t
disk: 1BRD_48G
fs: xfs
fs2: nfsv4
filesize: 4M
test_size: 40G
sync_method: fsyncBeforeClose
cpufreq_governor: performance

test-description: The fsmark is a file system benchmark to test
synchronous write workloads, for example, mail servers
workload.
test-url: https://sourceforge.net/projects/fsmark/



Details are as below:
-

->


To reproduce:

   git clone https://github.com/intel/lkp-tests.git
   cd lkp-tests
   bin/lkp install job.yaml  # job file is attached in
this
email
   bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-7/performance/1BRD_48G/4M/nfsv4/xfs/1x/x86_64-rhel-7.6/64t/debian-x86_64-2018-04-03.cgz/fsyncBeforeClose/lkp-ivb-ep01/40G/fsmark

commit:
 e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use
iov_iter_kvec()")
 0472e47660 ("SUNRPC: Convert socket page send code to use
iov_iter()")

e791f8e9380d945e 0472e476604998c127f3c80d291
---------------- ---------------------------
       fail:runs          %reproduction     fail:runs
           |                    |               |
          :4                   50%             2:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
         %stddev             %change         %stddev
             \                  |               \
    15118573 ±  2%           +16.0%        17538083    fsmark.app_overhead
      510.93                 -22.7%          395.12    fsmark.files_per_sec
       24.90                 +22.8%           30.57    fsmark.time.elapsed_time
       24.90                 +22.8%           30.57    fsmark.time.elapsed_time.max
      288.00 ±  2%           -27.8%          208.00    fsmark.time.percent_of_cpu_this_job_got
       70.03 ±  2%           -11.3%           62.14    fsmark.time.system_time



Do you have time to take a look at this regression?


  From your stats, it looks to me as if the problem is increased
NUMA
overhead. Pretty much everything else appears to be the same or
actually performing better than previously. Am I interpreting that
correctly?

The real regression is that the throughput (fsmark.files_per_sec) is
decreased by 22.7%.


Understood, but I'm trying to make sense of

Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-05-30 Thread Xing Zhengjun




On 5/31/2019 3:10 AM, Trond Myklebust wrote:

On Thu, 2019-05-30 at 15:20 +0800, Xing Zhengjun wrote:


On 5/30/2019 10:00 AM, Trond Myklebust wrote:

Hi Xing,

On Thu, 2019-05-30 at 09:35 +0800, Xing Zhengjun wrote:

Hi Trond,

On 5/20/2019 1:54 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a 16.0% improvement of fsmark.app_overhead due
to
commit:


commit: 0472e476604998c127f3c80d291113e77c5676ac ("SUNRPC:
Convert
socket page send code to use iov_iter()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
master

in testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @
3.00GHz with 384G memory
with following parameters:

iterations: 1x
nr_threads: 64t
disk: 1BRD_48G
fs: xfs
fs2: nfsv4
filesize: 4M
test_size: 40G
sync_method: fsyncBeforeClose
cpufreq_governor: performance

test-description: The fsmark is a file system benchmark to test
synchronous write workloads, for example, mail servers
workload.
test-url: https://sourceforge.net/projects/fsmark/



Details are as below:
-

->


To reproduce:

   git clone https://github.com/intel/lkp-tests.git
   cd lkp-tests
   bin/lkp install job.yaml  # job file is attached in
this
email
   bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-7/performance/1BRD_48G/4M/nfsv4/xfs/1x/x86_64-rhel-7.6/64t/debian-x86_64-2018-04-03.cgz/fsyncBeforeClose/lkp-ivb-ep01/40G/fsmark

commit:
 e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use
iov_iter_kvec()")
 0472e47660 ("SUNRPC: Convert socket page send code to use
iov_iter()")

e791f8e9380d945e 0472e476604998c127f3c80d291
---------------- ---------------------------
       fail:runs          %reproduction     fail:runs
           |                    |               |
          :4                   50%             2:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
         %stddev             %change         %stddev
             \                  |               \
    15118573 ±  2%           +16.0%        17538083    fsmark.app_overhead
      510.93                 -22.7%          395.12    fsmark.files_per_sec
       24.90                 +22.8%           30.57    fsmark.time.elapsed_time
       24.90                 +22.8%           30.57    fsmark.time.elapsed_time.max
      288.00 ±  2%           -27.8%          208.00    fsmark.time.percent_of_cpu_this_job_got
       70.03 ±  2%           -11.3%           62.14    fsmark.time.system_time



Do you have time to take a look at this regression?


  From your stats, it looks to me as if the problem is increased
NUMA
overhead. Pretty much everything else appears to be the same or
actually performing better than previously. Am I interpreting that
correctly?

The real regression is that the throughput (fsmark.files_per_sec) is
decreased by 22.7%.


Understood, but I'm trying to make sense of why. I'm not able to
reproduce this, so I have to rely on your performance stats to
understand where the 22.7% regression is coming from. As far as I can
see, the only numbers in the stats you published that are showing a
performance regression (other than the fsmark number itself), are the
NUMA numbers. Is that a correct interpretation?


We re-tested the case yesterday, and the result is almost the same.
We will do more testing and also check the test case itself; if you need
more information, please let me know. Thanks.


If my interpretation above is correct, then I'm not seeing where
this
patch would be introducing new NUMA regressions. It is just
converting
from using one method of doing socket I/O to another. Could it
perhaps
be a memory artefact due to your running the NFS client and server
on
the same machine?

Apologies for pushing back a little, but I just don't have the
hardware available to test NUMA configurations, so I'm relying on
external testing for the above kind of scenario.


Thanks for looking at this.  If you need more information, please let
me
know.

Thanks
Trond



--
Zhengjun Xing


Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-05-30 Thread Xing Zhengjun




On 5/30/2019 10:00 AM, Trond Myklebust wrote:

Hi Xing,

On Thu, 2019-05-30 at 09:35 +0800, Xing Zhengjun wrote:

Hi Trond,

On 5/20/2019 1:54 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a 16.0% improvement of fsmark.app_overhead due to
commit:


commit: 0472e476604998c127f3c80d291113e77c5676ac ("SUNRPC: Convert
socket page send code to use iov_iter()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
master

in testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @
3.00GHz with 384G memory
with following parameters:

iterations: 1x
nr_threads: 64t
disk: 1BRD_48G
fs: xfs
fs2: nfsv4
filesize: 4M
test_size: 40G
sync_method: fsyncBeforeClose
cpufreq_governor: performance

test-description: The fsmark is a file system benchmark to test
synchronous write workloads, for example, mail servers workload.
test-url: https://sourceforge.net/projects/fsmark/



Details are as below:
-
->


To reproduce:

  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml  # job file is attached in this
email
  bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-7/performance/1BRD_48G/4M/nfsv4/xfs/1x/x86_64-rhel-7.6/64t/debian-x86_64-2018-04-03.cgz/fsyncBeforeClose/lkp-ivb-ep01/40G/fsmark

commit:
e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use
iov_iter_kvec()")
0472e47660 ("SUNRPC: Convert socket page send code to use
iov_iter()")

e791f8e9380d945e 0472e476604998c127f3c80d291
---------------- ---------------------------
       fail:runs          %reproduction     fail:runs
           |                    |               |
          :4                   50%             2:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
         %stddev             %change         %stddev
             \                  |               \
    15118573 ±  2%           +16.0%        17538083    fsmark.app_overhead
      510.93                 -22.7%          395.12    fsmark.files_per_sec
       24.90                 +22.8%           30.57    fsmark.time.elapsed_time
       24.90                 +22.8%           30.57    fsmark.time.elapsed_time.max
      288.00 ±  2%           -27.8%          208.00    fsmark.time.percent_of_cpu_this_job_got
       70.03 ±  2%           -11.3%           62.14    fsmark.time.system_time



Do you have time to take a look at this regression?


 From your stats, it looks to me as if the problem is increased NUMA
overhead. Pretty much everything else appears to be the same or
actually performing better than previously. Am I interpreting that
correctly?
The real regression is that the throughput (fsmark.files_per_sec) is decreased 
by 22.7%.


If my interpretation above is correct, then I'm not seeing where this
patch would be introducing new NUMA regressions. It is just converting
from using one method of doing socket I/O to another. Could it perhaps
be a memory artefact due to your running the NFS client and server on
the same machine?

Apologies for pushing back a little, but I just don't have the
hardware available to test NUMA configurations, so I'm relying on
external testing for the above kind of scenario.


Thanks for looking at this.  If you need more information, please let me
know.

Thanks
   Trond



--
Zhengjun Xing


Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-05-29 Thread Xing Zhengjun

Hi Trond,

On 5/20/2019 1:54 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a 16.0% improvement of fsmark.app_overhead due to commit:


commit: 0472e476604998c127f3c80d291113e77c5676ac ("SUNRPC: Convert socket page send 
code to use iov_iter()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G 
memory
with following parameters:

iterations: 1x
nr_threads: 64t
disk: 1BRD_48G
fs: xfs
fs2: nfsv4
filesize: 4M
test_size: 40G
sync_method: fsyncBeforeClose
cpufreq_governor: performance

test-description: The fsmark is a file system benchmark to test synchronous 
write workloads, for example, mail servers workload.
test-url: https://sourceforge.net/projects/fsmark/



Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
   
gcc-7/performance/1BRD_48G/4M/nfsv4/xfs/1x/x86_64-rhel-7.6/64t/debian-x86_64-2018-04-03.cgz/fsyncBeforeClose/lkp-ivb-ep01/40G/fsmark

commit:
   e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
   0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

e791f8e9380d945e 0472e476604998c127f3c80d291
---------------- ---------------------------
       fail:runs          %reproduction     fail:runs
           |                    |               |
          :4                   50%             2:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
         %stddev             %change         %stddev
             \                  |               \
    15118573 ±  2%           +16.0%        17538083    fsmark.app_overhead
      510.93                 -22.7%          395.12    fsmark.files_per_sec
       24.90                 +22.8%           30.57    fsmark.time.elapsed_time
       24.90                 +22.8%           30.57    fsmark.time.elapsed_time.max
      288.00 ±  2%           -27.8%          208.00    fsmark.time.percent_of_cpu_this_job_got
       70.03 ±  2%           -11.3%           62.14    fsmark.time.system_time
     4391964            -16.7%        3658341        meminfo.max_used_kB
        6.10 ±  4%       +1.9            7.97 ±  3%  mpstat.cpu.all.iowait%
        0.27             -0.0            0.24 ±  3%  mpstat.cpu.all.soft%
    13668070 ± 40%     +118.0%       29801846 ± 19%  numa-numastat.node0.local_node
        1364 ± 40%     +117.9%       29810258 ± 19%  numa-numastat.node0.numa_hit
        5.70 ±  3%      +32.1%           7.53 ±  3%  iostat.cpu.iowait
       16.42 ±  2%       -5.8%          15.47        iostat.cpu.system
        2.57             -4.1%           2.46        iostat.cpu.user
     1406781 ±  2%      -15.5%        1188498        vmstat.io.bo
      251792 ±  3%      -16.6%         209928        vmstat.system.cs
       84841             -1.9%          83239        vmstat.system.in
    97374502 ± 20%      +66.1%      1.617e+08 ± 17%  cpuidle.C1E.time
      573934 ± 19%      +44.6%         829662 ± 26%  cpuidle.C1E.usage
   5.892e+08 ±  8%      +15.3%      6.796e+08 ±  2%  cpuidle.C6.time
     1968016 ±  3%      -15.1%        1670867 ±  3%  cpuidle.POLL.time
      106420 ± 47%      +86.2%         198108 ± 35%  numa-meminfo.node0.Active
      106037 ± 48%      +86.2%         197395 ± 35%  numa-meminfo.node0.Active(anon)
      105052 ± 48%      +86.6%         196037 ± 35%  numa-meminfo.node0.AnonPages
      212876 ± 24%      -41.5%         124572 ± 56%  numa-meminfo.node1.Active
      211801 ± 24%      -41.5%         123822 ± 56%  numa-meminfo.node1.Active(anon)
      208559 ± 24%      -42.2%         120547 ± 57%  numa-meminfo.node1.AnonPages
        9955             +1.6%          10116        proc-vmstat.nr_kernel_stack
      452.25 ± 59%     +280.9%           1722 ±100%  proc-vmstat.numa_hint_faults_local
    33817303            +55.0%       52421773 ±  5%  proc-vmstat.numa_hit
    33804286            +55.0%       52408807 ±  5%  proc-vmstat.numa_local
    33923002            +81.8%       61663426 ±  5%  proc-vmstat.pgalloc_normal
      184765             +9.3%         201985        proc-vmstat.pgfault
    12840986           +216.0%       40581327 ±  7%  proc-vmstat.pgfree
       31447 ± 11%      -26.1%          23253 ± 13%  sched_debug.cfs_rq:/.min_vruntime.max
        4241 ±  3%      -12.2%           3724 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
       20631 ± 11%      -36.7%          13069 ± 29%  sched_debug.cfs_rq:/.spread0.max
        4238 ±  4%      -12.1%           3724 ± 11%  sched_debug.cfs_rq:/.spread0.stddev
      497105 ± 19%      -16.0%             41 ±  4%  sched_debug.cpu.avg_idle.avg
       21199 ± 10%      -12.0%          18650 ±  3%  sched_debug.cpu.nr_load_updates.max
        2229 ± 10%      -15.0%

Re: [PATCH] USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw

2018-03-22 Thread Xing Zhengjun



On 3/22/2018 8:03 PM, Greg KH wrote:

On Wed, Mar 21, 2018 at 01:29:42PM +0800, Zhengjun Xing wrote:

USB3 hubs don't support global suspend.

Per USB3 specification section 10.10, Enhanced SuperSpeed hubs only support
selective suspend and resume; they do not support global suspend/resume,
where the states of the hub's downstream-facing ports are not affected.

When the system enters hibernation, it first goes through the freeze phase,
where only the root hub enters suspend; usb_port_suspend() is not called for
the other devices, so their suspend status flags are not set. Those devices
are expected to suspend globally. However, some external USB3 hubs suspend
their downstream-facing ports at global suspend, and such devices are not
resumed at thaw because the suspend status flag is not set.

A USB3 removable hard disk connected through such a USB3 hub, which is not
resumed at thaw, fails to synchronize its SCSI cache, returns a “cmd cmplt
err -71” error, and needs a 60-second timeout, which hangs the system for 60s
before the USB host resets the port and the disk recovers.

Fix this by always calling usb_port_suspend() during freeze for USB3
devices.

This should go to the stable trees as well, right?

greg k-h

  Yes. It should go to the stable trees.
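
For readers who are not familiar with the USB core, here is a rough sketch of
the idea only -- it is not the submitted patch, and freeze_usb3_device() is a
made-up name for a hook in the suspend path (usb_port_suspend() itself is
internal to drivers/usb/core):

/*
 * Rough sketch, not the actual patch: during the hibernation freeze phase a
 * SuperSpeed device is suspended individually, which also sets the suspend
 * status flags that the thaw path checks before resuming the port.
 */
static int freeze_usb3_device(struct usb_device *udev, pm_message_t msg)
{
	/* USB3 hubs only do selective suspend/resume (USB 3.0 spec 10.10). */
	if (msg.event == PM_EVENT_FREEZE && udev->speed >= USB_SPEED_SUPER)
		return usb_port_suspend(udev, msg);

	return 0;	/* other devices keep relying on global suspend */
}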



RE: [PATCH 1/2] tracing: Handle NULL formats in hold_module_trace_bprintk_format()

2016-06-23 Thread Xing, Zhengjun
I agree with you. You can also add me to the "Signed-off-by".

Best Regards,
Zhengjun

-Original Message-
From: Steven Rostedt [mailto:rost...@goodmis.org] 
Sent: Monday, June 20, 2016 9:53 PM
To: linux-kernel@vger.kernel.org
Cc: Linus Torvalds <torva...@linux-foundation.org>; Ingo Molnar 
<mi...@kernel.org>; Andrew Morton <a...@linux-foundation.org>; Xing, Zhengjun 
<zhengjun.x...@intel.com>; Namhyung Kim <namhy...@kernel.org>; 
sta...@vger.kernel.org
Subject: [PATCH 1/2] tracing: Handle NULL formats in 
hold_module_trace_bprintk_format()

From: "Steven Rostedt (Red Hat)" <rost...@goodmis.org>

If a task uses a non constant string for the format parameter in 
trace_printk(), then the trace_printk_fmt variable is set to NULL. This 
variable is then saved in the __trace_printk_fmt section.

The function hold_module_trace_bprintk_format() checks to see if duplicate 
formats are used by modules, and reuses them if so (saves them to the list if 
it is new). But this function calls lookup_format() that does a strcmp() to the 
value (which is now NULL) and can cause a kernel oops.

This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print 
when not using bprintk()") which added "__used" to the trace_printk_fmt 
variable, and before that, the kernel simply optimized it out (no NULL value 
was saved).

The fix is simply to handle the NULL pointer in lookup_format() and have the 
caller ignore the value if it was NULL.

Link: 
http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.x...@intel.com

Reported-by: xingzhen <zhengjun.x...@intel.com>
Acked-by: Namhyung Kim <namhy...@kernel.org>
Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using 
bprintk()")
Cc: sta...@vger.kernel.org # v3.5+
Signed-off-by: Steven Rostedt <rost...@goodmis.org>
---
 kernel/trace/trace_printk.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
index f96f0383f6c6..ad1d6164e946 100644
--- a/kernel/trace/trace_printk.c
+++ b/kernel/trace/trace_printk.c
@@ -36,6 +36,10 @@ struct trace_bprintk_fmt {
 static inline struct trace_bprintk_fmt *lookup_format(const char *fmt)
 {
 	struct trace_bprintk_fmt *pos;
+
+	if (!fmt)
+		return ERR_PTR(-EINVAL);
+
 	list_for_each_entry(pos, &trace_bprintk_fmt_list, list) {
 		if (!strcmp(pos->fmt, fmt))
 			return pos;
@@ -57,7 +61,8 @@ void hold_module_trace_bprintk_format(const char **start, const char **end)
 	for (iter = start; iter < end; iter++) {
 		struct trace_bprintk_fmt *tb_fmt = lookup_format(*iter);
 		if (tb_fmt) {
-			*iter = tb_fmt->fmt;
+			if (!IS_ERR(tb_fmt))
+				*iter = tb_fmt->fmt;
 			continue;
 		}
 
--
2.8.0.rc3
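
The ERR_PTR(-EINVAL)/IS_ERR() pair used above is the usual kernel convention
for returning an error through a pointer. A small userspace sketch of the
same lookup pattern, with the two helpers re-implemented locally (simplified
compared to the real ones in include/linux/err.h) so it compiles on its own:

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Simplified local stand-ins for the kernel's ERR_PTR()/IS_ERR(). */
#define ERR_PTR(err)	((void *)(long)(err))
#define IS_ERR(ptr)	((unsigned long)(ptr) >= (unsigned long)-4095)

static const char *known_fmts[] = { "hello %d\n", "bye %s\n", NULL };

/* Like lookup_format(): a NULL format is an error, not something to strcmp(). */
static const char *lookup_fmt(const char *fmt)
{
	if (!fmt)
		return ERR_PTR(-EINVAL);

	for (const char **p = known_fmts; *p; p++)
		if (!strcmp(*p, fmt))
			return *p;

	return NULL;	/* not found; the caller may add it to its list */
}

int main(void)
{
	const char *hit = lookup_fmt("hello %d\n");
	const char *bad = lookup_fmt(NULL);

	if (hit && !IS_ERR(hit))
		printf("found: %s", hit);
	if (IS_ERR(bad))
		printf("NULL format rejected instead of crashing\n");
	return 0;
}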



