Hello all,

I have been experimenting with the idea of leaving ftrace enabled, with sched 
events,
on production systems.

The main concern that I am having at the moment is about the impact on the 
system.
Enabling the sched events that I currently need for the tracing application
seems to slow down context-switches considerably, and make the system less 
responsive.

I have tested with cyclictest on the mainline kernel, and noticed an increase 
of min, avg latencies of around 25%.

Is this expected?

Some initial investigation into ftrace seems to point at the reservation and 
commit of the events into the ring buffer
as the highest sources of overhead, while event parameters copying, including 
COMM, does not seem to have any noticeable effect
relative to those costs.

I have been running 20 times the following test, and thrown away the first 
results:

$ sudo ./cyclictest --smp -p95 -m -s -N -l 100000 -q

$ uname -a
Linux claudio-HP-ProBook-470-G5 4.18.0-rc3+ #3 SMP Tue Jul 3 15:50:30 CEST 2018 
x86_64 x86_64 x86_64 GNU/Linux

For brevity, this is a comparison of one test's results. All other test results 
show the same ~25% increase.

On the left side, the run without ftrace sched events, on the right side with 
ftrace sched events enabled.

CPU    Count      Min       Act        Avg        Max        Count      
Min-ftrace Act-ftrace Avg-ftrace Max-ftrace
0     100000      2339       2936       2841     139478     100000       2900   
    3182       3566      93056
1      66742      2365       3386       2874      93639      66750       2959   
    3786       3646     154074
2      50080      2376       3058       2910     196221      50097       2997   
    4209       3655      18707
3      40076      2394       3461       2931      17914      40091       3006   
    4417       3750      17159
4      33404      2371       3612       2834      15336      33419       2997   
    3836       3594      23172
5      28635      2387       3313       2885      25863      28649       2995   
    3795       3647       9956
6      25058      2384       3428       2968      12162      25071       3051   
    4366       3719      18151
7      22275      2381       2859       2982      10706      22287       3046   
    5078       3825      10781

I would be thankful for any advice or comments on this,
especially with the goal in mind to lower as much as possible the runtime 
impact on the system.

Thank you,

Claudio Fontana

Reply via email to