On 10/10/16 13:29, Vincent Guittot wrote:
> On 10 October 2016 at 12:01, Matt Fleming <m...@codeblueprint.co.uk> wrote:
>> On Sun, 09 Oct, at 11:39:27AM, Wanpeng Li wrote:
>>>
>>> The difference between this patch and Peterz's is your patch have a
>>> delta since activate_task()->enqueue_task() does do update_rq_clock(),
>>> so why don't have the delta will cause low cpu machines (4 or 8) to
>>> regress against your another reply in this thread?
>>
>> Both my patch and Peter's patch cause issues with low cpu machines. In
>> <20161004201105.gp16...@codeblueprint.co.uk> I said,
>>
>> "This patch causes some low cpu machines (4 or 8) to regress. It turns
>> out they regress with my patch too."
>>
>> Have I misunderstood your question?
>>
>> I ran out of time to investigate this last week, though I did try all
>> proposed patches, including Vincent's, and none of them produced wins
>> across the board.
>
> I have tried to reproduce your issue on my target an hikey board (ARM
> based octo cores) but i failed to see a regression with commit
> 7dc603c9028e. Nevertheless, i can see tasks not been well spread

Wasn't this about the two patches mentioned in this thread? The one from
Matt using 'se->sum_exec_runtime' in the if condition in
enqueue_entity_load_avg() and Peterz's conditional call to
update_rq_clock(rq) in enqueue_task()?

> during fork as you mentioned. So I have studied a bit more the
> spreading issue during fork last week and i have a new version of my
> proposed patch that i'm going to send soon. With this patch, i can see
> a good spread of tasks during the fork sequence and some kind of perf
> improvement even if it's bit difficult as the variance is quite
> important with hackbench test so it's mainly an improvement of
> repeatability of the result

Hikey (ARM64 2x4 cpus) board: cpufreq: performance, cpuidle: disabled

Performance counter stats for 'perf bench sched messaging -g 20 -l 500'
(10 runs):

(1) tip/sched/core: commit 447976ef4fd0

    5.902209533 seconds time elapsed ( +- 0.31% )

(2) tip/sched/core + original patch on the 'sched/fair: Do not decay new
    task load on first enqueue' thread (23/09/16)

    5.919933030 seconds time elapsed ( +- 0.44% )

(3) tip/sched/core + Peter's ENQUEUE_NEW patch on the 'sched/fair: Do not
    decay new task load on first enqueue' thread (28/09/16)

    5.970195534 seconds time elapsed ( +- 0.37% )

Not sure if we can call this a regression but it also shows no
performance gain.

>>
>> I should get a bit further this week.
>>
>> Vincent, Dietmar, did you guys ever get around to submitting your PELT
>> tracepoint patches? Getting some introspection into the scheduler's
>
> My tracepoints are not in a shape to be submitted and would need a
> cleanup as some are more hacks for debugging than real trace events.
> Nevertheless, i can push them on a git branch if they can be useful
> for someone

We carry two trace events locally, one for PELT on se and one for
cfs_rq's (I have to add the runnable bits here) which work for
CONFIG_FAIR_GROUP_SCHED and !CONFIG_FAIR_GROUP_SCHED.
I put them into __update_load_avg(), attach_entity_load_avg() and detach_entity_load_avg(). I could post them but so far mainline has been reluctant to see the need for PELT related trace events ... [...]
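
To put the three 'perf bench sched messaging' timings quoted above side by
side, here is a rough sketch that computes each patched result's delta
against the tip/sched/core baseline. The helper names are made up for
illustration, and treating the reported "+-" percentages as a simple noise
band is an assumption of this sketch, not a proper statistical test.

```python
# Elapsed times (seconds) and "+-" percentages copied from the perf
# output quoted in this mail. Dictionary keys are shorthand labels.
RESULTS = {
    "tip/sched/core": (5.902209533, 0.31),
    "original patch": (5.919933030, 0.44),
    "ENQUEUE_NEW patch": (5.970195534, 0.37),
}


def relative_delta_pct(baseline, value):
    """Return how much slower (+) or faster (-) value is, in percent."""
    return (value - baseline) / baseline * 100.0


def exceeds_noise(baseline, base_pct, value, value_pct):
    """Crude check: is the difference larger than both "+-" bands added up?

    Assumption: the "+-" percentage is used as a plain noise band around
    each mean. This is a heuristic, not a significance test.
    """
    noise = baseline * base_pct / 100.0 + value * value_pct / 100.0
    return abs(value - baseline) > noise


def summarize(results, baseline_name="tip/sched/core"):
    """Compare every non-baseline entry against the baseline."""
    base_time, base_pct = results[baseline_name]
    rows = []
    for name, (elapsed, pct) in results.items():
        if name == baseline_name:
            continue
        rows.append((
            name,
            relative_delta_pct(base_time, elapsed),
            exceeds_noise(base_time, base_pct, elapsed, pct),
        ))
    return rows


if __name__ == "__main__":
    for name, delta, outside in summarize(RESULTS):
        print(f"{name}: {delta:+.2f}% vs baseline, "
              f"outside noise band: {outside}")
```

With these inputs the original patch comes out about 0.30% slower than the
baseline, which is inside the combined band, and the ENQUEUE_NEW patch about
1.15% slower, which is outside it. Given only 10 runs and the hackbench
variance mentioned earlier in the thread, that is an indication at best.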