Hi Michael, On Thu, 28 Feb 2013 14:38:03 +0800, Michael Wang wrote: > wake_affine() stuff is trying to bind related tasks closely, but it doesn't > work well according to the test on 'perf bench sched pipe' (thanks to Peter). > > Besides, pgbench show that blindly using wake_affine() will eat a lot of > performance. > > Thus, we need a new solution, it should detect the tasks related to each > other, bind them closely, take care the balance, latency and performance > at the same time. > > Feature wakeup buddy seems like a good solution (thanks to Mike for the hint). > > The feature introduced waker, wakee pointer and their ref count, along with > the new knob sysctl_sched_wakeup_buddy_ref. > > Now in select_task_rq_fair(), when current (task B) try to wakeup p (task A), > if match: > > 1. A->waker == B && A->wakee == B > 2. A->waker_ref > sysctl_sched_wakeup_buddy_ref > 3. A->wakee_ref > sysctl_sched_wakeup_buddy_ref > > then A is the wakeup buddy of B, which means A and B is likely to utilize > the memory of each other. > > Thus, if B is also the wakeup buddy of A, which means no other task has > destroyed their relationship, then A is likely to benefit from the cached > data of B, make them running closely is likely to gain benefit.
Not sure if it should require bidirectional relationship. Looks like just for benchmarks. Isn't there a one-way relationship that could get a benefit from this? I don't know ;-) Few nitpicks below.. > > This patch add the feature wakeup buddy, reorganized the logical of > wake_affine() stuff with the new feature, by doing these, pgbench and > 'perf bench sched pipe' perform better. > > Highlight: > Default value of sysctl_sched_wakeup_buddy_ref is 8 temporarily, > please let me know if some number perform better on your system, > I'd like to make it bigger to make the decision more carefully, > so we could provide the solution when it is really needed. > > Comments are very welcomed. > > Test: > Test with a 12 cpu X86 server and tip 3.8.0-rc7. > > 'perf bench sched pipe' show nearly double improvement. > > pgbench result: > prev post > > | db_size | clients | tps | | tps | > +---------+---------+-------+ +-------+ > | 22 MB | 1 | 10794 | | 10820 | > | 22 MB | 2 | 21567 | | 21915 | > | 22 MB | 4 | 41621 | | 42766 | > | 22 MB | 8 | 53883 | | 60511 | +12.30% > | 22 MB | 12 | 50818 | | 57129 | +12.42% > | 22 MB | 16 | 50463 | | 59345 | +17.60% > | 22 MB | 24 | 46698 | | 63787 | +36.59% > | 22 MB | 32 | 43404 | | 62643 | +44.33% > > | 7484 MB | 1 | 7974 | | 8014 | > | 7484 MB | 2 | 19341 | | 19534 | > | 7484 MB | 4 | 36808 | | 38092 | > | 7484 MB | 8 | 47821 | | 51968 | +8.67% > | 7484 MB | 12 | 45913 | | 52284 | +13.88% > | 7484 MB | 16 | 46478 | | 54418 | +17.08% > | 7484 MB | 24 | 42793 | | 56375 | +31.74% > | 7484 MB | 32 | 36329 | | 55783 | +53.55% > > | 15 GB | 1 | 7636 | | 7880 | > | 15 GB | 2 | 19195 | | 19477 | > | 15 GB | 4 | 35975 | | 37962 | > | 15 GB | 8 | 47919 | | 51558 | +7.59% > | 15 GB | 12 | 45397 | | 51163 | +12.70% > | 15 GB | 16 | 45926 | | 53912 | +17.39% > | 15 GB | 24 | 42184 | | 55343 | +31.19% > | 15 GB | 32 | 35983 | | 55358 | +53.84% > > Signed-off-by: Michael Wang <[email protected]> > --- [SNIP] > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 81fa536..d5acfd8 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -3173,6 +3173,75 @@ static int wake_affine(struct sched_domain *sd, struct > task_struct *p, int sync) > } > > /* > + * Reduce sysctl_sched_wakeup_buddy_ref will reduce the preparation time > + * to active the wakeup buddy feature, and make it agile, however, this > + * will increase the risk of misidentify. > + * > + * Check wakeup_buddy() for the usage. > + */ > +unsigned int sysctl_sched_wakeup_buddy_ref = 8UL; It seems that just 8U (or even 8) is enough. > + > +/* > + * wakeup_buddy() help to check whether p1 is the wakeup buddy of p2. > + * > + * Return 1 for yes, 0 for no. > +*/ > +static inline int wakeup_buddy(struct task_struct *p1, struct task_struct > *p2) > +{ > + if (p1->waker != p2 || p1->wakee != p2) > + return 0; > + > + if (p1->waker_ref < sysctl_sched_wakeup_buddy_ref) > + return 0; > + > + if (p1->wakee_ref < sysctl_sched_wakeup_buddy_ref) > + return 0; > + > + return 1; > +} [SNIP] > @@ -3399,6 +3490,8 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, > int wake_flags) > unlock: > rcu_read_unlock(); > > + wakeup_ref(p); > + Why did you call it here? Shouldn't it be on somewhere in the ttwu? > return new_cpu; > } > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index c88878d..6845d24 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -424,6 +424,16 @@ static struct ctl_table kern_table[] = { > .extra1 = &one, > }, > #endif > +#ifdef CONFIG_SMP > + { > + .procname = "sched_wakeup_buddy_ref", > + .data = &sysctl_sched_wakeup_buddy_ref, > + .maxlen = sizeof(unsigned int), > + .mode = 0644, > + .proc_handler = proc_dointvec_minmax, > + .extra1 = &one, > + }, > +#endif > #ifdef CONFIG_PROVE_LOCKING > { > .procname = "prove_locking", -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

