Hi Peter/Ingo and all,

With the advent of more cores and heterogeneous architectures, the scheduler is
required to be more complex (power efficiency) and more diverse (big.LITTLE).
Addressing that challenge in the scheduler as a whole is costly and not
necessary. This proposal argues that the scheduler be split into two parts:
top half (task scheduling) and bottom half (load balance), and that the bottom
half take charge of the incoming requirements.

The two halves are rather orthogonal in functionality. Task scheduling (top
half) looks at *ONE* CPU and executes its running tasks fairly (priority
included), while load balance (bottom half) looks at *ALL* CPUs and maximizes
the throughput of the computing power. The goal of task scheduling is single
and clear, and CFS and RT already pursue exactly that goal. Load balance,
however, has to meet more goals, to name a few: performance
(throughput/responsiveness), power consumption, architecture differences, etc.
Those goals are often hard to achieve because they may conflict and are
difficult to estimate and plan for. So, shall we declare the independence of
the two and give them the freedom to pursue their own "happiness"?

We take an incremental development approach. As a starting point, we did three
things (without changing a single line of real-work code):
        1)      Move load balance out of fair.c into load_balance.c (~3000
lines of code). As a result, fair.c/rt.c and load_balance.c have very little
intersection.
        2)      Define struct sched_lb_class, which gathers the load balance
entry points under one umbrella and consists of the following members:
                a.      const struct sched_lb_class *next;
                b.      int (*fork_balance) (struct task_struct *p,
                                int sd_flags, int wake_flags);
                c.      int (*exec_balance) (struct task_struct *p,
                                int sd_flags, int wake_flags);
                d.      int (*wakeup_balance) (struct task_struct *p,
                                int sd_flags, int wake_flags);
                e.      void (*idle_balance) (int this_cpu, struct rq *this_rq);
                f.      void (*periodic_rebalance) (int cpu,
                                enum cpu_idle_type idle);
                g.      void (*nohz_idle_balance) (int this_cpu,
                                enum cpu_idle_type idle);
                h.      void (*start_periodic_balance) (struct rq *rq, int cpu);
                i.      void (*check_nohz_idle_balance) (struct rq *rq,
                                int cpu);
        3)      Insert another layer of indirection that wraps the functions
implemented in sched_lb_class. Implement a default load balance class that is
simply the previous load balance code.
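
To make items 2) and 3) more concrete, here is a minimal user-space sketch; it
is not the actual patch. The kernel types are reduced to stand-ins, most hooks
are left unset, and the names default_select_cpu(), default_idle_balance(),
default_lb_class, lb_class, lb_wakeup_balance(), lb_idle_balance() and
lb_selftest() are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumptions, not kernel code). */
struct task_struct { int pid; int cpu; };
struct rq { int cpu; unsigned int nr_running; };
enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

/* Members as listed in item 2). */
struct sched_lb_class {
	const struct sched_lb_class *next;
	int (*fork_balance)(struct task_struct *p, int sd_flags, int wake_flags);
	int (*exec_balance)(struct task_struct *p, int sd_flags, int wake_flags);
	int (*wakeup_balance)(struct task_struct *p, int sd_flags, int wake_flags);
	void (*idle_balance)(int this_cpu, struct rq *this_rq);
	void (*periodic_rebalance)(int cpu, enum cpu_idle_type idle);
	void (*nohz_idle_balance)(int this_cpu, enum cpu_idle_type idle);
	void (*start_periodic_balance)(struct rq *rq, int cpu);
	void (*check_nohz_idle_balance)(struct rq *rq, int cpu);
};

/*
 * Hypothetical placeholder callbacks. In the proposal the default class
 * would point at the existing load balance code moved out of fair.c.
 */
static int default_select_cpu(struct task_struct *p, int sd_flags,
			      int wake_flags)
{
	(void)sd_flags;
	(void)wake_flags;
	return p->cpu;		/* placeholder policy: stay on the current CPU */
}

static void default_idle_balance(int this_cpu, struct rq *this_rq)
{
	(void)this_cpu;
	(void)this_rq;		/* placeholder: pull nothing */
}

static const struct sched_lb_class default_lb_class = {
	.next		= NULL,
	.fork_balance	= default_select_cpu,
	.exec_balance	= default_select_cpu,
	.wakeup_balance	= default_select_cpu,
	.idle_balance	= default_idle_balance,
	/* the remaining hooks are left NULL in this sketch */
};

/* The class currently in charge of load balance. */
static const struct sched_lb_class *lb_class = &default_lb_class;

/* Wrapper layer of item 3): callers no longer name the implementation. */
static int lb_wakeup_balance(struct task_struct *p, int sd_flags,
			     int wake_flags)
{
	if (lb_class && lb_class->wakeup_balance)
		return lb_class->wakeup_balance(p, sd_flags, wake_flags);
	return p->cpu;		/* no hook installed: leave the task alone */
}

static void lb_idle_balance(int this_cpu, struct rq *this_rq)
{
	if (lb_class && lb_class->idle_balance)
		lb_class->idle_balance(this_cpu, this_rq);
}

/* Tiny self-check of the dispatch path. */
static void lb_selftest(void)
{
	struct task_struct t = { .pid = 1, .cpu = 3 };
	struct rq r = { .cpu = 0, .nr_running = 0 };

	assert(lb_wakeup_balance(&t, 0, 0) == 3);
	lb_idle_balance(0, &r);
	assert(r.nr_running == 0);
}
```

With this shape, a different balancing policy would only need to supply its
own struct sched_lb_class instance and be installed in place of the default;
the task scheduling side keeps calling the same wrappers.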

The next step is to continue redesigning and refactoring to make life easier
on the way to a more powerful and diverse load balance. More importantly, this
RFC solicits discussion to get early feedback on the proposed big change.

Thanks,
Yuyang