>From: Thomas Gleixner [mailto:[email protected]]
>Sent: Wednesday, April 19, 2017 11:58 AM
>On Wed, 19 Apr 2017, Peter Zijlstra wrote:
>> On Tue, Apr 04, 2017 at 04:39:06PM +0000, Noam Camus wrote:
>> > Hi Peter & Vineet
>> >
>> > I wish to reduce the boot time of my platform ARC/plat-eznps (4K CPUs).
>> > My analysis is that most of the boot time is spent in cpu_up() for all
>> > CPUs. Measurements show about 66 ms per CPU, over 4 minutes in total
>> > (I have 800 MHz cores).
>> >
>> > I see that smp_init() just iterates over all present CPUs one by one.
>> > I wish to know whether there has been an attempt to optimize this with
>> > some parallel work.
>> >
>> > Are you aware of some method / trick that will help me to reduce boot
>> > time?
>> > Any suggestions on how this can be done?
>>
>> So attempts have been made in the past but Thomas shot them down for
>> being gross hacks (they were).
>>
>> But Thomas has now (mostly) completed rewriting the CPU hotplug
>> machinery and he has at some point outlined means of achieving what
>> you're after.
>>
>> I've added him to Cc so he can correct me where I'm wrong, as I've not
>> looked into this in much detail after he mucked up all I knew about
>> CPU hotplug.
>>
>> Since each CPU is now responsible for its own bootstrap, we can now
>> kick all the CPUs awake without waiting for them to complete the
>> online stage.
>>
>> There might however be code that assumes CPUs come up one at a time,
>> so you'll need to audit for that. It's not going to be a trivial thing.
>There are a couple of things to consider.
>First of all we should make the whole 'kick CPU into life' and surrounding
>magic generic. Every arch has its own handshake mechanism.
>That would look like this:
>Step    BP                                      AP
>0-9     [preparatory steps]
>10      [kick cpu into life (arch callback)]
>11                                              [Do initial arch bringup then
>                                                 call into a generic
>                                                 function]
>12      [handshake (generic)]                   [handshake (generic)]
>13      [more arch specific magic]              [more arch specific magic]
>14-20                                           [CPU starting]
>                                                [CPU goes online]
>40      [CPU active, hotplug done]
>So the first step in parallelizing this would be:
>	for_each_present_cpu(cpu)
>		cpu_up(target_state = 10);
>i.e. make the allocations and whatever preparatory work needs to be done and
>kick the CPU into life. The target CPU would initialize the low level stuff and
>then call into a generic function, which does the generic initialization and
>then waits for the handshake.
>So the next thing would be:
>	for_each_present_cpu(cpu)
>		cpu_up(target_state = 40);
>This last step has to be single threaded for now because almost all
>facilities that use CPU hotplug rely on the current serialization. There are
>also code paths which use get_online_cpus() or cpu_hotplug_disable() to
>prevent interaction with CPU hotplug.
>The hotplug machinery is already designed so that after the handshake (#12/13)
>a plugged CPU can bring itself up completely alone, but due to the
>serialization expectations all over the place this won't work today.
>To make it work, you have to go through every single instance of CPU hotplug
>callback users and every single site which prevents hotplug via
>get_online_cpus() or cpu_hotplug_disable() and audit them for concurrency
>issues and fix them up.
>There might also be interaction required with the state machine, i.e. stop the
>state progress on a self plugging CPU between two steps to make serialization
>work.
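The two passes quoted above can be illustrated with a minimal user-space sketch. This is only a simulation under stated assumptions, not kernel code: every sim_* name is hypothetical, the state numbers 10 and 40 are simply the step numbers from the table and not the kernel's real hotplug state values, and treating CPU 0 as the already-running boot CPU is an assumption made for the example.

```c
#include <assert.h>

/* Hypothetical states, numbered after the steps in the quoted table.
 * They are NOT the kernel's actual hotplug state values. */
enum sim_state {
	SIM_OFFLINE = 0,
	SIM_KICKED  = 10,	/* step 10: BP has kicked the AP into life */
	SIM_ACTIVE  = 40,	/* step 40: CPU active, hotplug done */
};

#define SIM_NR_CPUS 8		/* small stand-in for the present CPUs */

static enum sim_state sim_cpu_state[SIM_NR_CPUS];

/* Stand-in for cpu_up() with a target state; state only moves forward. */
static int sim_cpu_up(unsigned int cpu, enum sim_state target)
{
	if (cpu >= SIM_NR_CPUS)
		return -1;
	if (sim_cpu_state[cpu] < target)
		sim_cpu_state[cpu] = target;
	return 0;
}

/* Count how many simulated CPUs are currently in state s. */
static int sim_count_in_state(enum sim_state s)
{
	int n = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (sim_cpu_state[cpu] == s)
			n++;
	return n;
}

/* Two-pass bringup: kick every CPU first, then finish them one by one. */
static int sim_smp_init(void)
{
	unsigned int cpu;

	/* Assumption: CPU 0 is the boot CPU and is already running. */
	sim_cpu_state[0] = SIM_ACTIVE;

	/* Pass 1: preparatory work and the kick. On real hardware the
	 * kicked CPUs could do their low level init concurrently. */
	for (cpu = 1; cpu < SIM_NR_CPUS; cpu++)
		if (sim_cpu_up(cpu, SIM_KICKED))
			return -1;

	/* Pass 2: still one CPU at a time, mirroring the serialization
	 * requirement described in the quoted text. */
	for (cpu = 1; cpu < SIM_NR_CPUS; cpu++) {
		assert(sim_cpu_state[cpu] == SIM_KICKED);
		if (sim_cpu_up(cpu, SIM_ACTIVE))
			return -1;
	}
	return 0;
}
```

Calling sim_smp_init() once leaves every simulated CPU in SIM_ACTIVE. Only the first loop is a candidate for overlapping work; the second stays serialized.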
What would be a good base to start all of the above on?
Would a formal release tag such as v4.8 be good enough, or do I need to base
my work on some other specific HEAD (or tag)?
Thanks,
Noam