On Thu, Aug 18, 2011 at 11:40 PM, Srivatsa Bhat <[email protected]> wrote:
> On Thu, Aug 18, 2011 at 10:44 PM, Vaibhav Jain <[email protected]> wrote:
>>
>> On Thu, Aug 18, 2011 at 9:02 AM, srivatsa bhat <[email protected]> wrote:
>>>
>>> Hi Vaibhav,
>>>
>>> On Thu, Aug 18, 2011 at 8:24 PM, Vaibhav Jain <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I talked to a friend of mine and he suggested that in a logical
>>>> offline state the CPU is powered on and ready to execute instructions,
>>>> just that the kernel is not aware of it. But in the case of a physical
>>>> offline state, the CPU is powered off and cannot run.
>>>> Are you saying something similar?
>>>>
>>> Yes, you are right, mostly.
>>> When you try to logically offline a CPU, the kernel will do task
>>> migration (i.e., move out all the tasks running on that CPU to other
>>> CPUs in the system), and it ensures that it doesn't need that CPU
>>> anymore. This also means that, from then on, the context of that CPU
>>> need not be saved (because the kernel has moved that CPU's tasks
>>> elsewhere). At this point, it is as if the kernel is purposefully using
>>> only a subset of the available CPUs. This step is a necessary
>>> prerequisite for doing physical CPU offline later on.
>>>
>>> But I don't think CPU power ON or OFF is the differentiating factor
>>> between logical and physical offlining. In logical offline, you still
>>> have the CPUs in the system but you just tell the kernel not to use
>>> them. At this stage, you can power off your CPU, to save power for
>>> example. But in physical offline, from a software perspective, you do
>>> additional work at the firmware level (apart from logical offlining at
>>> the OS level), to ensure that physically plugging out the CPUs will not
>>> affect the running system in any way.
>>>
>>> Please note that you can logically online and offline the same CPUs
>>> over and over again without rebooting the system.
>>> Here, while onlining a CPU which was offlined previously, the kernel
>>> follows almost the same sequence that it normally follows while booting
>>> the CPUs during full system boot.
>>>
>>> Also, one more thing to note: to be able to physically hot-plug CPUs,
>>> apart from OS and firmware support, you also need the hardware to
>>> support this feature. That is, the electrical wiring to the individual
>>> CPUs should be such that plugging them in and out does not interfere
>>> with the functioning of the rest of the system. As of today, there are
>>> only a few systems that support physical CPU hotplug. But you can do
>>> logical CPU hotplug easily, by configuring the kernel appropriately
>>> during compilation, as you noted in one of your previous mails.
>>>
>>> Regards,
>>> Srivatsa S. Bhat
>>
>> Hi Srivatsa,
>>
>> That was a great explanation! Thanks!
>> I have just one more query. You mentioned above that "the kernel follows
>> almost the same *sequence* which it normally follows while booting the
>> CPUs during full system booting."
>>
>> Can you please explain this sequence a little?
>>
> Hi Vaibhav,
>
> I'll try to outline a very high-level view of what happens while booting
> an SMP (Symmetric Multi-Processor) system. Instead of going through the
> entire boot sequence, let me just highlight the part that is of interest
> in this discussion: booting multiple CPUs.
>
> The "boot processor" is the one which is booted first while booting a
> system. On the x86 architecture, CPU 0 is always the boot processor.
> Hence, if you have observed, you cannot offline CPU 0 using CPU
> hot-plugging on an x86 machine. (On an Intel box, the file
> /sys/devices/system/cpu/cpu0/online is purposefully absent, for this
> reason!) But on other architectures, this might not be the case. For
> example, on the POWER architecture, any processor in the system can act
> as the boot processor.
> Once the boot processor does its initialization, the other processors,
> known as "secondary processors" or "application processors" (APs), are
> booted/initialized. Here, obviously, some synchronization mechanism is
> necessary to ensure that this order is followed. So in Linux, we use two
> bitmasks called "cpu_callout_mask" and "cpu_callin_mask". These bitmasks
> are used to indicate the processors available in the system.
>
> Once the boot processor initializes itself, it updates cpu_callout_mask
> to indicate which secondary processor (or application processor, AP) can
> initialize itself next (for example, the boot processor sets a particular
> bit to 1 in the cpu_callout_mask). On the other hand, the secondary
> processor would have done some very basic initialization by then, and
> will be testing the value of cpu_callout_mask in a while loop to see if
> its number has been "called out" by the boot processor. Only after the
> boot processor "calls out" this AP will the AP continue the rest of its
> initialization and complete it.
>
> Once the AP completes its initialization, it reports back to the boot
> processor by setting its number in the cpu_callin_mask. As expected, the
> boot processor would have been waiting in a while loop on cpu_callin_mask
> to see whether this AP booted OK or not. Once it finds that the
> cpu_callin_mask bit for this AP has been set, the boot processor follows
> the same procedure to boot the other APs: i.e., it updates
> cpu_callout_mask and waits for the corresponding entry to be set in
> cpu_callin_mask by that AP, and so on. This process continues until all
> the APs are booted up.
>
> Of course, each of these "waiting" times (of both the boot processor and
> the APs) is capped by some preset value, say for example 5 seconds.
> If some AP takes more than that time to boot, the boot processor declares
> that the AP could not boot and takes appropriate action (like clearing
> its bit in cpu_callout_mask and logically removing that AP from its
> tables etc., effectively forgetting about that processor). Similarly,
> while the APs wait for the boot processor to call them out, if the boot
> processor does not call them within a given time period, they declare a
> kernel panic.
>
> Here are some references, if you are interested in more details:
>
> Linux kernel source code:
> 1. linux/arch/x86/kernel/smpboot.c : start_secondary() and smp_callin()
> These are the functions executed by the APs (secondary or application
> processors). Actually, smp_callin() is called within start_secondary(),
> which is the primary function executed by APs.
>
> 2. linux/arch/x86/kernel/smpboot.c : do_boot_cpu()
> This is executed by the boot processor.

You can look up other important functions such as native_cpu_up().

General SMP booting info:
1. http://www.cheesecake.org/sac/smp.html

[ Sorry, I accidentally sent the earlier mail before composing the text
fully. ]

Regards,
Srivatsa S. Bhat
_______________________________________________
Kernelnewbies mailing list
[email protected]
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
