On Thu, Aug 18, 2011 at 11:14 AM, Srivatsa Bhat <[email protected]> wrote:
> On Thu, Aug 18, 2011 at 11:40 PM, Srivatsa Bhat <[email protected]> wrote:
>>
>> On Thu, Aug 18, 2011 at 10:44 PM, Vaibhav Jain <[email protected]> wrote:
>>>
>>> On Thu, Aug 18, 2011 at 9:02 AM, srivatsa bhat <[email protected]> wrote:
>>>>
>>>> Hi Vaibhav,
>>>>
>>>> On Thu, Aug 18, 2011 at 8:24 PM, Vaibhav Jain <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I talked to a friend of mine and he suggested that in a logical
>>>>> offline state the CPU is powered on and ready to execute
>>>>> instructions; it is just that the kernel is not aware of it. But in
>>>>> a physical offline state the CPU is powered off and cannot run.
>>>>> Are you saying something similar?
>>>>>
>>>> Yes, you are right, mostly.
>>>> When you try to logically offline a CPU, the kernel does task
>>>> migration (i.e., it moves all the tasks running on that CPU to other
>>>> CPUs in the system) and ensures that it does not need that CPU
>>>> anymore. This also means that, from then on, the context of that CPU
>>>> need not be saved (because the kernel has moved that CPU's tasks
>>>> elsewhere). At this point, it is as if the kernel is purposefully
>>>> using only a subset of the available CPUs. This step is a necessary
>>>> prerequisite for doing a physical CPU offline later on.
>>>>
>>>> But I don't think CPU power ON or OFF is the differentiating factor
>>>> between logical and physical offlining. In a logical offline, you
>>>> still have the CPUs in the system but you just tell the kernel not
>>>> to use them. At this stage, you can power off your CPU, to save
>>>> power for example. But in a physical offline, from a software
>>>> perspective, you do additional work at the firmware level (apart
>>>> from logical offlining at the OS level) to ensure that physically
>>>> plugging out the CPUs will not affect the running system in any way.
>>>>
>>>> Please note that you can logically online and offline the same CPUs
>>>> over and over again without rebooting the system. Here, while
>>>> onlining a CPU which was offlined previously, the kernel follows
>>>> almost the same sequence that it normally follows while booting the
>>>> CPUs during a full system boot.
>>>>
>>>> One more thing to note is that, to be able to physically hot-plug
>>>> CPUs, apart from OS and firmware support, you also need the hardware
>>>> to support this feature. That is, the electrical wiring to the
>>>> individual CPUs should be such that plugging them in and out does
>>>> not interfere with the functioning of the rest of the system. As of
>>>> today, only a few systems support physical CPU hotplug. But you can
>>>> do logical CPU hotplug easily, by configuring the kernel
>>>> appropriately during compilation, as you noted in one of your
>>>> previous mails.
>>>>
>>>> Regards,
>>>> Srivatsa S. Bhat
>>>
>>> Hi Srivatsa,
>>>
>>> That was a great explanation! Thanks!
>>> I have just one more query. You mentioned above that "the kernel
>>> follows almost the same *sequence* which it normally follows while
>>> booting the CPUs during full system booting."
>>>
>>> Can you please explain this sequence a little?
>>>
>> Hi Vaibhav,
>>
>> I'll try to outline a very high-level view of what happens while
>> booting an SMP (Symmetric Multi-Processor) system. Instead of going
>> through the entire boot sequence, let me highlight only the part that
>> is of interest in this discussion: booting multiple CPUs.
>>
>> The "boot processor" is the one which is booted first while booting a
>> system. On the x86 architecture, CPU 0 is always the boot processor.
>> Hence, as you may have observed, you cannot offline CPU 0 using CPU
>> hot-plugging on an x86 machine. (On an Intel box, the file
>> /sys/devices/system/cpu/cpu0/online is purposefully absent, for this
>> reason!)
>> But on other architectures this might not be the case. For example, on
>> the POWER architecture, any processor in the system can act as the
>> boot processor.
>>
>> Once the boot processor does its initialization, the other processors,
>> known as "secondary processors" or "application processors" (APs), are
>> booted/initialized. Here, obviously, some synchronization mechanism is
>> necessary to ensure that this order is followed. So in Linux we use
>> two bitmasks called "cpu_callout_mask" and "cpu_callin_mask". These
>> bitmasks are used to indicate the processors available in the system.
>>
>> Once the boot processor initializes itself, it updates
>> cpu_callout_mask to indicate which secondary processor (or application
>> processor, AP) can initialize itself next (for example, the boot
>> processor sets a particular bit to 1 in cpu_callout_mask). Meanwhile,
>> the secondary processor will have done some very basic initialization
>> and will be testing the value of cpu_callout_mask in a while loop to
>> see if its number has been "called out" by the boot processor. Only
>> after the boot processor "calls out" this AP does the AP continue and
>> complete the rest of its initialization.
>>
>> Once the AP completes its initialization, it reports back to the boot
>> processor by setting its number in cpu_callin_mask. As expected, the
>> boot processor will have been waiting in a while loop on
>> cpu_callin_mask to see whether this AP booted OK. Once it finds that
>> the cpu_callin_mask bit for this AP has been set, the boot processor
>> follows the same procedure to boot the other APs: it updates
>> cpu_callout_mask, waits for the corresponding entry to be set in
>> cpu_callin_mask by that AP, and so on. This process continues until
>> all the APs are booted up.
>>
>> Of course, each of these "waiting" times (of both the boot processor
>> and the APs) is capped by some preset value, say for example 5 seconds.
>> If some AP takes more than that time to boot, the boot processor
>> declares that the AP could not boot and takes appropriate action (like
>> clearing its bit in cpu_callout_mask and logically removing that AP
>> from its tables, etc., effectively forgetting about that processor).
>> Similarly, while the APs wait for the boot processor to call them out,
>> if the boot processor does not call them within a given time period,
>> they declare a kernel panic.
>>
>> Here are some references, if you are interested in more details:
>>
>> Linux kernel source code:
>> 1. linux/arch/x86/kernel/smpboot.c : start_secondary() and smp_callin()
>> These are the functions executed by the APs (secondary or application
>> processors). Actually, smp_callin() is called within
>> start_secondary(), which is the primary function executed by APs.
>>
>> 2. linux/arch/x86/kernel/smpboot.c : do_boot_cpu()
> This is executed by the boot processor. You can look up other
> important functions such as native_cpu_up().
>
> General SMP booting info:
> 1. http://www.cheesecake.org/sac/smp.html
>
> [ Sorry, I accidentally sent the earlier mail before composing the
> text fully. ]
>
> Regards,
> Srivatsa S. Bhat

Awesome explanation, Srivatsa!! Thanks a lot!!
Just had one more doubt. I am a little unclear about how the APs get
initialized in the beginning. In the case of the boot processor it's
just like a uniprocessor system. But how do the APs start executing
code? Could you please explain a little?
Thanx!!
_______________________________________________
Kernelnewbies mailing list
[email protected]
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
