Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
Hi!

> > During loading, a hash table mapped from destination page to source
> > page is used instead of the original linear mapping implementation.
> > Because the hibernated image may be very large (up to nearly the size
> > of physical memory), it is very time-consuming to search for a source
> > page given the destination page, which is used to check whether a
> > newly allocated page is in the range of allocated destination pages.
>
> This seems to be an optimization of kexec so that it becomes efficient
> in loading large images (containing a large number of segments).
> Probably this can be a separate patch.
>
> IMHO, we can just first write a minimal patch where one can just switch
> between kernels. Once that patch is upstream, we can enhance

Yes, please.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
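The destination-to-source lookup described in the quoted patch text can be pictured with a small chained hash table. This is only a sketch under stated assumptions: the names `page_map`, `page_map_insert` and `page_map_lookup`, the bucket count and the hash function are all hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_BUCKETS 1024u /* illustrative power-of-two bucket count (assumption) */

/* One mapping from a destination page frame to the source page holding its data. */
struct page_map_entry {
    uintptr_t dest_pfn;
    uintptr_t src_pfn;
    struct page_map_entry *next; /* chain of entries sharing a bucket */
};

struct page_map {
    struct page_map_entry *buckets[MAP_BUCKETS];
};

/* Hypothetical hash: fold the upper bits into the lower ones, then mask. */
static unsigned bucket_of(uintptr_t dest_pfn)
{
    return (unsigned)((dest_pfn ^ (dest_pfn >> 10)) & (MAP_BUCKETS - 1));
}

/* Insert a caller-allocated entry at the head of its bucket chain. */
static void page_map_insert(struct page_map *m, struct page_map_entry *e)
{
    unsigned b = bucket_of(e->dest_pfn);
    e->next = m->buckets[b];
    m->buckets[b] = e;
}

/* Return the entry for dest_pfn, or NULL if that page is not a destination page. */
static const struct page_map_entry *page_map_lookup(const struct page_map *m,
                                                    uintptr_t dest_pfn)
{
    const struct page_map_entry *e;

    for (e = m->buckets[bucket_of(dest_pfn)]; e != NULL; e = e->next)
        if (e->dest_pfn == dest_pfn)
            return e;
    return NULL;
}
```

With such a table, checking whether a newly allocated page collides with an already allocated destination page costs one bucket walk instead of a scan over every mapping.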
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Fri, May 16, 2008 at 12:52:48PM +0800, Huang, Ying wrote:
> On Thu, 2008-05-15 at 19:55 -0700, Eric W. Biederman wrote:
> > Huang, Ying writes:
> >
> > > The disadvantage of this solution is that kernel B must know whether
> > > it is the original kernel (A) or the kexeced kernel (B). Different
> > > code must be used by kernel A and kernel B. And after jumping from A
> > > to B and from B to A, when jumping from A to B again, kernel A must
> > > use different code from the first time.
> >
> > I don't know what the case is for keeping two kernels in memory and
> > switching between them.
>
> This can be used to save the memory image of kernel B and accelerate
> the hibernation. The real boot of kernel B is only needed the first time.
>
> > I suspect a small piece of trampoline code between the two kernels
> > could handle the case. (i.e. purgatory pays attention).
> >
> > That is a fundamental aspect of the design. A general purpose
> > infrastructure with trampoline code to adapt it to whatever situation
> > comes up.
>
> It is possible to use purgatory to deal with this problem.
>
> Jump from kernel A to kernel B
>   - Jump to entry of purgatory (purgatory_entry)
>   - Purgatory saves the return address (kexec_jump_back_entry_A)
>   - Purgatory sets kexec_jump_back_entry for kernel B to a code segment
>     in purgatory, say kexec_jump_back_entry_A_for_B
>   - Purgatory jumps to the entry point of kernel B
>
> Jump from kernel B to kernel A
>   - Jump to purgatory (kexec_jump_back_entry_A_for_B)
>   - Purgatory saves the return address (kexec_jump_back_entry_B)
>   - Purgatory returns to kernel A (kexec_jump_back_entry_A)
>
> Jump from kernel A to kernel B again
>   - Jump to entry of purgatory (purgatory_entry)
>   - Purgatory saves the return address (kexec_jump_back_entry_A)
>   - Purgatory jumps to kexec_jump_back_entry_B
>
> The disadvantage of this solution is that some information is saved in
> purgatory (kexec_jump_back_entry_A, kexec_jump_back_entry_B). So
> purgatory must be saved too when saving the memory image of kernel A or
> kernel B. Purgatory can be seen as a part of kernel B. But it is a
> little tricky to think of it as a part of kernel A too.

That's a good point. Remembering the actual return points in purgatory
will require purgatory to be saved along with the core file.

I think purgatory is a good infrastructure for transitions between the
kernels, but at the same time, here it is a matter of just making a call
and then inspecting the stack in kexec_jump_back_entry. IMHO, we can keep
it simple and not involve purgatory in later transitions.

Thanks
Vivek
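The purgatory bookkeeping listed in the quoted steps can be modelled in a few lines of C. This is a sketch only: addresses are plain integers, a "jump" is modelled as returning the target address, and the struct and function names (`purgatory_state`, `purgatory_enter_from_A`, `purgatory_enter_from_B`, `boot_entry_B`) are hypothetical. Only the two `kexec_jump_back_entry_*` names come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* State that purgatory would have to keep between jumps (and that would
 * therefore have to be saved along with a memory image). */
struct purgatory_state {
    uintptr_t boot_entry_B;            /* initial entry point of kernel B */
    uintptr_t kexec_jump_back_entry_A; /* re-entry point of kernel A, 0 = unknown */
    uintptr_t kexec_jump_back_entry_B; /* re-entry point of kernel B, 0 = unknown */
};

/* Kernel A jumps to purgatory_entry; return_addr is A's return address.
 * Returns the address purgatory would jump to next. */
static uintptr_t purgatory_enter_from_A(struct purgatory_state *p,
                                        uintptr_t return_addr)
{
    p->kexec_jump_back_entry_A = return_addr;
    if (p->kexec_jump_back_entry_B == 0)
        return p->boot_entry_B;        /* first jump: boot kernel B */
    return p->kexec_jump_back_entry_B; /* later jumps: resume kernel B */
}

/* Kernel B jumps to kexec_jump_back_entry_A_for_B; return_addr is B's
 * return address. Returns the address purgatory would jump to next. */
static uintptr_t purgatory_enter_from_B(struct purgatory_state *p,
                                        uintptr_t return_addr)
{
    assert(p->kexec_jump_back_entry_A != 0); /* B can only run after A jumped */
    p->kexec_jump_back_entry_B = return_addr;
    return p->kexec_jump_back_entry_A;
}
```

The two saved fields are exactly the state that makes saving purgatory along with a memory image necessary, which is the drawback raised in the message.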
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Thu, May 15, 2008 at 11:27:58PM -0400, Vivek Goyal wrote:
> On Fri, May 16, 2008 at 10:56:15AM +0800, Huang, Ying wrote:
> > On Thu, 2008-05-15 at 19:25 -0700, Eric W. Biederman wrote:
> > > Huang, Ying writes:
> > >
> > > > On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote:
> > > > [...]
> > > > > 2) After we figure out our address, read the stack pointer from
> > > > >    a fixed location and simply set it. (This is my preference)
> > > >
> > > > Just for confirmation (my English is poor): do you mean that
> > > > kernel A just reads the stack top as the re-entry point,
> > > > regardless of whether it is the return address or argument 1?
> > >
> > > What I was thinking was:
> > >
> > > In kernel A()
> > >
> > > relocate_new_kernel:
> > >         ...
> > >         call    *%eax
> > > kexec_jump_back_entry:
> > >         /* This code should be PIC so figure out where we are */
> > >         call    1f
> > > 1:
> > >         popl    %edi
> > >         subl    $(1b - relocate_kernel), %edi
> > >         /* Setup a safe stack */
> > >         leal    PAGE_SIZE(%edi), %esp
> > >         ...
> > >
> > > Then in purgatory we can read the address of kexec_jump_back_entry
> > > by examining 0(%esp) and export it in whatever fashion is sane.
> > >
> > > However we reach kexec_jump_back_entry we should be fine.
>
> Huang is making use of purgatory only for booting kernel B for the first
> time. Once kernel B is booted, all the transitions (A-->B and B-->A)
> happen without using purgatory. Just keep on jumping back and forth to
> kexec_jump_back_entry.
>
> Probably not using purgatory for later transitions is justified as long
> as the kernel code is simple and small. Otherwise we shall have to teach
> purgatory also about the special case of resuming kernel B or booting
> kernel B.
>
> > I think it is reasonable to enable jumping back and forth more than
> > one time. So the following should be possible:
> >
> > 1. Jump from A to B (actually jump to purgatory, trigger the boot of B)
> > 2. Jump from B to A
> > 3. Jump from A to B again (jump to the kexec_jump_back_entry of B)
> > 4. Jump from B to A
> > ...
> >
> > So it should be possible to get the re-entry point of kernel B in
> > kexec_jump_back_entry of kernel A too. So I think in
> > kexec_jump_back_entry, the caller's stack should be checked to get the
> > re-entry point of the peer. And the stack state differs depending on
> > where we come from: relocate_new_kernel() or a return.
>
> To me this idea also looks good. So control flow will look something as
> follows?
>
> relocate_new kernel:
>         if (!preserve_context)
>                 set registers to known state.
>                 jump to purgatory.
>         else
>                 goto jump-back-setup:
>
> jump-back-setup:
>         - Color the stack.
>                 move $0x 0(%esp)
>         - call %edx

Thinking more about it, probably we don't have to separate out the
preserve context and normal kexec paths. Both can transition to purgatory
using call %edx. Coloring the stack should not do any harm in normal
kexec.

Thanks
Vivek
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Thu, May 15, 2008 at 12:57:53PM +0800, Huang, Ying wrote:
> On Wed, 2008-05-14 at 14:43 -0700, Eric W. Biederman wrote:
> [...]
> > Then as a preliminary design let's plan on this.
> > - Pass the re-entry point as the return address (using the C ABI).
> >   We may want to load the stack pointer etc so we can act as a direct
> >   entry point for new code.
>
> There are some issues with passing the entry point as the return address.
>
> The kexec jump (or kexec with return) is used for
>
> - Switching between the original kernel (A) and the kexeced kernel (B)
> - Calling some code (such as BIOS code) in physical mode
>
> 1) When calling some code in physical mode, the called code can use a
>    simple return to return to kernel A. So there is no return address on
>    the stack after returning to kernel A. Instead, argument 1 is on the
>    stack top.
>
> 2) When switching back from kernel B to kernel A, kernel B will call the
>    jump back entry of kernel A with the C ABI. So the return address is
>    on the stack top, and kernel A gets the jump back entry of kernel B
>    via the return address.
>
> Because the stack state is different between 1) and 2), the jump back
> entry of kernel A should distinguish them. Possible solutions are as
> follows:
>
> a) Before kernel A calls some physical mode code or kernel B, it sets
>    argument 1 to a magic number that cannot be a return address (such as
>    -1). The jump back entry of kernel A can check whether the stack top
>    is argument 1 or a return address.
>
> b) Distinguish by return value. For example, called physical mode code
>    must return 0, while kernel B must set %eax to some other number.

IMHO, this makes more sense to me when keeping C-function-like semantics
in mind. Both cases can be treated like calls to functions (calling a
BIOS function and jumping to kernel B). The basic difference between the
two cases is the re-entry point. In the BIOS function case, we always
re-enter the function at the start, but in the case of kernel B, except
for the first entry, all other entries happen at a run-time determined
address, which needs to be communicated to kernel A.

I would think that the second kernel B should just execute ret, and the
new entry address of kernel B is passed to kernel A through %eax (the
return value of the function).

Not sure if BIOS routines can always return a fixed code so that we can
differentiate between the two cases.

Thanks
Vivek
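The return-value convention discussed in this message can be sketched in C by treating both cases as ordinary function calls. This is an illustration only: it assumes the convention from option b) that physical mode code returns 0, which the message itself is unsure BIOS routines can guarantee, and the names `call_and_update_entry`, `fake_bios_call` and `fake_kernel_b` are hypothetical stand-ins.

```c
#include <stdint.h>

/* A callee is either physical mode code or kernel B; its return value
 * models what kernel A would find in %eax after the call returns. */
typedef uintptr_t (*callee_t)(void);

/* Call the peer and return the entry point to use for the next jump:
 * a non-zero return value is taken as kernel B's new re-entry address,
 * zero (assumed BIOS convention) leaves the current entry unchanged. */
static uintptr_t call_and_update_entry(callee_t callee, uintptr_t current_entry)
{
    uintptr_t ret = callee();

    return ret != 0 ? ret : current_entry;
}

/* Stand-in for physical mode code that returns the fixed code 0. */
static uintptr_t fake_bios_call(void)
{
    return 0;
}

/* Stand-in for kernel B returning a run-time determined re-entry address. */
static uintptr_t fake_kernel_b(void)
{
    return 0x200040;
}
```

For example, `call_and_update_entry(fake_kernel_b, 0x100000)` yields the address B handed back, while the BIOS stand-in leaves the stored entry point untouched.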
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Thu, 2008-05-15 at 22:00 -0400, Vivek Goyal wrote:
[...]
> IMHO, this makes more sense to me when keeping C-function-like
> semantics in mind. Both cases can be treated like calls to functions
> (calling a BIOS function and jumping to kernel B). The basic difference
> between the two cases is the re-entry point. In the BIOS function case,
> we always re-enter the function at the start, but in the case of kernel
> B, except for the first entry, all other entries happen at a run-time
> determined address, which needs to be communicated to kernel A.
>
> I would think that the second kernel B should just execute ret, and the
> new entry address of kernel B is passed to kernel A through %eax (the
> return value of the function).

The disadvantage of this solution is that kernel B must know whether it
is the original kernel (A) or the kexeced kernel (B). Different code must
be used by kernel A and kernel B. And after jumping from A to B and from
B to A, when jumping from A to B again, kernel A must use different code
from the first time.

Best Regards,
Huang Ying
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
Huang, Ying writes:

> On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote:
> [...]
> > 2) After we figure out our address, read the stack pointer from a
> >    fixed location and simply set it. (This is my preference)
>
> Just for confirmation (my English is poor): do you mean that kernel A
> just reads the stack top as the re-entry point, regardless of whether it
> is the return address or argument 1?

What I was thinking was:

In kernel A()

relocate_new_kernel:
        ...
        call    *%eax
kexec_jump_back_entry:
        /* This code should be PIC so figure out where we are */
        call    1f
1:
        popl    %edi
        subl    $(1b - relocate_kernel), %edi
        /* Setup a safe stack */
        leal    PAGE_SIZE(%edi), %esp
        ...

Then in purgatory we can read the address of kexec_jump_back_entry
by examining 0(%esp) and export it in whatever fashion is sane.

However we reach kexec_jump_back_entry we should be fine.

Eric
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Thu, 2008-05-15 at 19:25 -0700, Eric W. Biederman wrote:
> Huang, Ying writes:
>
> > On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote:
> > [...]
> > > 2) After we figure out our address, read the stack pointer from a
> > >    fixed location and simply set it. (This is my preference)
> >
> > Just for confirmation (my English is poor): do you mean that kernel A
> > just reads the stack top as the re-entry point, regardless of whether
> > it is the return address or argument 1?
>
> What I was thinking was:
>
> In kernel A()
>
> relocate_new_kernel:
>         ...
>         call    *%eax
> kexec_jump_back_entry:
>         /* This code should be PIC so figure out where we are */
>         call    1f
> 1:
>         popl    %edi
>         subl    $(1b - relocate_kernel), %edi
>         /* Setup a safe stack */
>         leal    PAGE_SIZE(%edi), %esp
>         ...
>
> Then in purgatory we can read the address of kexec_jump_back_entry
> by examining 0(%esp) and export it in whatever fashion is sane.
>
> However we reach kexec_jump_back_entry we should be fine.

I think it is reasonable to enable jumping back and forth more than one
time. So the following should be possible:

1. Jump from A to B (actually jump to purgatory, trigger the boot of B)
2. Jump from B to A
3. Jump from A to B again (jump to the kexec_jump_back_entry of B)
4. Jump from B to A
...

So it should be possible to get the re-entry point of kernel B in
kexec_jump_back_entry of kernel A too. So I think in
kexec_jump_back_entry, the caller's stack should be checked to get the
re-entry point of the peer. And the stack state differs depending on
where we come from: relocate_new_kernel() or a return.

Best Regards,
Huang Ying
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
Huang, Ying writes:

> The disadvantage of this solution is that kernel B must know whether it
> is the original kernel (A) or the kexeced kernel (B). Different code
> must be used by kernel A and kernel B. And after jumping from A to B and
> from B to A, when jumping from A to B again, kernel A must use different
> code from the first time.

I don't know what the case is for keeping two kernels in memory and
switching between them.

I suspect a small piece of trampoline code between the two kernels
could handle the case. (i.e. purgatory pays attention).

That is a fundamental aspect of the design. A general purpose
infrastructure with trampoline code to adapt it to whatever situation
comes up.

Eric
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Fri, May 16, 2008 at 10:56:15AM +0800, Huang, Ying wrote:
> On Thu, 2008-05-15 at 19:25 -0700, Eric W. Biederman wrote:
> > Huang, Ying writes:
> >
> > > On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote:
> > > [...]
> > > > 2) After we figure out our address, read the stack pointer from a
> > > >    fixed location and simply set it. (This is my preference)
> > >
> > > Just for confirmation (my English is poor): do you mean that kernel
> > > A just reads the stack top as the re-entry point, regardless of
> > > whether it is the return address or argument 1?
> >
> > What I was thinking was:
> >
> > In kernel A()
> >
> > relocate_new_kernel:
> >         ...
> >         call    *%eax
> > kexec_jump_back_entry:
> >         /* This code should be PIC so figure out where we are */
> >         call    1f
> > 1:
> >         popl    %edi
> >         subl    $(1b - relocate_kernel), %edi
> >         /* Setup a safe stack */
> >         leal    PAGE_SIZE(%edi), %esp
> >         ...
> >
> > Then in purgatory we can read the address of kexec_jump_back_entry
> > by examining 0(%esp) and export it in whatever fashion is sane.
> >
> > However we reach kexec_jump_back_entry we should be fine.

Huang is making use of purgatory only for booting kernel B for the first
time. Once kernel B is booted, all the transitions (A-->B and B-->A)
happen without using purgatory. Just keep on jumping back and forth to
kexec_jump_back_entry.

Probably not using purgatory for later transitions is justified as long
as the kernel code is simple and small. Otherwise we shall have to teach
purgatory also about the special case of resuming kernel B or booting
kernel B.

> I think it is reasonable to enable jumping back and forth more than one
> time. So the following should be possible:
>
> 1. Jump from A to B (actually jump to purgatory, trigger the boot of B)
> 2. Jump from B to A
> 3. Jump from A to B again (jump to the kexec_jump_back_entry of B)
> 4. Jump from B to A
> ...
>
> So it should be possible to get the re-entry point of kernel B in
> kexec_jump_back_entry of kernel A too. So I think in
> kexec_jump_back_entry, the caller's stack should be checked to get the
> re-entry point of the peer. And the stack state differs depending on
> where we come from: relocate_new_kernel() or a return.

To me this idea also looks good. So control flow will look something as
follows?

relocate_new kernel:
        if (!preserve_context)
                set registers to known state.
                jump to purgatory.
        else
                goto jump-back-setup:

jump-back-setup:
        - Color the stack.
                move $0x 0(%esp)
        - call %edx

kexec_jump_back_entry:
        - If 0(%esp) is not -1
                image-start = 0(%esp) // Re-entry point of kernel B. Store it.
          else
                We returned from BIOS call. Re-entry point has not changed.
                Do nothing.
        - Continue to resume kernel A

Thanks
Vivek
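The kexec_jump_back_entry check in the pseudo-code above amounts to comparing the word on top of the stack with the color value. A minimal C sketch follows, assuming the color is -1 as the pseudo-code's test suggests; the names `STACK_COLOR`, `jump_state` and `record_peer_entry` are hypothetical, and the real check would live in assembly reading 0(%esp).

```c
#include <stdint.h>

/* Magic value written to the stack top before the call; assumed to be -1,
 * a value that cannot be a valid return address. */
#define STACK_COLOR ((uintptr_t)-1)

struct jump_state {
    uintptr_t peer_entry; /* re-entry point of the other kernel */
};

/* stack_top models the word found at 0(%esp) on arrival at
 * kexec_jump_back_entry. Returns 1 if the peer's re-entry point was
 * updated, 0 if we merely returned from physical mode (e.g. BIOS) code. */
static int record_peer_entry(struct jump_state *s, uintptr_t stack_top)
{
    if (stack_top == STACK_COLOR)
        return 0; /* color still in place: re-entry point has not changed */
    s->peer_entry = stack_top;
    return 1;
}
```

Because the callee that uses a plain return never overwrites the colored slot, the same entry code can serve both the BIOS-call case and the kernel B case.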
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
Huang, Ying writes:

> I think it is reasonable to enable jumping back and forth more than one
> time.

I'm not opposed. I just don't understand the utility yet.

> So the following should be possible:
>
> 1. Jump from A to B (actually jump to purgatory, trigger the boot of B)
> 2. Jump from B to A
> 3. Jump from A to B again (jump to the kexec_jump_back_entry of B)

(And we go through purgatory, which remembers the kexec_jump_back_entry
of B)

> 4. Jump from B to A
> ...
>
> So it should be possible to get the re-entry point of kernel B in
> kexec_jump_back_entry of kernel A too. So I think in
> kexec_jump_back_entry, the caller's stack should be checked to get the
> re-entry point of the peer. And the stack state differs depending on
> where we come from: relocate_new_kernel() or a return.

Yes. Any conditional logic needs to be in purgatory or a similar
trampoline.

Eric
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
Huang, Ying writes:

> > So, IMHO, for a first simple implementation, we don't have to pass
> > around any data between kernels except the entry point. (Please
> > correct me if I am wrong). Let's get that implementation in first and
> > then we can get the rest of the pieces in place.
>
> Yes. The kernel entry/re-entry point is the only information that needs
> to be communicated between kernels for just switching between them. So
> we can focus on the kexec jump patch first.

Then as a preliminary design let's plan on this.
- Pass the re-entry point as the return address (using the C ABI).
  We may want to load the stack pointer etc so we can act as a direct
  entry point for new code.

- Look at passing a pointer to the mapping of pages that the kexec
  trampoline uses in arg1 of the C ABI. Largely the format is de facto
  fixed anyway because we need to pass the structure from C to
  assembly.

Using the standard C ABI makes it much easier to pick a calling
convention, and to document it.

Eric
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Wed, 2008-05-14 at 14:43 -0700, Eric W. Biederman wrote:
[...]
> Then as a preliminary design let's plan on this.
> - Pass the re-entry point as the return address (using the C ABI).
>   We may want to load the stack pointer etc so we can act as a direct
>   entry point for new code.

There are some issues with passing the entry point as the return address.

The kexec jump (or kexec with return) is used for

- Switching between the original kernel (A) and the kexeced kernel (B)
- Calling some code (such as BIOS code) in physical mode

1) When calling some code in physical mode, the called code can use a
   simple return to return to kernel A. So there is no return address on
   the stack after returning to kernel A. Instead, argument 1 is on the
   stack top.

2) When switching back from kernel B to kernel A, kernel B will call the
   jump back entry of kernel A with the C ABI. So the return address is
   on the stack top, and kernel A gets the jump back entry of kernel B
   via the return address.

Because the stack state is different between 1) and 2), the jump back
entry of kernel A should distinguish them. Possible solutions are as
follows:

a) Before kernel A calls some physical mode code or kernel B, it sets
   argument 1 to a magic number that cannot be a return address (such as
   -1). The jump back entry of kernel A can check whether the stack top
   is argument 1 or a return address.

b) Distinguish by return value. For example, called physical mode code
   must return 0, while kernel B must set %eax to some other number.

c) Use a different entry point for 1) and 2). The two entry points are
   deduced from the return address, such as:

        entry1 = return_address;
        entry2 = return_address & ~0xfff; /* page aligned */

   entry1 is used by physical mode code. entry2 is used by kernel B.

Which one is better? Or some other solution?

Best Regards,
Huang Ying
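Option c) can be made concrete with a short C sketch. It assumes that the entry2 expression is a mask with ~0xfff, as its "page aligned" comment implies, and that pages are 4 KiB; the function names are hypothetical.

```c
#include <stdint.h>

/* entry1: physical mode code returns to the plain return address. */
static uintptr_t entry_for_physical_mode(uintptr_t return_address)
{
    return return_address;
}

/* entry2: kernel B jumps to the start of the 4 KiB page that contains the
 * return address (assumed reading of the page-aligned expression). */
static uintptr_t entry_for_kernel_b(uintptr_t return_address)
{
    return return_address & ~(uintptr_t)0xfff;
}
```

For a return address of 0x12345, physical mode code would come back at 0x12345 while kernel B would enter at 0x12000, so the two arrival paths never share an entry point as long as the return address is not itself page aligned.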
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
Hi, Vivek,

On Tue, 2008-05-13 at 01:34 -0400, Vivek Goyal wrote:
> On Mon, May 12, 2008 at 02:40:41PM +0800, Huang, Ying wrote:
> > This patch implements a prototype of kexec multi-stage load. With this
> > patch, the backup pages map can be passed to the kexeced kernel via
> > /sbin/kexec; and sys_kexec_load can be used to load a large
> > hibernated image with a huge number of segments.
>
> Hi Huang,
>
> Had a quick look at the patch. Will review in detail soon. Had a few
> thoughts.
>
> In general, these patches are on top of the previous kexec jump patches.
> It would be good if you could repost your updated patches so that I can
> apply the patches and get some testing going.

The kexec jump patch v9 is sufficient for this patch to work. I have no
new version of the kexec jump patch so far.

> Last time I tried the patches (V9) and kexec jump did not work for me. I
> was not getting timer interrupts in the second kernel. Then I had to put
> LAPIC and IOAPIC in legacy mode and then at least the one-way jump
> started working. I am not sure how the next kernel boots for you without
> putting APICs in legacy mode. (Yet to make returning back to the
> original kernel work using V9).

Can normal kexec (without kexec jump) work without putting LAPIC and
IOAPIC in legacy mode? Does this mean we should put LAPIC and IOAPIC into
legacy mode before kexec and restore them after?

The kexec jump patch works well on my IBM T42. But it seems that the
IOAPIC is disabled in BIOS, so I can only use i8259 and LAPIC on this
machine.

> > In kexec based hibernation, resuming from disk is implemented as
> > loading the hibernated disk image with sys_kexec_load(). But unlike
> > the normal kexec load, the hibernated image may have a huge number of
> > segments. So multi-stage loading is necessary for a kexec load based
> > resuming from disk implementation.
>
> I understand that hibernated images are huge. But why do we require
> multi-stage loading? I knew there was a maximum segment limit in kexec.
> But I think we can change that limit. Does anything else prevent us from
> loading large images in one go?

There are two reasons for multi-stage loading:

- Pass the backup pages map from the original kernel (A) to the kexeced
  kernel (B), because it is not known before loading. We have discussed
  this before in:
  http://lkml.org/lkml/2008/3/12/308
  http://lkml.org/lkml/2008/3/14/59
  http://lkml.org/lkml/2008/3/21/299

- Load a large hibernated image. The hibernated image can be not only
  large but also discontinuous. For example, the physical memory size is
  4G, and there is one free page every 2 pages, that is, there will be
  nearly 2G segments. Loading these segments in one go is impossible. So
  multi-stage load is necessary. And if the hibernated image is
  compressed, it is also very difficult to load it in one go because of
  the anonymous pages needed.

> > And, multi-stage loading is also necessary for parameter passing from
> > the original kernel to the kexeced kernel because some information
> > such as the backup pages map is not available before loading.
> >
> > Four stages are defined:
> >
> > - KS_start: start stage; begin a new kexec loading; there must be only
> >   one KS_start stage in one kexec loading.
> >
> > - KS_mid: middle stage; continue loading some segments; there may be
> >   many or zero KS_mid stages in one kexec loading; follows a KS_start
> >   or KS_mid stage.
> >
> > - KS_final: final stage; finish a kexec loading; there must be only
> >   one KS_final stage in one kexec loading; follows a KS_start or
> >   KS_mid stage.
> >
> > - KS_full: backward compatible with the original loading semantics;
> >   finish all work of a kexec loading in one KS_full stage.
> >
> > Overlapping between pages of different segments is allowed to support
> > parameter passing.
> >
> > During loading, a hash table mapped from destination page to source
> > page is used instead of the original linear mapping implementation.
> > Because the hibernated image may be very large (up to nearly the size
> > of physical memory), it is very time-consuming to search for a source
> > page given the destination page, which is used to check whether a
> > newly allocated page is in the range of allocated destination pages.
>
> This seems to be an optimization of kexec so that it becomes efficient
> in loading large images (containing a large number of segments).
> Probably this can be a separate patch.

If it is desired, I can separate it into another patch.

> IMHO, we can just first write a minimal patch where one can just switch
> between kernels. Once that patch is upstream, we can enhance it to do
> the hibernation and saving core functionality. Incremental review
> becomes easier. Your last patch (v9) was a good attempt at that and I
> thought very soon we shall have something mergeable.

Agreed. We can first focus on the kexec jump patch. But as in the last
thread on kexec jump (v9), we need a protocol for parameter passing
between kernel A and kernel B. So we can use this patch as a prototype
for the communication protocol.

> > The original mapping is only used by
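The ordering rules for the four stages quoted in this message can be expressed as a small state machine. The sketch below is an assumption-laden illustration rather than the patch's code: the KS_* names come from the thread, while `load_state`, `kexec_stage_step` and the choice to reject a KS_start or KS_full that arrives while a load is still in progress are hypothetical.

```c
#include <stdbool.h>

enum kexec_stage { KS_start, KS_mid, KS_final, KS_full };
enum load_state { LOAD_IDLE, LOAD_IN_PROGRESS };

/* Validate one stage against the current state and advance the state.
 * Returns false when the stage is not allowed at this point. */
static bool kexec_stage_step(enum load_state *state, enum kexec_stage stage)
{
    switch (stage) {
    case KS_start: /* begins a new loading; only one per loading */
        if (*state != LOAD_IDLE)
            return false;
        *state = LOAD_IN_PROGRESS;
        return true;
    case KS_mid: /* must follow KS_start or KS_mid */
        return *state == LOAD_IN_PROGRESS;
    case KS_final: /* must follow KS_start or KS_mid; ends the loading */
        if (*state != LOAD_IN_PROGRESS)
            return false;
        *state = LOAD_IDLE;
        return true;
    case KS_full: /* whole loading in one stage, original semantics */
        return *state == LOAD_IDLE;
    }
    return false;
}
```

A caller would feed each sys_kexec_load() invocation's stage through such a check, so a sequence like KS_start, KS_mid, KS_mid, KS_final is accepted and a stray KS_mid with no load in progress is refused.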
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Wed, May 14, 2008 at 09:57:46AM +0800, Huang, Ying wrote:
> Hi, Vivek,
>
> On Tue, 2008-05-13 at 01:34 -0400, Vivek Goyal wrote:
> > On Mon, May 12, 2008 at 02:40:41PM +0800, Huang, Ying wrote:
> > > This patch implements a prototype of kexec multi-stage load. With
> > > this patch, the backup pages map can be passed to the kexeced
> > > kernel via /sbin/kexec; and sys_kexec_load can be used to load a
> > > large hibernated image with a huge number of segments.
> >
> > Hi Huang,
> >
> > Had a quick look at the patch. Will review in detail soon. Had a few
> > thoughts.
> >
> > In general, these patches are on top of the previous kexec jump
> > patches. It would be good if you could repost your updated patches so
> > that I can apply the patches and get some testing going.
>
> The kexec jump patch v9 is sufficient for this patch to work. I have no
> new version of the kexec jump patch so far.
>
> > Last time I tried the patches (V9) and kexec jump did not work for
> > me. I was not getting timer interrupts in the second kernel. Then I
> > had to put LAPIC and IOAPIC in legacy mode and then at least the
> > one-way jump started working. I am not sure how the next kernel boots
> > for you without putting APICs in legacy mode. (Yet to make returning
> > back to the original kernel work using V9).
>
> Can normal kexec (without kexec jump) work without putting LAPIC and
> IOAPIC in legacy mode? Does this mean we should put LAPIC and IOAPIC
> into legacy mode before kexec and restore them after?

We do put LAPIC and IOAPIC in legacy mode in normal kexec. Look at
disable_IO_APIC() in native_machine_shutdown(). So I think we shall have
to do the same thing in the kexec jump code too.

> The kexec jump patch works well on my IBM T42. But it seems that the
> IOAPIC is disabled in BIOS, so I can only use i8259 and LAPIC on this
> machine.
>
> > > In kexec based hibernation, resuming from disk is implemented as
> > > loading the hibernated disk image with sys_kexec_load(). But unlike
> > > the normal kexec load, the hibernated image may have a huge number
> > > of segments. So multi-stage loading is necessary for a kexec load
> > > based resuming from disk implementation.
> >
> > I understand that hibernated images are huge. But why do we require
> > multi-stage loading? I knew there was a maximum segment limit in
> > kexec. But I think we can change that limit. Does anything else
> > prevent us from loading large images in one go?
>
> There are two reasons for multi-stage loading:
>
> - Pass the backup pages map from the original kernel (A) to the kexeced
>   kernel (B), because it is not known before loading. We have discussed
>   this before in:
>   http://lkml.org/lkml/2008/3/12/308
>   http://lkml.org/lkml/2008/3/14/59
>   http://lkml.org/lkml/2008/3/21/299

See my response below

> - Load a large hibernated image. The hibernated image can be not only
>   large but also discontinuous. For example, the physical memory size
>   is 4G, and there is one free page every 2 pages, that is, there will
>   be nearly 2G segments. Loading these segments in one go is
>   impossible. So multi-stage load is necessary. And if the hibernated
>   image is compressed, it is also very difficult to load it in one go
>   because of the anonymous pages needed.
>
> > > And, multi-stage loading is also necessary for parameter passing
> > > from the original kernel to the kexeced kernel because some
> > > information such as the backup pages map is not available before
> > > loading.
> > >
> > > Four stages are defined:
> > >
> > > - KS_start: start stage; begin a new kexec loading; there must be
> > >   only one KS_start stage in one kexec loading.
> > >
> > > - KS_mid: middle stage; continue loading some segments; there may
> > >   be many or zero KS_mid stages in one kexec loading; follows a
> > >   KS_start or KS_mid stage.
> > >
> > > - KS_final: final stage; finish a kexec loading; there must be only
> > >   one KS_final stage in one kexec loading; follows a KS_start or
> > >   KS_mid stage.
> > >
> > > - KS_full: backward compatible with the original loading semantics;
> > >   finish all work of a kexec loading in one KS_full stage.
> > >
> > > Overlapping between pages of different segments is allowed to
> > > support parameter passing.
> > >
> > > During loading, a hash table mapped from destination page to source
> > > page is used instead of the original linear mapping implementation.
> > > Because the hibernated image may be very large (up to nearly the
> > > size of physical memory), it is very time-consuming to search for a
> > > source page given the destination page, which is used to check
> > > whether a newly allocated page is in the range of allocated
> > > destination pages.
> >
> > This seems to be an optimization of kexec so that it becomes
> > efficient in loading large images (containing a large number of
> > segments). Probably this can be a separate patch.
>
> If it is desired, I can separate it into another patch.
>
> > IMHO, we can just first write a minimal patch where one can just
> > switch between kernels. Once that patch is upstream, we can enhance
> > it to do the hibernation and saving core functionality. Incremental
> > review becomes easier. Your
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Tue, 2008-05-13 at 22:56 -0400, Vivek Goyal wrote:
[...]
> > > Last time I tried the patches (V9) and kexec jump did not work for
> > > me. I was not getting timer interrupts in the second kernel. Then I
> > > had to put LAPIC and IOAPIC in legacy mode and then at least the
> > > one-way jump started working. I am not sure how the next kernel
> > > boots for you without putting APICs in legacy mode. (Yet to make
> > > returning back to the original kernel work using V9).
> >
> > Can normal kexec (without kexec jump) work without putting LAPIC and
> > IOAPIC in legacy mode? Does this mean we should put LAPIC and IOAPIC
> > into legacy mode before kexec and restore them after?
>
> We do put LAPIC and IOAPIC in legacy mode in normal kexec. Look at
> disable_IO_APIC() in native_machine_shutdown(). So I think we shall
> have to do the same thing in the kexec jump code too.

OK. I will look at this.

> I went through the above mail thread again where we were discussing
> what information needs to be passed between kernels. Last time we
> enumerated three things.
>
> - kernel entry/re-entry point for switching between kernels.
> - backup pages map for core filtering
> - Probably ELF core notes for saving the hibernated image.
>
> I think if we just implement the functionality so that one can switch
> back and forth between kernels (no hibernated image saving), then we
> probably need to pass around only the kernel entry/re-entry point and
> nothing else, and in your patches I think you are already doing that
> using %edi.

Yes.

> So, IMHO, for a first simple implementation, we don't have to pass
> around any data between kernels except the entry point. (Please correct
> me if I am wrong). Let's get that implementation in first and then we
> can get the rest of the pieces in place.

Yes. The kernel entry/re-entry point is the only information that needs
to be communicated between kernels for just switching between them. So we
can focus on the kexec jump patch first.

Best Regards,
Huang Ying
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Mon, May 12, 2008 at 02:40:41PM +0800, Huang, Ying wrote:
> This patch implements a prototype of kexec multi-stage load. With this patch, the backup pages map can be passed to the kexeced kernel via /sbin/kexec, and sys_kexec_load can be used to load a large hibernated image with a huge number of segments.

Hi Huang,

I had a quick look at the patch and will review it in detail soon. I had a few thoughts.

In general, these patches are on top of the previous kexec jump patches. It would be good if you could repost your updated patches so that I can apply them and get some testing going.

Last time I tried the patches (V9) and kexec jump did not work for me. I was not getting timer interrupts in the second kernel. Then I had to put LAPIC and IOAPIC in legacy mode, and then the one-way jump started working. I am not sure how the next kernel boots for you without putting the APICs in legacy mode. (I have yet to make returning to the original kernel work using V9.)

> In kexec based hibernation, resuming from disk is implemented as loading the hibernated disk image with sys_kexec_load(). But unlike the normal kexec load, the hibernated image may have a huge number of segments. So multi-stage loading is necessary for the kexec load based implementation of resuming from disk.

I understand that hibernated images are huge. But why do we require multi-stage loading? I knew there was a maximum segment limit in kexec, but I think we can change that limit. Does anything else prevent us from loading large images in one go?

> And, multi-stage loading is also necessary for parameter passing from the original kernel to the kexeced kernel, because some information such as the backup pages map is not available before loading.
>
> Four stages are defined:
>
> - KS_start: start stage; begins a new kexec loading; there must be only one KS_start stage in one kexec loading.
> - KS_mid: middle stage; continues loading some segments; there may be zero or many KS_mid stages in one kexec loading; follows a KS_start or KS_mid stage.
> - KS_final: final stage; finishes a kexec loading; there must be only one KS_final stage in one kexec loading; follows a KS_start or KS_mid stage.
> - KS_full: backward compatible with the original loading semantics; finishes all work of a kexec loading in one KS_full stage.
>
> Overlapping between pages of different segments is allowed to support parameter passing.
>
> During loading, a hash table mapping from destination page to source page is used instead of the original linear mapping implementation. Because the hibernated image may be very large (up to nearly the size of physical memory), it is very time-consuming to search for a source page given the destination page, which is used to check whether a newly allocated page is in the range of allocated destination pages.

This seems to be an optimization of kexec so that it becomes efficient in loading large images (containing a large number of segments). Probably this can be a separate patch.

IMHO, we can just first write a minimal patch where one can just switch between kernels. Once that patch is upstream, we can enhance it to do the hibernation and core-saving functionality. Incremental review becomes easier. Your last patch (v9) was a good attempt at that, and I thought very soon we would have something mergeable.

> The original mapping is only used by assembly code to swap the page contents. This map is also exported to user space via /proc/kexec_pgmap, so that /sbin/kexec can use it to construct the backup pages map parameter for the kexeced kernel.
>
> This patch is based on Linux kernel 2.6.25 and the kexec_jump patch, and has been tested on an IBM T42.

Is the kexec_jump v9 patch good enough, or do you have another internal version on top of which this patch applies?

Thanks
Vivek
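The ordering rules for the four stages in the patch description quoted above (KS_start, KS_mid, KS_final, KS_full) can be read as a small state machine. The following is only an illustrative user-space sketch of those rules, not code from the patch: the names ks_stage, ks_load_state, ks_stage_accept and ks_sequence_valid are hypothetical, and rejecting a KS_start or KS_full while a multi-stage load is still open is an assumption, since the description does not say how that case is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stage names follow the patch description; the enum itself is hypothetical. */
enum ks_stage { KS_start, KS_mid, KS_final, KS_full };

/* Hypothetical loader state: is a multi-stage load currently open? */
struct ks_load_state {
    bool in_progress;
};

/* Return true and update the state if `stage` is allowed to come next. */
static bool ks_stage_accept(struct ks_load_state *st, enum ks_stage stage)
{
    switch (stage) {
    case KS_start:
    case KS_full:
        /* Assumption: a new load may not begin while one is still open. */
        if (st->in_progress)
            return false;
        /* KS_full completes in a single step; KS_start opens a load. */
        st->in_progress = (stage == KS_start);
        return true;
    case KS_mid:
        /* Must follow a KS_start or another KS_mid. */
        return st->in_progress;
    case KS_final:
        /* Must follow a KS_start or KS_mid, and closes the load. */
        if (!st->in_progress)
            return false;
        st->in_progress = false;
        return true;
    }
    return false;
}

/* A sequence is valid if every stage is accepted and no load is left open. */
static bool ks_sequence_valid(const enum ks_stage *seq, size_t n)
{
    struct ks_load_state st = { .in_progress = false };

    assert(seq != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!ks_stage_accept(&st, seq[i]))
            return false;
    }
    return !st.in_progress;
}
```

Under these assumptions, the sequence KS_start, KS_mid, KS_mid, KS_final is accepted, as is a lone KS_full, while KS_mid without a preceding KS_start, or a load that never reaches KS_final, is rejected.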