Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-16 Thread Pavel Machek
Hi!

  During loading, a hash table mapped from destination page to source
  page is used instead of original linear mapping
  implementation. Because the hibernated image may be very large (up to
  near the size of physical memory), it is very time-consuming to search
  a source page given the destination page, which is used to check
  whether a newly allocated page is in the range of allocated
  destination pages.
 
 This seems to be an optimization of kexec so that it becomes efficient
 in loading large images (containing large number of segments). Probably
 this can be a separate patch.
 
 IMHO, we can just first write a minimal patch where one can just switch
 between kernels. Once that patch is upstream, we can enhance

Yes, please.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-16 Thread Vivek Goyal
On Fri, May 16, 2008 at 12:52:48PM +0800, Huang, Ying wrote:
 On Thu, 2008-05-15 at 19:55 -0700, Eric W. Biederman wrote:
  Huang, Ying [EMAIL PROTECTED] writes:
  
   The disadvantage of this solution is that kernel B must know it is
   original kernel (A) or kexeced kernel (B). Different code should be used
   by kernel A and kernel B. And after jump from A to B, jump from B to A,
   when jump from A to B again, kernel A must use different code from the
   first time.
  
  I don't know what the case is for keeping two kernels in memory and 
  switching
  between them.
 
 This can be used to save the memory image of kernel B and accelerate the
 hibernation. The real boot of kernel B is only needed first time.
 
  I suspect a small piece of trampoline code between the two kernels could
  handle the case. (i.e. purgatory pays attention).
  
  That is a fundamental aspect of the design.  A general purpose 
  infrastructure
  with trampoline code to adapt it to whatever situation comes up.
 
 It is possible to use purgatory to deal with this problem.
 
 Jump from kernel A to kernel B
 Jump to entry of purgatory (purgatory_entry)
 purgatory save the return address (kexec_jump_back_entry_A)
 Purgatory set kexec_jump_back_entry for kernel B to a code
 segment in purgatory, say kexec_jump_back_entry_A_for_B
 Purgatory jump to entry point of kernel B
 Jump from kernel B to kernel A
 Jump to purgatory (kexec_jump_back_entry_A_for_B)
 Purgatory save the return address (kexec_jump_back_entry_B)
 Purgatory return to kernel A (kexec_jump_back_entry_A)
 Jump from kernel A to kernel B again
 Jump to entry of purgatory (purgatory_entry)
 Purgatory save the return address (kexec_jump_back_entry_A)
 Purgatory jump to kexec_jump_back_entry_B
 
 The disadvantage of this solution is that some information is saved in
 purgatory (kexec_jump_back_entry_A, kexec_jump_back_entry_B). So,
 purgatory must be saved too when save the memory image of kernel A or
 kernel B. Purgatory can be seen as a part of kernel B. But it is a
 little tricky to think it as a part of kernel A too.

That's a good point. Remembering the actual return points in purgatory
will require purgatory to be saved along with the core file.

I think purgatory is a good infrastructure for transitions between the
kernels, but here it is just a matter of making a call and then
inspecting the stack in kexec_jump_back_entry. IMHO, we can keep it
simple and not involve purgatory in later transitions.

Thanks
Vivek



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-16 Thread Vivek Goyal
On Thu, May 15, 2008 at 11:27:58PM -0400, Vivek Goyal wrote:
 On Fri, May 16, 2008 at 10:56:15AM +0800, Huang, Ying wrote:
  On Thu, 2008-05-15 at 19:25 -0700, Eric W. Biederman wrote:
   Huang, Ying [EMAIL PROTECTED] writes:
   
On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote:
[...]
2) After we figure out our address read the stack pointer from
   a fixed location and simply set it.  (This is my preference)
   
Just for confirmation (My English is poor).
   
Do you mean that kernel A just read the stack top as re-entry point,
regardless of whether it is return address or argument 1?
   
   What I was thinking was:
   
   In kernel A()
   
   relocate_new_kernel:
   
   ...
   
   call  *%eax
   
   kexec_jump_back_entry:
   /* This code should be PIC so figure out where we are */
   call  1f
   1:
   popl  %edi
   subl  $(1b - relocate_kernel), %edi
   
   /* Setup a safe stack */
   leal  PAGE_SIZE(%edi), %esp
   ...
   
   
   Then in purgatory we can read the address of kexec_jump_back_entry
   by examining 0(%esp) and export it in whatever fashion is sane.
   
   However we reach kexec_jump_back_entry we should be fine.
  
 
 Huang is making use of purgatory only for booting kernel B for the first
 time. Once kernel B is booted, all the transitions (A->B and B->A)
 happen without using purgatory. Just keep on jumping back and forth
 to kexec_jump_back_entry.
 
 Probably not using purgatory for later transitions is justified as long as
 the kernel code is simple and small. Otherwise we shall have to teach
 purgatory about the special cases of resuming kernel B or booting kernel B.
 
  I think it is reasonable to enable jumping back and forth more than one
  time. So the following should be possible:
  
  1. Jump from A to B (actually jump to purgatory, trigger the boot of B)
  2. Jump from B to A
  3. Jump from A to B again (jump to the kexec_jump_back_entry of B)
  4. Jump from B to A
  ...
  
  So it should be possible to get the re-entry point of kernel B in
  kexec_jump_back_entry of kernel A too. So I think in
  kexec_jump_back_entry, the caller's stack should be checked to get
  re-entry point of peer. And the stack state is different depend on where
  come from, from relocate_new_kernel() or return.
  
 
 To me this idea also looks good. So control flow will look something
 as follows?
 
 relocate_new kernel:
   
   if (!preserve_context)
   set registers to known state.
   jump to purgatory.
   else
   goto jump-back-setup:
 
 jump-back-setup:
 - Color the stack.
   move $0x 0(%esp)
 
 - call %edx
 

Thinking more about it, probably we don't have to separate out preserve
context and normal kexec path. Both can transition to purgatory using
call %edx. Coloring the stack should not harm in normal kexec.

Thanks
Vivek



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Vivek Goyal
On Thu, May 15, 2008 at 12:57:53PM +0800, Huang, Ying wrote:
 On Wed, 2008-05-14 at 14:43 -0700, Eric W. Biederman wrote:
 [...]
  Then as a preliminary design let's plan on this.
  
  - Pass the re-entry point as the return address (using the C ABI).
    We may want to load the stack pointer etc so we can act as
    a direct entry point for new code.
 
 There are some issues about passing entry point as return address. The
 kexec jump (or kexec with return) is used for
 
 - Switching between original kernel (A) and kexeced kernel (B)
 - Call some code (such as BIOS code) in physical mode
 
 1) When call some code in physical mode, the called code can use a
 simple return to return to kernel A. So there is no return address on
 stack after return to kernel A. Instead, argument 1 is on stack top.
 
 2) When switch back from kernel B to kernel A, kernel B will call the
 jump back entry of kernel A with C ABI. So, the return address is on
 stack top. And kernel A get jump back entry of kernel B via the return
 address.
 
 Because the stack state is different between 1) and 2), the jump back
 entry of kernel A should distinguish them. Possible solution can be as
 follow:
 
 a) Before kernel A call some physical mode code or kernel B, it set
 argument 1 to be a magic number that can not be return address (such as
 -1). Jump back entry of kernel A can check whether the stack top is
 argument 1 or return address.
 
 b) Distinguish by return address. Such as, called physical mode code
 must return 0, while kernel B must set %eax to some other number.
 

IMHO, this makes more sense to me when keeping C-function-like
semantics in mind.

Both cases can be treated like function calls (calling a BIOS function
and jumping to kernel B). The basic difference between the two cases is the
re-entry point. In the BIOS function case, we always re-enter the function at
the start, but in the case of kernel B, except for the first entry, all other
entries happen at a run-time determined address, which needs to be
communicated to kernel A.

I would think that kernel B should just execute ret, and the new entry
address of kernel B is passed to kernel A through %eax (the return value of
the function).

Not sure if BIOS routines can always return a fixed code so that we can
differentiate between the two cases.

Thanks
Vivek



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Huang, Ying
On Thu, 2008-05-15 at 22:00 -0400, Vivek Goyal wrote:
[...]
 IMHO, this kind of make more sense to me when keeping C function like
 semantics in mind.
 
 Both the cases can be treated like calls to functions (calling BIOS function
 and jumping to kernel B). The basic difference between two cases is the
 re-entry point. In BIOS function case, we always re-enter the function at the
 start but in case of kernel B, except first entry, all other entries happen
 at a run time determined address, which needs to be communicated to kernel A.
 
 I would think that second kernel B just should execute ret and new entry
 address of kernel B is passed to kernel A through %eax (return value of
 function).

The disadvantage of this solution is that the kernel must know whether it
is the original kernel (A) or the kexeced kernel (B). Different code has to
be used by kernel A and kernel B. And after jumping from A to B and from B
back to A, when jumping from A to B again, kernel A must use different code
from the first time.

Best Regards,
Huang Ying




Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Eric W. Biederman
Huang, Ying [EMAIL PROTECTED] writes:

 On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote:
 [...]
 2) After we figure out our address read the stack pointer from
a fixed location and simply set it.  (This is my preference)

 Just for confirmation (My English is poor).

 Do you mean that kernel A just read the stack top as re-entry point,
 regardless of whether it is return address or argument 1?

What I was thinking was:

In kernel A()

relocate_new_kernel:

        ...

        call  *%eax

kexec_jump_back_entry:
        /* This code should be PIC so figure out where we are */
        call  1f
1:
        popl  %edi
        subl  $(1b - relocate_kernel), %edi

        /* Setup a safe stack */
        leal  PAGE_SIZE(%edi), %esp
        ...


Then in purgatory we can read the address of kexec_jump_back_entry
by examining 0(%esp) and export it in whatever fashion is sane.

However we reach kexec_jump_back_entry we should be fine.

Eric



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Huang, Ying
On Thu, 2008-05-15 at 19:25 -0700, Eric W. Biederman wrote:
 Huang, Ying [EMAIL PROTECTED] writes:
 
  On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote:
  [...]
  2) After we figure out our address read the stack pointer from
 a fixed location and simply set it.  (This is my preference)
 
  Just for confirmation (My English is poor).
 
  Do you mean that kernel A just read the stack top as re-entry point,
  regardless of whether it is return address or argument 1?
 
 What I was thinking was:
 
 In kernel A()
 
 relocate_new_kernel:
 
 ...
 
 call  *%eax
 
 kexec_jump_back_entry:
 /* This code should be PIC so figure out where we are */
 call  1f
 1:
 popl  %edi
 subl  $(1b - relocate_kernel), %edi
 
 /* Setup a safe stack */
 leal  PAGE_SIZE(%edi), %esp
 ...
 
 
 Then in purgatory we can read the address of kexec_jump_back_entry
 by examining 0(%esp) and export it in whatever fashion is sane.
 
 However we reach kexec_jump_back_entry we should be fine.

I think it is reasonable to enable jumping back and forth more than one
time. So the following should be possible:

1. Jump from A to B (actually jump to purgatory, trigger the boot of B)
2. Jump from B to A
3. Jump from A to B again (jump to the kexec_jump_back_entry of B)
4. Jump from B to A
...

So it should be possible to get the re-entry point of kernel B in
kexec_jump_back_entry of kernel A too. So I think that in
kexec_jump_back_entry, the caller's stack should be checked to get the
re-entry point of the peer. And the stack state differs depending on where
we came from: relocate_new_kernel() or a return.

Best Regards,
Huang Ying




Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Eric W. Biederman
Huang, Ying [EMAIL PROTECTED] writes:

 The disadvantage of this solution is that kernel B must know it is
 original kernel (A) or kexeced kernel (B). Different code should be used
 by kernel A and kernel B. And after jump from A to B, jump from B to A,
 when jump from A to B again, kernel A must use different code from the
 first time.

I don't know what the case is for keeping two kernels in memory and switching
between them.

I suspect a small piece of trampoline code between the two kernels could
handle the case. (i.e. purgatory pays attention).

That is a fundamental aspect of the design.  A general purpose infrastructure
with trampoline code to adapt it to whatever situation comes up.

Eric



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Vivek Goyal
On Fri, May 16, 2008 at 10:56:15AM +0800, Huang, Ying wrote:
 On Thu, 2008-05-15 at 19:25 -0700, Eric W. Biederman wrote:
  Huang, Ying [EMAIL PROTECTED] writes:
  
   On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote:
   [...]
   2) After we figure out our address read the stack pointer from
  a fixed location and simply set it.  (This is my preference)
  
   Just for confirmation (My English is poor).
  
   Do you mean that kernel A just read the stack top as re-entry point,
   regardless of whether it is return address or argument 1?
  
  What I was thinking was:
  
  In kernel A()
  
  relocate_new_kernel:
  
  ...
  
  call  *%eax
  
  kexec_jump_back_entry:
  /* This code should be PIC so figure out where we are */
  call  1f
  1:
  popl  %edi
  subl  $(1b - relocate_kernel), %edi
  
  /* Setup a safe stack */
  leal  PAGE_SIZE(%edi), %esp
  ...
  
  
  Then in purgatory we can read the address of kexec_jump_back_entry
  by examining 0(%esp) and export it in whatever fashion is sane.
  
  However we reach kexec_jump_back_entry we should be fine.
 

Huang is making use of purgatory only for booting kernel B for the first
time. Once kernel B is booted, all the transitions (A->B and B->A)
happen without using purgatory. Just keep on jumping back and forth
to kexec_jump_back_entry.

Probably not using purgatory for later transitions is justified as long as
the kernel code is simple and small. Otherwise we shall have to teach
purgatory about the special cases of resuming kernel B or booting kernel B.

 I think it is reasonable to enable jumping back and forth more than one
 time. So the following should be possible:
 
 1. Jump from A to B (actually jump to purgatory, trigger the boot of B)
 2. Jump from B to A
 3. Jump from A to B again (jump to the kexec_jump_back_entry of B)
 4. Jump from B to A
 ...
 
 So it should be possible to get the re-entry point of kernel B in
 kexec_jump_back_entry of kernel A too. So I think in
 kexec_jump_back_entry, the caller's stack should be checked to get
 re-entry point of peer. And the stack state is different depend on where
 come from, from relocate_new_kernel() or return.
 

To me this idea also looks good. So control flow will look something
as follows?

relocate_new_kernel:

    if (!preserve_context)
        set registers to known state.
        jump to purgatory.
    else
        goto jump-back-setup

jump-back-setup:
    - Color the stack.
      move $0x 0(%esp)

    - call %edx

kexec_jump_back_entry:

    - If 0(%esp) is not -1
        image-start = 0(%esp)  // Re-entry point of kernel B. Store it.
      else
        We returned from a BIOS call. Re-entry point has not changed.
        Do nothing.

    - Continue to resume kernel A

Thanks
Vivek
 



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Eric W. Biederman
Huang, Ying [EMAIL PROTECTED] writes:


 I think it is reasonable to enable jumping back and forth more than one
 time.

I'm not opposed.  I just don't understand the utility yet.

 So the following should be possible:

 1. Jump from A to B (actually jump to purgatory, trigger the boot of B)
 2. Jump from B to A
 3. Jump from A to B again (jump to the kexec_jump_back_entry of B)
  (And we go through purgatory which remembers
   the kexec_jump_back_entry of B)
 4. Jump from B to A
 ...

 So it should be possible to get the re-entry point of kernel B in
 kexec_jump_back_entry of kernel A too. So I think in
 kexec_jump_back_entry, the caller's stack should be checked to get
 re-entry point of peer. And the stack state is different depend on where
 come from, from relocate_new_kernel() or return.

Yes.

Any conditional logic needs to be in purgatory or a similar trampoline.

Eric



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-14 Thread Eric W. Biederman
Huang, Ying [EMAIL PROTECTED] writes:

 So, IMHO, for first simple implementation, we don't have to pass around
 any data between kernels except entry point. (Please correct me if I am 
 wrong). Lets get that implementation in first and then we can get rest
 of the pieces in place.

 Yes. Kernel entry/re-entry point is the only information need to be
 communicated between kernels for just switching between them. So we can
 focus on kexec jump patch firstly.

Then as a preliminary design let's plan on this.

- Pass the re-entry point as the return address (using the C ABI).
  We may want to load the stack pointer etc so we can act as
  a direct entry point for new code.

- Look at passing a pointer to the mapping of pages that the kexec
  trampoline uses in arg1 of the C ABI.  Largely the format is de facto
  fixed anyway because we need to pass the structure from C to
  assembly.

Using the standard C ABI makes it much easier to pick
a calling convention, and to document it.

Eric



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-14 Thread Huang, Ying
On Wed, 2008-05-14 at 14:43 -0700, Eric W. Biederman wrote:
[...]
 Then as a preliminary design let's plan on this.
 
 - Pass the re-entry point as the return address (using the C ABI).
   We may want to load the stack pointer etc so we can act as
   a direct entry point for new code.

There are some issues about passing entry point as return address. The
kexec jump (or kexec with return) is used for

- Switching between original kernel (A) and kexeced kernel (B)
- Call some code (such as BIOS code) in physical mode

1) When calling some code in physical mode, the called code can use a
simple return to return to kernel A. So there is no return address on
the stack after returning to kernel A. Instead, argument 1 is on the
stack top.

2) When switching back from kernel B to kernel A, kernel B will call the
jump back entry of kernel A with the C ABI. So the return address is on
the stack top, and kernel A gets the jump back entry of kernel B via the
return address.

Because the stack state is different between 1) and 2), the jump back
entry of kernel A should distinguish them. Possible solutions are as
follows:

a) Before kernel A calls some physical mode code or kernel B, it sets
argument 1 to a magic number that cannot be a return address (such as
-1). The jump back entry of kernel A can then check whether the stack top
is argument 1 or a return address.

b) Distinguish by return value. For example, called physical mode code
must return 0, while kernel B must set %eax to some other number.

c) Use different entry points for 1) and 2). The two entry points are
derived from the return address. For example:

entry1 = return_address;
entry2 = return_address & ~0xfff;   /* page aligned */

entry1 is used by physical mode code. entry2 is used by kernel B.


Which one is better? Or some other solution?

Best Regards,
Huang Ying




Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-13 Thread Huang, Ying
Hi, Vivek,

On Tue, 2008-05-13 at 01:34 -0400, Vivek Goyal wrote:
 On Mon, May 12, 2008 at 02:40:41PM +0800, Huang, Ying wrote:
  This patch implements a prototype of kexec multi-stage load. With this
  patch, the backup pages map can be passed to kexeced kernel via
  /sbin/kexec; and the sys_kexec_load can be used to load large
  hibernated image with huge number of segments.
  
  
 
 Hi Huang,
 
 Had a quick look at the patch. Will review in detail soon. Had few
 thoughts.
 
 In general, these patches are on top of previous kexec jump patches.
 It would be good if you could repost your updated patches so that
 I can apply the patches and and get some testing going.

The kexec jump patch v9 is sufficient for this patch to work. I have no
new version of kexec jump patch so far.

 Last time I tried the patches (V9) and kexec jump did not work for me. I
 was not getting timer interrupts in second kernel. Then I had to put 
 LAPIC and IOAPIC in legacy mode and then at one way jump started working.
 I am not sure how the next kernel boots for you without putting APICs
 in legacy mode. (Yet to make returning back to original kernel work
 using V9). 

Can normal kexec (without kexec jump) work without putting LAPIC and
IOAPIC in legacy mode? Does this mean we should put LAPIC and IOAPIC
into legacy mode before kexec and restore them after?

The kexec jump patch works well on my IBM T42. But it seems that the
IOAPIC is disabled in BIOS, so I can only use i8259 and LAPIC on this
machine.

  In kexec based hibernation, resuming from disk is implemented as
  loading the hibernated disk image with sys_kexec_load(). But unlike
  the normal kexec load, the hibernated image may have huge number of
  segments. So multi-stage loading is necessary for kexec load based
  resuming from disk implementation.
 
 I understand that hibernated images are huge. But why do we require
 multi stage loading? I knew there was a maximum segment limit in kexec.
 But I think we can change that limit. Anything else prevents us from
 loading large images in one go?

There are two reasons for multi-stage loading:

- Pass backup pages map from original kernel (A) to kexeced kernel (B),
because it is not known before loading. We have discussed this before
in:
http://lkml.org/lkml/2008/3/12/308
http://lkml.org/lkml/2008/3/14/59
http://lkml.org/lkml/2008/3/21/299

- Load a large hibernated image. The hibernated image can be not only
large but also discontiguous. For example, if the physical memory size is
4G and there is one free page every 2 pages, there will be
nearly 2G segments. Loading these segments in one go is impossible. So
multi-stage load is necessary. And if the hibernated image is
compressed, it is also very difficult to load it in one go because of the
anonymous pages needed.
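
The fragmentation example above can be worked through numerically. The sketch below is an interpretation, not something the message states: it assumes 4 KiB pages and that every used page sits between two free pages, so each used page becomes its own single-page segment. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of single-page segments when used and free pages alternate. */
uint64_t alternating_segment_count(uint64_t mem_bytes, uint64_t page_size)
{
    assert(page_size != 0);
    uint64_t pages = mem_bytes / page_size;
    return pages / 2; /* every other page is in use */
}

/* Total amount of data carried by those segments, in bytes. */
uint64_t alternating_segment_bytes(uint64_t mem_bytes, uint64_t page_size)
{
    return alternating_segment_count(mem_bytes, page_size) * page_size;
}
```

Under these assumptions, 4 GiB of memory gives 524288 separate segments holding 2 GiB of data in total, far more segments than a single kexec load call was designed to accept.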

  And, multi-stage loading is also
  necessary for parameter passing from original kernel to kexeced kernel
  because some information such as backup pages map is not available
  before loading.
  
  
  Four stages are defined:
  
  - KS_start: start stage; begin a new kexec loading; there must be only
one KS_start stage in one kexec loading.
  
  - KS_mid: middle stage; continue load some segments; there may be many
or zero KS_mid stages in one kexec loading; follows a KS_start or
KS_mid stage.
  
  - KS_final: final stage; finish a kexec loading; there must be only
one KS_final stage in one kexec loading; follows a KS_start or
KS_mid stage.
  
  - KS_full: back compatible with original loading semantics, finish all
work of a kexec loading in one KS_full stage.
  
  
  Overlapping between pages of different segments is allowed to support
  parameter passing.
  
  
  During loading, a hash table mapped from destination page to source
  page is used instead of original linear mapping
  implementation. Because the hibernated image may be very large (up to
  near the size of physical memory), it is very time-consuming to search
  a source page given the destination page, which is used to check
  whether a newly allocated page is in the range of allocated
  destination pages.
 
 This seems to be an optimization of kexec so that it becomes efficient
 in loading large images (containing large number of segments). Probably
 this can be a separate patch.

If it is desired, I can separate it into another patch.
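
For reference, the destination-to-source lookup discussed here could look roughly like the following user-space sketch. It is not the patch's implementation: the structure names, the chained-bucket layout, the table size and the multiplicative hash are all assumptions chosen to keep the example short. The point is only that a lookup by destination page frame number costs one bucket walk instead of a scan over every loaded page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HASH_BITS 10
#define HASH_SIZE (1u << HASH_BITS)

struct page_map_entry {
    unsigned long dest_pfn;      /* destination page frame number (key) */
    unsigned long src_pfn;       /* page currently holding the data */
    struct page_map_entry *next; /* chain within one bucket */
};

struct page_map {
    struct page_map_entry *buckets[HASH_SIZE];
};

static unsigned int hash_pfn(unsigned long pfn)
{
    uint32_t h = (uint32_t)pfn * 2654435761u; /* multiplicative hash */
    return h >> (32 - HASH_BITS);
}

void page_map_init(struct page_map *map)
{
    assert(map != NULL);
    for (unsigned int i = 0; i < HASH_SIZE; i++)
        map->buckets[i] = NULL;
}

/* Record that dest_pfn will be filled from src_pfn. Returns 0 or -1. */
int page_map_insert(struct page_map *map, unsigned long dest_pfn,
                    unsigned long src_pfn)
{
    struct page_map_entry *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    unsigned int h = hash_pfn(dest_pfn);
    e->dest_pfn = dest_pfn;
    e->src_pfn = src_pfn;
    e->next = map->buckets[h];
    map->buckets[h] = e;
    return 0;
}

/* Returns 1 and stores the source pfn if dest_pfn is mapped, else 0. */
int page_map_lookup(const struct page_map *map, unsigned long dest_pfn,
                    unsigned long *src_pfn)
{
    const struct page_map_entry *e = map->buckets[hash_pfn(dest_pfn)];
    for (; e; e = e->next) {
        if (e->dest_pfn == dest_pfn) {
            *src_pfn = e->src_pfn;
            return 1;
        }
    }
    return 0;
}

void page_map_free(struct page_map *map)
{
    for (unsigned int i = 0; i < HASH_SIZE; i++) {
        struct page_map_entry *e = map->buckets[i];
        while (e) {
            struct page_map_entry *next = e->next;
            free(e);
            e = next;
        }
        map->buckets[i] = NULL;
    }
}
```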

 IMHO, we can just first write a minimal patch where one can just switch
 between kernels. Once that patch is upstream, we can enhance
 it to do the hibernation and saving core functionality. Incremental
 review becomes easier. Your last patch (v9) was a good attempt at that and
 I thought very soon we shall have something mergable.

Agreed. We can first focus on kexec jump patch. But as in last thread of
kexec jump (v9), we need a protocol for parameter passing between kernel
A and kernel B. So, we can use this patch as a prototype for the
communication protocol.

  The original mapping is only used by 

Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-13 Thread Vivek Goyal
On Wed, May 14, 2008 at 09:57:46AM +0800, Huang, Ying wrote:
 Hi, Vivek,
 
 On Tue, 2008-05-13 at 01:34 -0400, Vivek Goyal wrote:
  On Mon, May 12, 2008 at 02:40:41PM +0800, Huang, Ying wrote:
   This patch implements a prototype of kexec multi-stage load. With this
   patch, the backup pages map can be passed to kexeced kernel via
   /sbin/kexec; and the sys_kexec_load can be used to load large
   hibernated image with huge number of segments.
   
   
  
  Hi Huang,
  
  Had a quick look at the patch. Will review in detail soon. Had few
  thoughts.
  
  In general, these patches are on top of previous kexec jump patches.
  It would be good if you could repost your updated patches so that
  I can apply the patches and and get some testing going.
 
 The kexec jump patch v9 is sufficient for this patch to work. I have no
 new version of kexec jump patch so far.
 
  Last time I tried the patches (V9) and kexec jump did not work for me. I
  was not getting timer interrupts in second kernel. Then I had to put 
  LAPIC and IOAPIC in legacy mode and then at one way jump started working.
  I am not sure how the next kernel boots for you without putting APICs
  in legacy mode. (Yet to make returning back to original kernel work
  using V9). 
 
 Can normal kexec (without kexec jump) works without putting LAPIC and
 IOAPIC in legacy mode? Does this mean we should put LAPIC and IOAPIC
 into legacy mode before kexec and restore them after?
 

We do put LAPIC and IOAPIC in legacy mode in normal kexec. Look at 
disable_IO_APIC() in native_machine_shutdown(). So I think we shall
have to do the same thing in kexec jump code too.

 The kexec jump patch works well on my IBM T42. But it seems that the
 IOAPIC is disabled in BIOS, so I can only use i8259 and LAPIC on this
 machine.
 
   In kexec based hibernation, resuming from disk is implemented as
   loading the hibernated disk image with sys_kexec_load(). But unlike
   the normal kexec load, the hibernated image may have huge number of
   segments. So multi-stage loading is necessary for kexec load based
   resuming from disk implementation.
  
  I understand that hibernated images are huge. But why do we require
  multi stage loading? I knew there was a maximum segment limit in kexec.
  But I think we can change that limit. Anything else prevents us from
  loading large images in one go?
 
 There are two reason for multi-stage loading:
 
 - Pass backup pages map from original kernel (A) to kexeced kernel (B),
 because it is not known before loading. We have discussed this before
 in:
   http://lkml.org/lkml/2008/3/12/308
   http://lkml.org/lkml/2008/3/14/59
   http://lkml.org/lkml/2008/3/21/299
 

See my response below

 - Load large hibernated image. The hibernated image can be not only
 large but also discontinuous. For example, the physical memory size is
 4G, and there is one free page every 2 pages, that is, there will be
 nearly 2G segments. Loading these segments in one go is impossible. So
 multi-stage load is necessary. And if the hibernated image is
 compressed, it is also very difficult to load it in one go because the
 anonymous pages needed.
 
   And, multi-stage loading is also
   necessary for parameter passing from original kernel to kexeced kernel
   because some information such as backup pages map is not available
   before loading.
   
   
   Four stages are defined:
   
   - KS_start: start stage; begin a new kexec loading; there must be only
 one KS_start stage in one kexec loading.
   
   - KS_mid: middle stage; continue load some segments; there may be many
 or zero KS_mid stages in one kexec loading; follows a KS_start or
 KS_mid stage.
   
   - KS_final: final stage; finish a kexec loading; there must be only
 one KS_final stage in one kexec loading; follows a KS_start or
 KS_mid stage.
   
   - KS_full: back compatible with original loading semantics, finish all
 work of a kexec loading in one KS_full stage.
   
   
   Overlapping between pages of different segments is allowed to support
   parameter passing.
   
   
   During loading, a hash table mapped from destination page to source
   page is used instead of original linear mapping
   implementation. Because the hibernated image may be very large (up to
   near the size of physical memory), it is very time-consuming to search
   a source page given the destination page, which is used to check
   whether a newly allocated page is in the range of allocated
   destination pages.
  
  This seems to be an optimization of kexec so that it becomes efficient
  in loading large images (containing large number of segments). Probably
  this can be a separate patch.
 
 If it is desired, I can separate it into another patch.
 
  IMHO, we can just first write a minimal patch where one can just switch
  between kernels. Once that patch is upstream, we can enhance
  it to do the hibernation and saving core functionality. Incremental
  review becomes easier. Your 

Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-13 Thread Huang, Ying
On Tue, 2008-05-13 at 22:56 -0400, Vivek Goyal wrote:
[...]
  
   Last time I tried the patches (V9) and kexec jump did not work for me. I
   was not getting timer interrupts in second kernel. Then I had to put 
   LAPIC and IOAPIC in legacy mode and then at least one-way jump started working.
   I am not sure how the next kernel boots for you without putting APICs
   in legacy mode. (Yet to make returning back to original kernel work
   using V9). 
  
  Can normal kexec (without kexec jump) work without putting LAPIC and
  IOAPIC in legacy mode? Does this mean we should put LAPIC and IOAPIC
  into legacy mode before kexec and restore them after?
  
 
 We do put LAPIC and IOAPIC in legacy mode in normal kexec. Look at 
 disable_IO_APIC() in native_machine_shutdown(). So I think we shall
 have to do the same thing in kexec jump code too.

OK. I will look at this.

 I went through the above mail thread again, where we were discussing what
 information needs to be passed between kernels.
 
 Last time we enumerated three things.
 
 - kernel entry/re-entry point for switch between kernels.
 - backup pages map for core filtering
 - Probably ELF core notes for saving hibernated image.
 
 I think if we just implement the functionality so that one can switch
 back and forth between kernels (no hibernated image saving), then we probably
 need to pass around only the kernel entry/re-entry point and nothing else, and
 in your patches I think you are already doing that using %edi.

Yes.

 So, IMHO, for a first simple implementation, we don't have to pass around
 any data between kernels except the entry point. (Please correct me if I am
 wrong). Let's get that implementation in first and then we can get the rest
 of the pieces in place.

Yes. The kernel entry/re-entry point is the only information that needs to be
communicated between kernels for just switching between them. So we can
focus on the kexec jump patch first.

Best Regards,
Huang Ying




Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-12 Thread Vivek Goyal
On Mon, May 12, 2008 at 02:40:41PM +0800, Huang, Ying wrote:
 This patch implements a prototype of kexec multi-stage load. With this
 patch, the backup pages map can be passed to the kexeced kernel via
 /sbin/kexec; and sys_kexec_load can be used to load a large
 hibernated image with a huge number of segments.
 
 

Hi Huang,

Had a quick look at the patch. Will review in detail soon. Had a few
thoughts.

In general, these patches are on top of the previous kexec jump patches.
It would be good if you could repost your updated patches so that
I can apply the patches and get some testing going.

Last time I tried the patches (V9) and kexec jump did not work for me. I
was not getting timer interrupts in second kernel. Then I had to put 
LAPIC and IOAPIC in legacy mode and then at least one-way jump started working.
I am not sure how the next kernel boots for you without putting APICs
in legacy mode. (Yet to make returning back to original kernel work
using V9). 

 In kexec based hibernation, resuming from disk is implemented as
 loading the hibernated disk image with sys_kexec_load(). But unlike
 the normal kexec load, the hibernated image may have a huge number of
 segments. So multi-stage loading is necessary for a kexec load based
 implementation of resuming from disk.

I understand that hibernated images are huge. But why do we require
multi-stage loading? I know there is a maximum segment limit in kexec.
But I think we can change that limit. Does anything else prevent us from
loading large images in one go?

 And, multi-stage loading is also
 necessary for parameter passing from original kernel to kexeced kernel
 because some information such as backup pages map is not available
 before loading.
 
 
 Four stages are defined:
 
 - KS_start: start stage; begin a new kexec loading; there must be only
   one KS_start stage in one kexec loading.
 
 - KS_mid: middle stage; continue loading some segments; there may be many
   or zero KS_mid stages in one kexec loading; follows a KS_start or
   KS_mid stage.
 
 - KS_final: final stage; finish a kexec loading; there must be only
   one KS_final stage in one kexec loading; follows a KS_start or
   KS_mid stage.
 
 - KS_full: backward compatible with original loading semantics, finish all
   work of a kexec loading in one KS_full stage.
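
(A minimal sketch of the stage ordering rules listed above, not code from
the patch. The stage names follow the description; the enum values, the
load_state type and both function names are hypothetical. Rejecting
KS_start or KS_full while a load is already in progress is an assumption,
since the description does not say what happens in that case.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stage names as in the patch description; numeric values are assumed. */
enum kexec_stage { KS_START, KS_MID, KS_FINAL, KS_FULL };

/* Hypothetical loader state kept between two load calls. */
enum load_state { LOAD_IDLE, LOAD_IN_PROGRESS };

/*
 * Check whether `stage` may follow the current state and, if so, update
 * the state. Returns false (state unchanged) for an illegal transition.
 */
bool kexec_stage_advance(enum load_state *state, enum kexec_stage stage)
{
    assert(state != NULL);

    switch (stage) {
    case KS_START:              /* begins a new load */
        if (*state != LOAD_IDLE)
            return false;       /* assumption: no restart mid-load */
        *state = LOAD_IN_PROGRESS;
        return true;
    case KS_MID:                /* must follow KS_start or KS_mid */
        return *state == LOAD_IN_PROGRESS;
    case KS_FINAL:              /* must follow KS_start or KS_mid */
        if (*state != LOAD_IN_PROGRESS)
            return false;
        *state = LOAD_IDLE;
        return true;
    case KS_FULL:               /* whole load in one call */
        return *state == LOAD_IDLE;
    }
    return false;
}

/*
 * Stage a caller would request for batch `i` out of `n` batches of
 * segments (n >= 1, i < n): a single batch is a KS_full load, otherwise
 * the first batch is KS_start, the last is KS_final, the rest KS_mid.
 */
enum kexec_stage kexec_stage_for_batch(size_t i, size_t n)
{
    assert(n >= 1 && i < n);

    if (n == 1)
        return KS_FULL;
    if (i == 0)
        return KS_START;
    return (i == n - 1) ? KS_FINAL : KS_MID;
}
```

With these rules a load split into four batches goes KS_start, KS_mid,
KS_mid, KS_final, and a load that fits in one call stays a plain KS_full.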
 
 
 Overlapping between pages of different segments is allowed to support
 parameter passing.
 
 
 During loading, a hash table mapped from destination page to source
 page is used instead of original linear mapping
 implementation. Because the hibernated image may be very large (up to
 near the size of physical memory), it is very time-consuming to search
 a source page given the destination page, which is used to check
 whether a newly allocated page is in the range of allocated
 destination pages.
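
(A rough user-space sketch of the kind of destination-to-source lookup
described above, not the patch's implementation. All names here are
hypothetical, malloc() stands in for whatever allocator the kernel code
uses, and the modulo hash and bucket count are only illustrative. The
point is that a lookup walks one short bucket chain instead of scanning
every mapped page.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PGMAP_BUCKETS 1024u     /* illustrative size, not from the patch */

/* One mapping: destination page frame -> source page frame. */
struct pgmap_entry {
    unsigned long dest_pfn;
    unsigned long src_pfn;
    struct pgmap_entry *next;   /* chain for entries sharing a bucket */
};

struct pgmap {
    struct pgmap_entry *buckets[PGMAP_BUCKETS];
};

void pgmap_init(struct pgmap *map)
{
    assert(map != NULL);
    for (size_t i = 0; i < PGMAP_BUCKETS; i++)
        map->buckets[i] = NULL;
}

/* Record that `dest_pfn` is backed by `src_pfn`; false if out of memory. */
bool pgmap_insert(struct pgmap *map, unsigned long dest_pfn,
                  unsigned long src_pfn)
{
    assert(map != NULL);
    struct pgmap_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return false;

    size_t b = dest_pfn % PGMAP_BUCKETS;
    e->dest_pfn = dest_pfn;
    e->src_pfn = src_pfn;
    e->next = map->buckets[b];  /* push onto the head of the chain */
    map->buckets[b] = e;
    return true;
}

/* Find the source page for `dest_pfn`; true and *src_pfn set if mapped. */
bool pgmap_lookup(const struct pgmap *map, unsigned long dest_pfn,
                  unsigned long *src_pfn)
{
    assert(map != NULL && src_pfn != NULL);
    const struct pgmap_entry *e = map->buckets[dest_pfn % PGMAP_BUCKETS];
    for (; e != NULL; e = e->next) {
        if (e->dest_pfn == dest_pfn) {
            *src_pfn = e->src_pfn;
            return true;
        }
    }
    return false;
}

void pgmap_free(struct pgmap *map)
{
    assert(map != NULL);
    for (size_t i = 0; i < PGMAP_BUCKETS; i++) {
        struct pgmap_entry *e = map->buckets[i];
        while (e != NULL) {
            struct pgmap_entry *next = e->next;
            free(e);
            e = next;
        }
        map->buckets[i] = NULL;
    }
}
```

Checking whether a newly allocated page collides with an already
allocated destination page then becomes a single pgmap_lookup() call on
that page's frame number.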

This seems to be an optimization of kexec so that it becomes efficient
in loading large images (containing a large number of segments). Probably
this can be a separate patch.

IMHO, we can just first write a minimal patch where one can just switch
between kernels. Once that patch is upstream, we can enhance
it to do the hibernation and saving core functionality. Incremental
review becomes easier. Your last patch (v9) was a good attempt at that and
I thought very soon we shall have something mergeable.

 The original mapping is only used by assembly code
 to swap the page contents. This map is also exported to user space via
 /proc/kexec_pgmap, so that /sbin/kexec can use it to construct the
 backup pages map parameter for kexeced kernel.
 
 
 This patch is based on Linux kernel 2.6.25 and kexec_jump patch, and
 has been tested on an IBM T42.
 

Is the kexec_jump v9 patch good enough, or do you have another internal
version of the patch on top of which this patch applies?

Thanks
Vivek
