from:"Vivek Goyal"

Re: [PATCH v3 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation

2021-02-17 Thread Vivek Goyal

On Wed, Feb 17, 2021 at 02:26:53PM -0500, Steven Rostedt wrote:
> On Wed, 17 Feb 2021 12:40:43 -0600
> john.p.donne...@oracle.com wrote:
> 
> > Hello.
> > 
> > Ping.
> > 
> > Can we get this reviewed and staged ?
> > 
> > Thank you.
> 
> Andrew,
> 
> Seems you are the only one pushing patches in for kexec/crash. Is this
> maintained by anyone?

Dave Young and Baoquan He still maintain the kexec/kdump stuff, AFAIK. I
don't get time to look into this stuff nowadays.

Vivek


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-20 Thread Vivek Goyal

On Wed, Jul 20, 2016 at 09:35:30AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 20, 2016 at 01:45:42PM +1000, Balbir Singh wrote:
> > > IOW, if your kernel forced signature verification, you should not be
> > able to do sig_enforce=0. If your kernel did not have
> > > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
> > > and you are not making it worse using command line.
> > 
> > OK.. I checked and you are right, but that is an example and there are
> > other things like security=, thermal.*, nosmep, nosmap that need auditing
> > for safety and might hurt the system security if used. I still think
> > that assuming you can pass any command line without breaking security
> > is a broken argument.
> 
> Quite, and you don't need to run code in a privileged environment to do
> any of that.
> 
> It's also not trivial to protect against: new kernels gain new arguments
> which older kernels may not know about.  No matter how much protection
> is built into older kernels, newer kernels can become vulnerable through
> the addition of further arguments.

If a new kernel command line option becomes an issue, the new kernel can
block it in a secureboot environment. That way it helps kexec boot as
well as regular boot.
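A minimal userspace sketch of that idea: refuse a command line only when secure boot is on and the line contains an option from a deny list. The deny list and the helpers `option_allowed()` and `cmdline_allowed()` are hypothetical; the names in the list are simply options Balbir mentions as needing audit, and this is not how the kernel actually parses or restricts its command line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical deny list of option names (the part before '=') that a
 * kernel might refuse under secure boot. Illustrative only; not a real
 * kernel policy.
 */
static const char *const denied_names[] = { "nosmep", "nosmap", "security" };

/* Return true if a single option such as "security=none" is allowed. */
static bool option_allowed(const char *opt, bool secure_boot)
{
	size_t name_len = strcspn(opt, "=");	/* length of the option name */
	size_t i;

	if (!secure_boot)
		return true;
	for (i = 0; i < sizeof(denied_names) / sizeof(denied_names[0]); i++) {
		if (strlen(denied_names[i]) == name_len &&
		    strncmp(opt, denied_names[i], name_len) == 0)
			return false;
	}
	return true;
}

/*
 * Check a whole space-separated command line. Quoted values containing
 * spaces are not handled in this sketch.
 */
static bool cmdline_allowed(const char *cmdline, bool secure_boot)
{
	char opt[256];
	const char *p = cmdline;

	if (!secure_boot)
		return true;
	while (*p) {
		size_t len;

		p += strspn(p, " ");		/* skip separators */
		len = strcspn(p, " ");		/* length of the next option */
		if (len == 0)
			break;			/* only trailing spaces were left */
		if (len >= sizeof(opt))
			return false;		/* refuse what we cannot inspect */
		memcpy(opt, p, len);
		opt[len] = '\0';
		if (!option_allowed(opt, secure_boot))
			return false;
		p += len;
	}
	return true;
}
```

Because the check lives in the booted kernel, the same policy would apply whether that kernel was started by the boot loader or via kexec.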

Vivek

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-20 Thread Vivek Goyal

On Wed, Jul 20, 2016 at 01:45:42PM +1000, Balbir Singh wrote:
> >  
> > Command line options are not signed. I thought idea behind secureboot
> > was to execute only trusted code and command line options don't enforce
> > you to execute unsigned code.
> >  
> >>
> >> You can set module.sig_enforce=0 and open up the system a bit assuming
> >> that you can get a module to load with another attack
> > 
> > IIUC, sig_enforce is bool_enable_only, so it can only be enabled. Default
> > value of it is 0 if CONFIG_MODULE_SIG_FORCE=n.
> > 
> > IOW, if your kernel forced signature verification, you should not be
> > able to do sig_enforce=0. If your kernel did not have
> > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
> > and you are not making it worse using command line.
> > 
> 
> OK.. I checked and you are right, but that is an example and there are
> other things like security=, thermal.*, nosmep, nosmap that need auditing
> for safety and might hurt the system security if used. I still think
> that assuming you can pass any command line without breaking security
> is a broken argument.

I agree that if some command line option allows running unsigned code
at ring 0, then we should probably disable that option on secureboot
enabled boots.

In fact, there was a bunch of patches from Matthew Garrett which made
things tighter on secureboot enabled machines. AFAIK, these patches
never went upstream.

Vivek

Re: [RFC 3/3] kexec: extend kexec_file_load system call

2016-07-19 Thread Vivek Goyal

On Tue, Jul 19, 2016 at 01:47:28PM +0100, Mark Rutland wrote:
> On Tue, Jul 19, 2016 at 08:24:06AM -0400, Vivek Goyal wrote:
> > On Tue, Jul 19, 2016 at 11:52:00AM +0100, Mark Rutland wrote:
> > > Regardless, this extended syscall changes some underlying assumptions
> > > made with the development of kexec_file_load, and I think treating this
> > > as an extension is not a great idea. From a user's perspective there is
> > > little difference between passing an additional flag or using a
> > > different syscall number, so I don't think that we gain much by altering
> > > the existing prototype relative to allocating a new syscall number.
> > 
> > If we are providing/opening up additional flags, I can't think of what
> > it will break. The same flag was invalid in the old kernel, but the new
> > kernel supports it and will accept it. So it sounds reasonable to me to
> > add new flags.
> > 
> > If existing users are not broken, then I think it might be a good idea
> > to extend the existing syscall. Otherwise userspace will have to be
> > modified to understand a 3rd syscall too, and an additional option will
> > show up which asks users to specify which syscall to use. So extending
> > the existing syscall might keep things a little simpler for users.
> 
> I don't follow.
> 
> To use the new feature, you have to modify userspace anyway, as you
> require userspace to pass information which it did not previously pass
> (in the new arguments added to the syscall).
> 
> The presence of a new syscall does not imply the absence of the old
> syscall, so you can always use that by default unless the user asks for
> something only the new syscall provides. Regardless of the
> syscall/flags difference, you still have to detect whether the new
> functionality is present somehow.
> 

Hmm, so the current idea is that we have two syscalls which are *ideally*
supposed to work on all arches. The difference between the two is that the
first one does not support kernel signature verification while the second
one does.

By default the old syscall is used, and the user can force use of the new
syscall with the --kexec-file-syscall (-s) option.

If a user DTB is present, I was hoping that it would continue to work the
same way: both syscalls can be used and can handle the DTB. If we introduce
a 3rd syscall, only the first and the 3rd syscall can handle a DTB, and we
need to introduce one more option which tells whether to use kexec_load()
or the new 3rd syscall. That's what I am trying to avoid.

Vivek

> > BTW, does kexec_load() need to be modified too to handle DT?
> 
> No, at least for arm64. In the kexec_load case userspace provides the
> DTB as a raw segment, and the user-provided purgatory sets up registers
> to pass that to the new kernel.
> 
> Thanks,
> Mark.

Re: [RFC 3/3] kexec: extend kexec_file_load system call

2016-07-19 Thread Vivek Goyal

On Tue, Jul 19, 2016 at 11:52:00AM +0100, Mark Rutland wrote:
> On Tue, Jul 19, 2016 at 08:55:56AM +0800, Dave Young wrote:
> > On 07/18/16 at 11:07am, Mark Rutland wrote:
> > > On Mon, Jul 18, 2016 at 10:30:24AM +0800, Dave Young wrote:
> > > > I do not think it is worth adding another syscall for extra fds.
> > > > We have open(2) as an example for different numbers of arguments
> > > > already.
> > > 
> > > Did we change the syscall interface for that?
> > > 
> > > I was under the impression that there was always one underlying syscall,
> > > and the C library did the right thing to pass the expected information
> > > to the underlying syscall.
> > 
> > I'm not sure kexec_load and kexec_file_load were included in glibc, we use
> > syscall directly in kexec-tools.
> > 
> > The kexec_load man page says there are no glibc wrappers for either
> > kexec_load or kexec_file_load.
> 
> For the above, I was talking about how open() was handled.
> 
> If there are no userspace wrappers, then the two cases aren't comparable
> in the first place...
> 
> > > That's rather different to changing the underlying syscall.
> > > 
> > > Regardless of how this is wrapped in userspace, I do not think modifying
> > > the existing prototype is a good idea, and I think this kind of
> > > extension needs to be a new syscall.
> > 
> > Hmm, as I replied to Vivek, there is one case about the flags: previously
> > the new flag would be regarded as invalid, but now that we extend it, it
> > will be valid. This may be the only potential bad case.
> 
> It's true that adding support for new flags will change the behaviour of
> what used to be error cases. We generally expect real users to not be
> making pointless calls for which they rely on an error being returned in
> all cases.
> 
> Regardless, this extended syscall changes some underlying assumptions
> made with the development of kexec_file_load, and I think treating this
> as an extension is not a great idea. From a user's perspective there is
> little difference between passing an additional flag or using a
> different syscall number, so I don't think that we gain much by altering
> the existing prototype relative to allocating a new syscall number.

If we are providing/opening up additional flags, I can't think of what it
will break. The same flag was invalid in the old kernel, but the new kernel
supports it and will accept it. So it sounds reasonable to me to add new
flags.

If existing users are not broken, then I think it might be a good idea
to extend the existing syscall. Otherwise userspace will have to be
modified to understand a 3rd syscall too, and an additional option will
show up which asks users to specify which syscall to use. So extending
the existing syscall might keep things a little simpler for users.

This applies only if the conclusion in the end is that the DT needs to be
passed in from user space.
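A sketch of why adding flags is backward compatible: kexec_file_load() rejects any flag bits it does not know with -EINVAL, so existing callers never set a new bit, and new userspace can treat -EINVAL on that bit as "not supported". The three existing flag values are from include/uapi/linux/kexec.h; `KEXEC_FILE_HYPOTHETICAL_DTB` and `check_flags()` are hypothetical, modeled on the kernel's check rather than copied from it.

```c
#include <errno.h>

/* Existing kexec_file_load() flags, as in include/uapi/linux/kexec.h. */
#define KEXEC_FILE_UNLOAD       0x00000001
#define KEXEC_FILE_ON_CRASH     0x00000002
#define KEXEC_FILE_NO_INITRAMFS 0x00000004

/* Hypothetical new flag, e.g. "a DTB is supplied"; not a real kernel flag. */
#define KEXEC_FILE_HYPOTHETICAL_DTB 0x00000008

/* Flags understood by an "old" and by a "new" kernel in this model. */
#define OLD_KERNEL_FLAGS \
	(KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | KEXEC_FILE_NO_INITRAMFS)
#define NEW_KERNEL_FLAGS (OLD_KERNEL_FLAGS | KEXEC_FILE_HYPOTHETICAL_DTB)

/* Reject any bit outside the supported set, as kexec_file_load() does. */
static int check_flags(unsigned long flags, unsigned long supported)
{
	if (flags != (flags & supported))
		return -EINVAL;
	return 0;
}
```

The caveats are the ones Mark raises in the quoted text: a call that used to fail now succeeds, and userspace still has to detect support. Since -EINVAL can also come from other bad arguments, it is only a hint.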

BTW, does kexec_load() need to be modified too to handle DT?

Vivek

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-18 Thread Vivek Goyal

On Mon, Jul 18, 2016 at 09:26:29AM -0400, Vivek Goyal wrote:
> On Mon, Jul 18, 2016 at 10:46:04PM +1000, Balbir Singh wrote:
> > On Wed, 2016-07-13 at 14:22 -0400, Vivek Goyal wrote:
> > > On Wed, Jul 13, 2016 at 06:40:10PM +0100, Russell King - ARM Linux wrote:
> > > > 
> > > > On Wed, Jul 13, 2016 at 09:03:38AM -0400, Vivek Goyal wrote:
> > > > > 
> > > > > On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux wrote:
> > > > > > 
> > > > > > Indeed - maybe Eric knows better, but I can't see any situation where
> > > > > > the dtb we load via kexec should ever affect "the bootloader", unless
> > > > > > the "kernel" that's being loaded into kexec is "the bootloader".
> > > > > > 
> > > > > > Now, going back to the more fundamental issue raised in my first reply,
> > > > > > about the kernel command line.
> > > > > > 
> > > > > > On x86, I can see that it _is_ possible for userspace to specify a
> > > > > > command line, and the kernel loading the image provides the command
> > > > > > line to the to-be-kexeced kernel with very little checking.  So, if
> > > > > > your kernel is signed, what stops the "insecure userspace" loading
> > > > > > a signed kernel but giving it an insecure rootfs and/or console?
> > > > > It is not kexec specific. I could do this for regular boot too, right?
> > > > > 
> > > > > Command line options are not signed. I thought idea behind secureboot
> > > > > was to execute only trusted code and command line options don't enforce
> > > > > you to execute unsigned code.
> > > > > 
> > 
> > You can set module.sig_enforce=0 and open up the system a bit assuming
> > that you can get a module to load with another attack
> 
> IIUC, sig_enforce is bool_enable_only, so it can only be enabled. Default
> value of it is 0 if CONFIG_MODULE_SIG_FORCE=n.
> 
> IOW, if your kernel forced signature verification, you should not be
> able to do sig_enforce=0. If your kernel did not have
> CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
> and you are not making it worse using command line.

[ CC Matthew Garrett ]

I think on top of this there were patches by Matthew Garrett which
disallowed loading of unsigned modules when booted with secureboot on.
I think those patches never made it upstream though.

Vivek

> 
> > 
> > > > > So it sounds like different class of security problems which you are
> > > > > referring to and not necessarily covered by secureboot or signed
> > > > > kernel.
> > > > Let me give you an example.
> > > > 
> > > > You have a secure boot setup, where the firmware/ROM validates the boot
> > > > loader.  Good, the boot loader hasn't been tampered with.
> > > > 
> > > > You interrupt the boot loader and are able to modify the command line
> > > > for the booted kernel.
> > > > 
> > > > The boot loader loads the kernel and verifies the kernel's signature.
> > > > Good, the kernel hasn't been tampered with.  The kernel starts running.
> > > > 
> > > > You've plugged in a USB drive to the device, and specified a partition
> > > > containing a root filesystem that you control to the kernel.  The
> > > > validated kernel finds the USB drive, and mounts it, and executes
> > > > your own binaries on the USB drive.
> > > You will require physical access to the machine to be able to
> > > insert your usb drive. And IIRC, argument was that if attacker has
> > > physical access to machine, all bets are off anyway.
> > >
> > 
> > You don't need physical access -- your machine controller BMC can
> > do the magic for you. So it's not always physical access, is it?
> 
> Well, idea was that if you have physical access to machine, then all
> bets are off. If BMC can do something which allows running unsigned
> code at ring level 0, it's a problem I think from secureboot model of
> security.
> 
> >  
> > > > 
> > > > 
> > > > You run a shell on the console.  You now have control of the system,
> > > > and can mount the real rootfs, inspect it, and work out what it does,
> > > > etc.
> > > > 
> > > > At this point, what use was all the validation that the secure boot
> > > > has done?  Absolutely useless.
> > > > 
> > > > If you can change the command line arguments given to the kernel, you
> > > > have no security, no matter how much you verify signatures.  It's
> > > > the illusion of security, nothing more, nothing less.
> > > > 
> > 
> > I agree, if you can change command line arguments, all bets are of lesser value
> 
> If changing command line allows execution of unsigned code at ring level
> 0, then it is a problem. Otherwise we are talking of security issues which
> are not covered by secureboot model.
> 
> Vivek

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-18 Thread Vivek Goyal

On Mon, Jul 18, 2016 at 10:46:04PM +1000, Balbir Singh wrote:
> On Wed, 2016-07-13 at 14:22 -0400, Vivek Goyal wrote:
> > On Wed, Jul 13, 2016 at 06:40:10PM +0100, Russell King - ARM Linux wrote:
> > > 
> > > On Wed, Jul 13, 2016 at 09:03:38AM -0400, Vivek Goyal wrote:
> > > > 
> > > > On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux wrote:
> > > > > 
> > > > > Indeed - maybe Eric knows better, but I can't see any situation where
> > > > > the dtb we load via kexec should ever affect "the bootloader", unless
> > > > > the "kernel" that's being loaded into kexec is "the bootloader".
> > > > > 
> > > > > Now, going back to the more fundamental issue raised in my first reply,
> > > > > about the kernel command line.
> > > > > 
> > > > > On x86, I can see that it _is_ possible for userspace to specify a
> > > > > command line, and the kernel loading the image provides the command
> > > > > line to the to-be-kexeced kernel with very little checking.  So, if
> > > > > your kernel is signed, what stops the "insecure userspace" loading
> > > > > a signed kernel but giving it an insecure rootfs and/or console?
> > > > It is not kexec specific. I could do this for regular boot too, right?
> > > > 
> > > > Command line options are not signed. I thought idea behind secureboot
> > > > was to execute only trusted code and command line options don't enforce
> > > > you to execute unsigned code.
> > > > 
> 
> You can set module.sig_enforce=0 and open up the system a bit assuming
> that you can get a module to load with another attack

IIUC, sig_enforce is bool_enable_only, so it can only be enabled. Its
default value is 0 if CONFIG_MODULE_SIG_FORCE=n.

IOW, if your kernel forced signature verification, you should not be
able to do sig_enforce=0. If your kernel did not have
CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
and you are not making it worse using the command line.
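To make the bool_enable_only point concrete, here is a small userspace model. It is loosely based on the kernel's `param_set_bool_enable_only()` in kernel/params.c, simplified to take a bool instead of a string; `set_bool_enable_only()` is a hypothetical name, and `CONFIG_MODULE_SIG_FORCE` is only a preprocessor symbol here.

```c
#include <errno.h>
#include <stdbool.h>

/* Default follows the build configuration, as with CONFIG_MODULE_SIG_FORCE. */
#ifdef CONFIG_MODULE_SIG_FORCE
static bool sig_enforce = true;
#else
static bool sig_enforce = false;
#endif

/*
 * "Enable only" semantics: the value may be turned on at any time, but
 * turning it off once it is on fails with -EROFS and leaves it unchanged.
 */
static int set_bool_enable_only(bool *param, bool new_value)
{
	if (*param && !new_value)
		return -EROFS;		/* already enforced: refuse to disable */
	if (new_value)
		*param = true;
	return 0;			/* disabling an already-off value is a no-op */
}
```

So module.sig_enforce=0 cannot weaken a kernel built with CONFIG_MODULE_SIG_FORCE=y, and on a kernel built without it the value is already 0.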

> 
> > > > So it sounds like different class of security problems which you are
> > > > referring to and not necessarily covered by secureboot or signed
> > > > kernel.
> > > Let me give you an example.
> > > 
> > > You have a secure boot setup, where the firmware/ROM validates the boot
> > > loader.  Good, the boot loader hasn't been tampered with.
> > > 
> > > You interrupt the boot loader and are able to modify the command line
> > > for the booted kernel.
> > > 
> > > The boot loader loads the kernel and verifies the kernel's signature.
> > > Good, the kernel hasn't been tampered with.  The kernel starts running.
> > > 
> > > You've plugged in a USB drive to the device, and specified a partition
> > > containing a root filesystem that you control to the kernel.  The
> > > validated kernel finds the USB drive, and mounts it, and executes
> > > your own binaries on the USB drive.
> > You will require physical access to the machine to be able to
> > insert your usb drive. And IIRC, argument was that if attacker has
> > physical access to machine, all bets are off anyway.
> >
> 
> You don't need physical access -- your machine controller BMC can
> do the magic for you. So it's not always physical access, is it?

Well, the idea was that if you have physical access to the machine, then
all bets are off. If the BMC can do something which allows running
unsigned code at ring level 0, it's a problem from the secureboot model
of security, I think.

>  
> > > 
> > > 
> > > You run a shell on the console.  You now have control of the system,
> > > and can mount the real rootfs, inspect it, and work out what it does,
> > > etc.
> > > 
> > > At this point, what use was all the validation that the secure boot
> > > has done?  Absolutely useless.
> > > 
> > > If you can change the command line arguments given to the kernel, you
> > > have no security, no matter how much you verify signatures.  It's
> > > the illusion of security, nothing more, nothing less.
> > > 
> 
> I agree, if you can change command line arguments, all bets are of lesser value

If changing the command line allows execution of unsigned code at ring
level 0, then it is a problem. Otherwise we are talking about security
issues which are not covered by the secureboot model.

Vivek

Re: [PATCH] kexec: Add option to fallback to old kexec syscall when kexec file based syscall failed

2016-07-15 Thread Vivek Goyal

On Fri, Jul 15, 2016 at 04:42:40PM +0200, Petr Tesarik wrote:
> On Fri, 15 Jul 2016 08:51:14 -0400
> Vivek Goyal <vgo...@redhat.com> wrote:
> 
> > On Fri, Jul 15, 2016 at 09:58:22AM +0200, Petr Tesarik wrote:
> > > On Fri, 15 Jul 2016 07:57:22 +0800
> > > joeyli <j...@suse.com> wrote:
> > > 
> > > > Hi Vivek
> > > > 
> > > > On Thu, Jul 14, 2016 at 10:53:28AM -0400, Vivek Goyal wrote:
> > > > > On Thu, Jul 14, 2016 at 04:45:11PM +0800, Lee, Chun-Yi wrote:
> > > > > > This patch adds a new "--fallback-kexec" option to give a chance to
> > > > > > fallback to old kexec syscall when file based kexec syscall operation
> > > > > > failed.
> > > > > 
> > > > > I think caller should switch to using different interface if need be. But
> > > > > I don't see much point in providing an option for this in kexec-tools.
> > > > > 
> > > > > Vivek
> > > > >
> > > > 
> > > > OK~ Understood!
> > > > 
> > > > Thanks for Baoquan's and your opinion for this patch.
> > > 
> > > Is there some sort of diagnostics, so a calling script can determine
> > > whether kexec failed, because there's no support for kexec_file_load(2)
> > > or for a different reason? 
> > 
> > Will we not get -ENOSYS if kexec_file_load() is not implemented?
> 
> Sure, the kexec code will see a beautiful ENOSYS in errno, but it
> merely prints this message on stderr (possibly with a different error
> string if not linked against glibc):
> 
> kexec_file_load failed: Function not implemented
> 
> ...and exits with status 255 (same for any other error). Which is, um,
> not very friendly to automated error handling...

So what are the options? Should kexec return different exit codes
corresponding to the errno values returned through glibc? But bash or
other scripts will not have a library to translate them. I guess scripts
will have to hard-code the meaning of a particular return code.

BTW, a user can always try kexec_file_load() and, if it fails, try the
older syscall. Shouldn't that work for a lot of use cases?

Vivek

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Vivek Goyal

On Fri, Jul 15, 2016 at 09:31:02AM +0200, Arnd Bergmann wrote:
> On Thursday, July 14, 2016 10:44:14 PM CEST Thiago Jung Bauermann wrote:
> > Am Donnerstag, 14 Juli 2016, 10:29:11 schrieb Arnd Bergmann:
> 
> > > 
> > > Right, but the question remains whether this helps while you allow the
> > > boot loader to modify the dtb. If an attacker gets in and cannot modify
> > > the kernel or initid but can modify the DT, a successful attack would
> > > be a bit harder than having a modified kernel, but you may still need
> > > to treat the system as compromised.
> > 
> > Yes, and the same question also remains regarding the kernel command line.
> > 
> > We can have the kernel perform sanity checks on the device tree, just as the
> > kernel needs to sanity check the command line.
> > 
> > There's the point that was raised about not wanting to increase the attack 
> > surface, and that's a valid point. But at least in the way Petitboot works 
> > today, it needs to modify the device tree and pass it to the kernel.
> > 
> > One thing that is unavoidable to come from userspace is 
> > /chosen/linux,stdout-path, because it's Petitboot that knows from which 
> > console the user is interacting with. The other modification to set 
> > properties in vga@0 can be done in the kernel.
> > 
> > Given that on DTB-based systems /chosen is an important and established way 
> > to pass information to the operating system being booted, I'd like to 
> > suggest the following, then:
> > 
> > Extend the syscall as shown in this RFC from Takahiro AKASHI, but instead of
> > accepting a complete DTB from userspace, the syscall would accept a DTB
> > containing only a /chosen node. If the DTB contains any other node, the
> > syscall fails with EINVAL. The kernel can then add the properties in /chosen
> > to the device tree that it will pass to the next kernel.
> > 
> > What do you think?
> 
> I think that helps, as it makes the problem space correspond to that
> of modifying the command line, but I can still come up with countless
> attacks based on modifications of the /chosen node and/or the command
> line, in fact it's probably easier than any other node.

I don't know anything about DTB, so here comes a very basic question. Does
DTB allow passing an executable blob to the kernel, or passing the location
of some unsigned executable code at kernel level? I think from the
secureboot point of view that would be a concern: being able to trick the
kernel into executing unsigned code at a privileged level.

Vivek

Re: [RFC 3/3] kexec: extend kexec_file_load system call

2016-07-15 Thread Vivek Goyal

On Tue, Jul 12, 2016 at 10:42:01AM +0900, AKASHI Takahiro wrote:

[..]
> -SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
> +SYSCALL_DEFINE6(kexec_file_load, int, kernel_fd, int, initrd_fd,
>   unsigned long, cmdline_len, const char __user *, cmdline_ptr,
> - unsigned long, flags)
> + unsigned long, flags, const struct kexec_fdset __user *, ufdset)

Can one add more parameters to an existing syscall? Can it break existing
programs with a new kernel? I was under the impression that one can't do
that. But maybe I am missing something.

Vivek

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Vivek Goyal

On Fri, Jul 15, 2016 at 09:49:25AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 13, 2016 at 03:13:42PM +0200, Arnd Bergmann wrote:
> > On Wednesday, July 13, 2016 10:41:28 AM CEST Mark Rutland wrote:
> > > The big question is whether this is a realistic case on a secure boot
> > > system.
> > 
> > What does x86 do here? I assume changes to the command line are also
> > limited.
> 
> They aren't.  You can specify /anything/ even with a fully-signed kernel
> and initrd, which was one of the things I pointed out in my previous
> set of responses.

Yes, the kernel command line is not signed. For that matter, even the
initrd is not signed. Only the kernel is signed and its signature is
verified. The idea is that unsigned code should not be able to execute
in kernel space.

Vivek

Re: [PATCH] kexec: Add option to fallback to old kexec syscall when kexec file based syscall failed

2016-07-15 Thread Vivek Goyal

On Fri, Jul 15, 2016 at 09:58:22AM +0200, Petr Tesarik wrote:
> On Fri, 15 Jul 2016 07:57:22 +0800
> joeyli <j...@suse.com> wrote:
> 
> > Hi Vivek
> > 
> > On Thu, Jul 14, 2016 at 10:53:28AM -0400, Vivek Goyal wrote:
> > > On Thu, Jul 14, 2016 at 04:45:11PM +0800, Lee, Chun-Yi wrote:
> > > > This patch adds a new "--fallback-kexec" option to give a chance to
> > > > fallback to old kexec syscall when file based kexec syscall operation
> > > > failed.
> > > 
> > > I think caller should switch to using different interface if need be. But
> > > I don't see much point in providing an option for this in kexec-tools.
> > > 
> > > Vivek
> > >
> > 
> > OK~ Understood!
> > 
> > Thanks for Baoquan's and your opinion for this patch.
> 
> Is there some sort of diagnostics, so a calling script can determine
> whether kexec failed, because there's no support for kexec_file_load(2)
> or for a different reason? 

Will we not get -ENOSYS if kexec_file_load() is not implemented?
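A caller-side sketch of that check: try the file-based load first and fall back to the old syscall only when it fails with ENOSYS. Restricting the fallback to ENOSYS is a design choice of this sketch (Vivek's later reply in this thread considers falling back on any failure); the idea is that a failure such as a rejected signature gets reported rather than retried through a different path. `load_with_fallback()` and the fake loaders are hypothetical stand-ins for the real syscall wrappers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* A load attempt: returns 0 on success, or -1 with errno set. */
typedef int (*loader_fn)(void *ctx);

/*
 * Try the file-based loader; fall back to the old loader only if the
 * kernel does not implement the file-based syscall (ENOSYS). Any other
 * error is returned unchanged, with errno preserved.
 */
static int load_with_fallback(loader_fn file_load, loader_fn old_load,
			      void *ctx, bool *used_fallback)
{
	*used_fallback = false;
	if (file_load(ctx) == 0)
		return 0;
	if (errno != ENOSYS)
		return -1;
	*used_fallback = true;
	return old_load(ctx);
}

/* Hypothetical stand-ins for the real kexec_file_load()/kexec_load() calls. */
static int fake_file_load_enosys(void *ctx) { (void)ctx; errno = ENOSYS; return -1; }
static int fake_file_load_eperm(void *ctx)  { (void)ctx; errno = EPERM; return -1; }
static int fake_old_load_ok(void *ctx)      { (void)ctx; return 0; }
```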

Vivek

Re: [PATCH] kexec: Add option to fallback to old kexec syscall when kexec file based syscall failed

2016-07-14 Thread Vivek Goyal

On Thu, Jul 14, 2016 at 04:45:11PM +0800, Lee, Chun-Yi wrote:
> This patch adds a new "--fallback-kexec" option to give a chance to
> fallback to old kexec syscall when file based kexec syscall operation
> failed.

I think the caller should switch to using a different interface if need
be. But I don't see much point in providing an option for this in
kexec-tools.

Vivek

> 
> This option works with --kexec-file-syscall to provide a more flexible
> way to adapt to kernels that were built with a different kexec syscall
> config or that have a different verification policy.
> 
> Cc: Simon Horman <ho...@verge.net.au>
> Cc: Petr Tesarik <ptesa...@suse.com>
> Cc: Vivek Goyal <vgo...@redhat.com>
> Signed-off-by: Lee, Chun-Yi <j...@suse.com>
> ---
>  kexec/kexec.c | 13 +
>  kexec/kexec.h |  4 +++-
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/kexec/kexec.c b/kexec/kexec.c
> index 500e5a9..e05b43f 100644
> --- a/kexec/kexec.c
> +++ b/kexec/kexec.c
> @@ -969,6 +969,7 @@ void usage(void)
>  "  preserve context)\n"
>  "  to original kernel.\n"
>  " -s, --kexec-file-syscall Use file based syscall for kexec operation\n"
> +" --fallback-kexec Fallback to old kexec when file based syscall failed\n"
>  " -d, --debug   Enable debugging to help spot a failure.\n"
>  "\n"
>  "Supported kernel file types and options: \n");
> @@ -1204,6 +1205,7 @@ int main(int argc, char *argv[])
>   int do_unload = 0;
>   int do_reuse_initrd = 0;
>   int do_kexec_file_syscall = 0;
> + int do_fallback_kexec_syscall = 0;
>   void *entry = 0;
>   char *type = 0;
>   char *endptr;
> @@ -1226,9 +1228,13 @@ int main(int argc, char *argv[])
>   case OPT_KEXEC_FILE_SYSCALL:
>   do_kexec_file_syscall = 1;
>   break;
> + case OPT_FALLBACK_KEXEC:
> + do_fallback_kexec_syscall = 1;
> + break;
>   }
>   }
>  
> +fallback:
>   /* Reset getopt for the next pass. */
>   opterr = 1;
>   optind = 1;
> @@ -1407,6 +1413,13 @@ int main(int argc, char *argv[])
>   result = my_load(type, fileind, argc, argv,
>   kexec_flags, entry);
>   }
> + /* fallback to old kexec syscall */
> + if (do_kexec_file_syscall && result != 0 && do_fallback_kexec_syscall) {
> + fprintf(stderr, "Fallback to kexec syscall\n");
> + do_kexec_file_syscall = 0;
> + do_fallback_kexec_syscall = 0;
> + goto fallback;
> + }
>   /* Don't shutdown unless there is something to reboot to! */
>   if ((result == 0) && (do_shutdown || do_exec) && !kexec_loaded()) {
>   die("Nothing has been loaded!\n");
> diff --git a/kexec/kexec.h b/kexec/kexec.h
> index 9194f1c..65dbd56 100644
> --- a/kexec/kexec.h
> +++ b/kexec/kexec.h
> @@ -225,7 +225,8 @@ extern int file_types;
>  #define OPT_LOAD_PRESERVE_CONTEXT 259
>  #define OPT_LOAD_JUMP_BACK_HELPER 260
>  #define OPT_ENTRY 261
> -#define OPT_MAX  262
> +#define OPT_FALLBACK_KEXEC   262
> +#define OPT_MAX  263
>  #define KEXEC_OPTIONS \
>   { "help",   0, 0, OPT_HELP }, \
>   { "version",0, 0, OPT_VERSION }, \
> @@ -244,6 +245,7 @@ extern int file_types;
>   { "mem-max",1, 0, OPT_MEM_MAX }, \
>   { "reuseinitrd",0, 0, OPT_REUSE_INITRD }, \
>   { "kexec-file-syscall", 0, 0, OPT_KEXEC_FILE_SYSCALL }, \
> + { "fallback-kexec", 0, 0, OPT_FALLBACK_KEXEC }, \
>   { "debug",  0, 0, OPT_DEBUG }, \
>  
>  #define KEXEC_OPT_STR "h?vdfxyluet:ps"
> -- 
> 2.6.6

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-13 Thread Vivek Goyal

On Wed, Jul 13, 2016 at 06:40:10PM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 13, 2016 at 09:03:38AM -0400, Vivek Goyal wrote:
> > On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux wrote:
> > > Indeed - maybe Eric knows better, but I can't see any situation where
> > > the dtb we load via kexec should ever affect "the bootloader", unless
> > > the "kernel" that's being loaded into kexec is "the bootloader".
> > > 
> > > Now, going back to the more fundamental issue raised in my first reply,
> > > about the kernel command line.
> > > 
> > > On x86, I can see that it _is_ possible for userspace to specify a
> > > command line, and the kernel loading the image provides the command
> > > line to the to-be-kexeced kernel with very little checking.  So, if
> > > your kernel is signed, what stops the "insecure userspace" loading
> > > a signed kernel but giving it an insecure rootfs and/or console?
> > 
> > It is not kexec specific. I could do this for regular boot too, right?
> > 
> > Command line options are not signed. I thought idea behind secureboot
> > was to execute only trusted code and command line options don't enforce
> > you to execute unsigned code.
> > 
> > So it sounds like different class of security problems which you are
> > referring to and not necessarily covered by secureboot or signed
> > kernel.
> 
> Let me give you an example.
> 
> You have a secure boot setup, where the firmware/ROM validates the boot
> loader.  Good, the boot loader hasn't been tampered with.
> 
> You interrupt the boot loader and are able to modify the command line
> for the booted kernel.
> 
> The boot loader loads the kernel and verifies the kernel's signature.
> Good, the kernel hasn't been tampered with.  The kernel starts running.
> 
> You've plugged in a USB drive to the device, and specified a partition
> containing a root filesystem that you control to the kernel.  The
> validated kernel finds the USB drive, and mounts it, and executes
> your own binaries on the USB drive.

You will require physical access to the machine to be able to
insert your USB drive. And IIRC, the argument was that if an attacker has
physical access to the machine, all bets are off anyway.

> 
> You run a shell on the console.  You now have control of the system,
> and can mount the real rootfs, inspect it, and work out what it does,
> etc.
> 
> At this point, what use was all the validation that the secure boot
> has done?  Absolutely useless.
> 
> If you can change the command line arguments given to the kernel, you
> have no security, no matter how much you verify signatures.  It's
> the illusion of security, nothing more, nothing less.
> 
> -- 
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-13 Thread Vivek Goyal

On Wed, Jul 13, 2016 at 09:45:22AM +1000, Stewart Smith wrote:
> Vivek Goyal <vgo...@redhat.com> writes:
> > On Tue, Jul 12, 2016 at 10:58:09AM -0300, Thiago Jung Bauermann wrote:
> >> Hello Eric,
> >> 
> >> Am Dienstag, 12 Juli 2016, 08:25:48 schrieb Eric W. Biederman:
> >> > AKASHI Takahiro <takahiro.aka...@linaro.org> writes:
> >> > > Device tree blob must be passed to a second kernel on DTB-capable
> >> > > archs, like powerpc and arm64, but the current kernel interface
> >> > > lacks this support.
> >> > > 
> >> > > This patch extends kexec_file_load system call by adding an extra
> > >> > > argument to this syscall so that an arbitrary number of file descriptors
> > >> > > can be handed out from user space to the kernel.
> >> > > 
> >> > > See the background [1].
> >> > > 
> >> > > Please note that the new interface looks quite similar to the current
> >> > > system call, but that it won't always mean that it provides the "binary
> >> > > compatibility."
> >> > > 
> >> > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html
> >> > 
> >> > So this design is wrong.  The kernel already has the device tree blob,
> >> > you should not be extracting it from the kernel munging it, and then
> >> > reinserting it in the kernel if you want signatures and everything to
> >> > pass.
> >> > 
> > >> > What x86 does is pass its equivalent of the device tree blob from one
> >> > kernel to another directly and behind the scenes.  It does not go
> >> > through userspace for this.
> >> > 
> >> > Until a persuasive case can be made for going around the kernel and
> >> > probably adding a feature (like code execution) that can be used to
> >> > defeat the signature scheme I am going to nack this.
> >> 
> >> There are situations where userspace needs to change things in the device 
> >> tree to be used by the next kernel.
> >> 
> >> For example, Petitboot (the boot loader used in OpenPOWER machines) is a 
> >> userspace application running in an intermediary Linux instance and uses 
> >> kexec to load the target OS. It has to modify the device tree that will be 
> >> used by the next kernel so that the next kernel uses the same console that 
> >> petitboot was configured to use (i.e., set the /chosen/linux,stdout-path 
> >> property). It also modifies the device tree to allow the kernel to inherit 
> >> Petitboot's Openfirmware framebuffer.
> >
> > Can some of this be done with the help of kernel command line options for
> > second kernel?
> 
> how would this be any more secure?
> 
> Passing in an address for a framebuffer via command line option means
> you could scribble over any bit of memory, which is the same kind of
> damage you could do by modifying the device tree.

It is not necessarily safer, but it works within the given framework and we
don't have to modify the existing system call.

Also, it allows you to pass in only one thing at a time instead of
passing in a new unsigned DTB, which can potentially do a lot more.

Vivek


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-13 Thread Vivek Goyal

On Wed, Jul 13, 2016 at 09:41:39AM +1000, Stewart Smith wrote:
> Petr Tesarik  writes:
> > On Tue, 12 Jul 2016 13:25:11 -0300
> > Thiago Jung Bauermann  wrote:
> >
> >> Hi Eric,
> >> 
> >> I'm trying to understand your concerns leading to your nack. I hope you 
> >> don't mind expanding your thoughts on them a bit.
> >> 
> >> Am Dienstag, 12 Juli 2016, 08:25:48 schrieb Eric W. Biederman:
> >> > AKASHI Takahiro  writes:
> >> > > Device tree blob must be passed to a second kernel on DTB-capable
> >> > > archs, like powerpc and arm64, but the current kernel interface
> >> > > lacks this support.
> >> > > 
> >> > > This patch extends kexec_file_load system call by adding an extra
> > >> > > argument to this syscall so that an arbitrary number of file descriptors
> > >> > > can be handed out from user space to the kernel.
> >> > > 
> >> > > See the background [1].
> >> > > 
> >> > > Please note that the new interface looks quite similar to the current
> >> > > system call, but that it won't always mean that it provides the "binary
> >> > > compatibility."
> >> > > 
> >> > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html
> >> > 
> >> > So this design is wrong.  The kernel already has the device tree blob,
> >> > you should not be extracting it from the kernel munging it, and then
> >> > reinserting it in the kernel if you want signatures and everything to
> >> > pass.
> >> 
> >> I don't understand how the kernel signature will be invalidated. 
> >> 
> >> There are some types of boot images that can embed a device tree blob in 
> >> them, but the kernel can also be handed a separate device tree blob from 
> >> firmware, the boot loader, or kexec. This latter case is what we are 
> >> discussing, so we are not talking about modifying an embedded blob in the 
> >> kernel image.
> >> 
> > >> > What x86 does is pass its equivalent of the device tree blob from one
> >> > kernel to another directly and behind the scenes.  It does not go
> >> > through userspace for this.
> >> > 
> >> > Until a persuasive case can be made for going around the kernel and
> >> > probably adding a feature (like code execution) that can be used to
> >> > defeat the signature scheme I am going to nack this.
> >> 
> > >> I also don't understand what you mean by code execution. How does passing a
> > >> device tree blob via kexec enable code execution? How can the signature
> > >> scheme be defeated?
> >
> > I'm not an expert on DTB, so I can't provide an example of code
> > execution, but you have already mentioned the /chosen/linux,stdout-path
> > property. If an attacker redirects the bootloader to an insecure
> > console, they may get access to the system that would otherwise be
> > impossible.
> 
> In this case, the user is sitting at the (or one of the) console(s) of
> the machine. There could be petitboot UIs running on the VGA display,
> IPMI serial over lan, local serial port. The logic behind setting
> /chosen/linux,stdout-path is (currently) mostly to set it for the kernel
> to what the user is interacting with. i.e. if you select an OS installer
> to boot from the VGA console, you get a graphical installer running and
> if you selected it from a text console, you get a text installer running
> (on the appropriate console).
> 
> So the bootloader (petitboot) needs to work out which console is being
> interacted with in order to set up /chosen/linux,stdout-path correctly.
> 
> This specific option could be passed as a kernel command line to the
> next kernel, yes. However, isn't the kernel command line also an attack
> vector? Is *every* command line option safe?

I don't think the kernel command line is signed. And we will have to define
what is considered *unsafe*. I am working on the assumption that a
user should not be able to force execution of unsigned code at a privileged
level. And passing a console on the kernel command line should be safe in
that respect?

Vivek
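One way to make "what is considered unsafe" concrete is for the loading side to filter the command line it hands to the next kernel against an allowlist. The C sketch below is hypothetical: the allowlist contents and the helpers `param_allowed()` and `cmdline_allowed()` are illustrative assumptions, not existing kexec-tools or kernel code, and quoted values are not handled. It accepts `console=` (the case discussed here) and would reject the `root=` redirection from Russell's example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical allowlist of parameter names a loader might forward to the
 * next kernel. Illustrative only; it is not a vetted list of safe options.
 */
static const char *const allowed_params[] = { "console", "quiet", "loglevel" };

/* Return 1 if the token's name (the text before any '=') is allowlisted. */
static int param_allowed(const char *tok, size_t len)
{
	size_t name_len = 0;
	size_t i;

	while (name_len < len && tok[name_len] != '=')
		name_len++;

	for (i = 0; i < sizeof(allowed_params) / sizeof(allowed_params[0]); i++) {
		if (strlen(allowed_params[i]) == name_len &&
		    strncmp(tok, allowed_params[i], name_len) == 0)
			return 1;
	}
	return 0;
}

/*
 * Return 1 if every space-separated token of cmdline is allowlisted,
 * 0 otherwise. Quoted values such as foo="a b" are not handled here.
 */
int cmdline_allowed(const char *cmdline)
{
	const char *p = cmdline;

	while (*p != '\0') {
		size_t len;

		while (*p == ' ')
			p++;
		if (*p == '\0')
			break;
		len = strcspn(p, " ");
		if (!param_allowed(p, len))
			return 0;
		p += len;
	}
	return 1;
}
```

An allowlist fails closed: an option that a newer kernel adds is rejected until someone decides it is acceptable.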


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-13 Thread Vivek Goyal

On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 13, 2016 at 05:55:33PM +1000, Stewart Smith wrote:
> > Russell King - ARM Linux  writes:
> > > On Wed, Jul 13, 2016 at 02:59:51PM +1000, Stewart Smith wrote:
> > >> Russell King - ARM Linux  writes:
> > >> > On Tue, Jul 12, 2016 at 10:58:05PM +0200, Petr Tesarik wrote:
> > >> >> I'm not an expert on DTB, so I can't provide an example of code
> > >> >> execution, but you have already mentioned the /chosen/linux,stdout-path
> > >> >> property. If an attacker redirects the bootloader to an insecure
> > >> >> console, they may get access to the system that would otherwise be
> > >> >> impossible.
> > >> >
> > >> > I fail to see how kexec connects with the boot loader - the DTB image
> > >> > that's being talked about is one which is passed from the currently
> > >> > running kernel to the to-be-kexec'd kernel.  For ARM (and I suspect
> > >> > also ARM64) that's a direct call chain which doesn't involve any
> > >> > boot loader or firmware, and certainly none that would involve the
> > >> > passed DTB image.
> > >> 
> > >> For OpenPOWER machines, kexec is the bootloader. Our bootloader is a
> > >> linux kernel and initramfs with a UI (petitboot) - this means we never
> > >> have to write a device driver twice: write a kernel one and you're done
> > >> (for booting from the device and using it in your OS).
> > >
> > > I think you misunderstood my point.
> > >
> > > On ARM, we do not go:
> > >
> > >   kernel (kexec'd from) -> boot loader -> kernel (kexec'd to)
> > >
> > > but we go:
> > >
> > >   kernel (kexec'd from) -> kernel (kexec'd to)
> > >
> > > There's no intermediate step involving any bootloader.
> > >
> > > Hence, my point is that the dtb loaded by kexec is _only_ used by the
> > > kernel which is being kexec'd to, not by the bootloader, nor indeed
> > > the kernel which it is loaded into.
> > >
> > > Moreover, if you read the bit that I quoted (which is what I was
> > > replying to), you'll notice that it is talking about the DTB loaded
> > > by kexec somehow causing the _bootloader_ to be redirected to an
> > > alternative console.  This point is wholly false on ARM.
> > 
> > Ahh.. I missed the bootloader bit there.
> > 
> > In which case, we're the same on OpenPOWER, there is no intermediate
> > bootloader - in our case we have linux (with kexec) taking on what uboot
> > or grub is typically used for on other platforms.
> 
> Indeed - maybe Eric knows better, but I can't see any situation where
> the dtb we load via kexec should ever affect "the bootloader", unless
> the "kernel" that's being loaded into kexec is "the bootloader".
> 
> Now, going back to the more fundamental issue raised in my first reply,
> about the kernel command line.
> 
> On x86, I can see that it _is_ possible for userspace to specify a
> command line, and the kernel loading the image provides the command
> line to the to-be-kexeced kernel with very little checking.  So, if
> your kernel is signed, what stops the "insecure userspace" loading
> a signed kernel but giving it an insecure rootfs and/or console?

It is not kexec specific. I could do this for a regular boot too, right?

Command line options are not signed. I thought the idea behind secure boot
was to execute only trusted code, and command line options don't force
you to execute unsigned code.

So it sounds like you are referring to a different class of security
problems, not necessarily covered by secure boot or a signed
kernel.

Vivek


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-12 Thread Vivek Goyal

On Tue, Jul 12, 2016 at 04:02:46PM +0200, Arnd Bergmann wrote:
> On Tuesday, July 12, 2016 8:25:48 AM CEST Eric W. Biederman wrote:
> > AKASHI Takahiro  writes:
> > 
> > > Device tree blob must be passed to a second kernel on DTB-capable
> > > archs, like powerpc and arm64, but the current kernel interface
> > > lacks this support.
> > >   
> > > This patch extends kexec_file_load system call by adding an extra
> > > argument to this syscall so that an arbitrary number of file descriptors
> > > can be handed out from user space to the kernel.
> > >
> > > See the background [1].
> > >
> > > Please note that the new interface looks quite similar to the current
> > > system call, but that it won't always mean that it provides the "binary
> > > compatibility."
> > >
> > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html
> > 
> > So this design is wrong.  The kernel already has the device tree blob,
> > you should not be extracting it from the kernel munging it, and then
> > reinserting it in the kernel if you want signatures and everything to
> > pass.
> > 
> > What x86 does is pass its equivalent of the device tree blob from one
> > kernel to another directly and behind the scenes.  It does not go
> > through userspace for this.
> > 
> > Until a persuasive case can be made for going around the kernel and
> > probably adding a feature (like code execution) that can be used to
> > defeat the signature scheme I am going to nack this.
> > 
> > Nacked-by: "Eric W. Biederman" 
> > 
> > I am happy to see support for other architectures, but for the sake of
> > not moving some code in the kernel let's not build an attackable
> > infrastructure.
> > 
> 
> For historic context, the flattened devicetree format that we now use
> to pass data about the system from boot loader to kernel was initially
> introduced specifically for the purpose of enabling kexec:
> 
> On Open Firmware, the DT is extracted from running firmware and copied
> into dynamically allocated data structures. After a kexec, the runtime
> interface to the firmware is not available, so the flattened DT format
> was created as a way to pass the same data in a binary blob to the new
> kernel in a format that can be read from the kernel by walking the
> directories in /proc/device-tree/*.

So this DT is available inside the kernel, and the running kernel can still
retrieve it and pass it to the second kernel?
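Arnd notes that the flattened tree's contents end up readable as files under /proc/device-tree/*, which is one way the running system can still get at the DT. A minimal userspace sketch of reading one string property, assuming the property file holds a NUL-terminated string (as DT string properties do). `read_dt_string()` is a hypothetical helper, and the example path /proc/device-tree/chosen/linux,stdout-path only exists on DT-based systems.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a string property exposed as a file under /proc/device-tree, e.g.
 * "/proc/device-tree/chosen/linux,stdout-path". DT string properties are
 * NUL-terminated already; the buffer is still terminated defensively in
 * case the value is truncated. Returns the string length, or -1 on error.
 */
long read_dt_string(const char *path, char *buf, size_t bufsz)
{
	FILE *f;
	size_t n;

	if (bufsz == 0)
		return -1;
	f = fopen(path, "rb");
	if (!f)
		return -1;
	n = fread(buf, 1, bufsz - 1, f);
	fclose(f);
	buf[n] = '\0';
	return (long)strlen(buf);
}
```

For example, `read_dt_string("/proc/device-tree/chosen/linux,stdout-path", buf, sizeof(buf))` returns the length of the node path stored in that property, or -1 if the property is absent.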

> 
> There are a couple of reasons for modifying the devicetree:
> 
> - For kboot/petitboot, you can have a kernel that is not booted through
>   DT at all but hardwired to a particular machine, and that passes
>   a DT for the entire hardware to the kernel that you actually want to
>   run.
> 
> - for kdump, you need to tell the new kernel about the modified location
>   of the memory, so the dump kernel doesn't overwrite the contents
>   it wants to dump

On x86 we do this with the help of kernel command line options.

> 
> - we typically ship devicetree sources for embedded machines with the
>   kernel sources. As more hardware of the system gets enabled, the
>   devicetree gains extra nodes and properties that describe the hardware
>   more completely, so we need to use the latest DT blob to use all
>   the drivers
> 
> - in some cases, kernels will fail to boot at all with an older version
>   of the DT, or fail to use the devices that were working on the
>   earlier kernel. This is usually considered a bug, but it's not rare
> 
> - In some cases, the kernel can update its DT at runtime, and the new
>   settings are expected to be available in the new kernel too, though
>   there are cases where you actually don't want the modified contents.

I am assuming that both the modified DT and the unmodified one are accessible
to the kernel. And if user space can decide which modified fields to use
for the new kernel and which ones not to, then the same can be done in the kernel too?

Vivek
> 
>   Arnd
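For the kdump case, the memory layout can be described on the command line instead of in the DT. The kernel's parameter documentation describes `memmap=exactmap` and `memmap=nn[KMG]@ss[KMG]` on x86 for forcing an exact memory map. Whether and how current kexec-tools uses them is not claimed here. A minimal sketch that formats such arguments from a list of RAM ranges, assuming 1 KiB aligned ranges. `struct mem_range` and `build_memmap_args()` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of a RAM range the dump kernel may use. */
struct mem_range {
	unsigned long long start;	/* bytes, assumed 1 KiB aligned */
	unsigned long long size;	/* bytes, assumed 1 KiB aligned */
};

/*
 * Write "memmap=exactmap" followed by one " memmap=<size>K@<start>K"
 * entry per range into buf. Returns the resulting length, or -1 if the
 * buffer is too small.
 */
int build_memmap_args(char *buf, size_t bufsz,
		      const struct mem_range *r, size_t nr)
{
	size_t used;
	size_t i;
	int n;

	n = snprintf(buf, bufsz, "memmap=exactmap");
	if (n < 0 || (size_t)n >= bufsz)
		return -1;
	used = (size_t)n;

	for (i = 0; i < nr; i++) {
		n = snprintf(buf + used, bufsz - used, " memmap=%lluK@%lluK",
			     r[i].size >> 10, r[i].start >> 10);
		if (n < 0 || (size_t)n >= bufsz - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

With ranges of 640 KiB at 0 and 128 MiB at 16 MiB, this produces `memmap=exactmap memmap=640K@0K memmap=131072K@16384K`.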


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-12 Thread Vivek Goyal

On Tue, Jul 12, 2016 at 10:58:09AM -0300, Thiago Jung Bauermann wrote:
> Hello Eric,
> 
> Am Dienstag, 12 Juli 2016, 08:25:48 schrieb Eric W. Biederman:
> > AKASHI Takahiro  writes:
> > > Device tree blob must be passed to a second kernel on DTB-capable
> > > archs, like powerpc and arm64, but the current kernel interface
> > > lacks this support.
> > > 
> > > This patch extends kexec_file_load system call by adding an extra
> > > argument to this syscall so that an arbitrary number of file descriptors
> > > can be handed out from user space to the kernel.
> > > 
> > > See the background [1].
> > > 
> > > Please note that the new interface looks quite similar to the current
> > > system call, but that it won't always mean that it provides the "binary
> > > compatibility."
> > > 
> > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html
> > 
> > So this design is wrong.  The kernel already has the device tree blob,
> > you should not be extracting it from the kernel munging it, and then
> > reinserting it in the kernel if you want signatures and everything to
> > pass.
> > 
> > What x86 does is pass its equivalent of the device tree blob from one
> > kernel to another directly and behind the scenes.  It does not go
> > through userspace for this.
> > 
> > Until a persuasive case can be made for going around the kernel and
> > probably adding a feature (like code execution) that can be used to
> > defeat the signature scheme I am going to nack this.
> 
> There are situations where userspace needs to change things in the device 
> tree to be used by the next kernel.
> 
> For example, Petitboot (the boot loader used in OpenPOWER machines) is a 
> userspace application running in an intermediary Linux instance and uses 
> kexec to load the target OS. It has to modify the device tree that will be 
> used by the next kernel so that the next kernel uses the same console that 
> petitboot was configured to use (i.e., set the /chosen/linux,stdout-path 
> property). It also modifies the device tree to allow the kernel to inherit 
> Petitboot's Openfirmware framebuffer.

Can some of this be done with the help of kernel command line options for
second kernel?

Vivek
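As an illustration of the command line alternative, a loader that knows which console the user is on could append a `console=` argument instead of editing /chosen/linux,stdout-path. A minimal sketch, assuming the console device name (e.g. hvc0 or ttyS0) is already known. Mapping a DT node path to a device name is out of scope, and `append_console()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Append "console=<dev>" to an existing command line, writing the result
 * into out. Returns the new length, or -1 if out is too small.
 */
int append_console(char *out, size_t outsz, const char *cmdline,
		   const char *dev)
{
	int n;

	if (cmdline[0] != '\0')
		n = snprintf(out, outsz, "%s console=%s", cmdline, dev);
	else
		n = snprintf(out, outsz, "console=%s", dev);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	return n;
}
```

Unlike a rewritten DTB, this hands the next kernel exactly one extra setting, which is the narrower interface suggested in this thread.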



Re: [PATCH] MAINTAINERS: Kdump maintainers update

2016-05-25 Thread Vivek Goyal

On Wed, May 25, 2016 at 06:24:10AM -0700, Joe Perches wrote:
> On Wed, 2016-05-25 at 09:16 -0400, Vivek Goyal wrote:
> > I am proposing the following updates to kdump maintainership. I have become
> > busy with other things and am not getting time to spend on kdump.
> > 
> > Removed Haren Myneni as he has not participated in kdump development for
> > a long time now.
> > 
> > Proposing adding the names of Dave and Baoquan as kdump maintainers as
> > they have been contributing to kdump for a long time now and they are in
> > a much better position to spend time on this than me.
> []
> > diff --git a/MAINTAINERS b/MAINTAINERS
> []
> > @@ -6189,8 +6189,9 @@ F:Documentation/kbuild/kconfig-language.txt
> >  F: scripts/kconfig/
> >  
> >  KDUMP
> > +M: Dave Young <dyo...@redhat.com>
> > +M: Baoquan He <b...@redhat.com>
> >  M: Vivek Goyal <vgo...@redhat.com>
> 
> You could mark yourself as an "R:" reviewer
> instead of an "M:" maintainer.

That's a good idea. I updated the patch and marked myself as a reviewer.

Removed Haren Myneni as he has not participated in kdump development for
a long time now.

Proposing adding the names of Dave and Baoquan as kdump maintainers as
they have been working on it upstream for quite some time and
they are in a much better position to spend time on this than me.

Signed-off-by: Vivek Goyal <vgo...@redhat.com>
---
 MAINTAINERS | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9c567a4..5792ec2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6189,8 +6189,9 @@ F:Documentation/kbuild/kconfig-language.txt
 F: scripts/kconfig/
 
 KDUMP
-M:     Vivek Goyal <vgo...@redhat.com>
-M: Haren Myneni <hb...@us.ibm.com>
+M: Dave Young <dyo...@redhat.com>
+M: Baoquan He <b...@redhat.com>
+R: Vivek Goyal <vgo...@redhat.com>
 L: kexec@lists.infradead.org
 W: http://lse.sourceforge.net/kdump/
 S: Maintained
-- 
2.7.4



[PATCH] MAINTAINERS: Kdump maintainers update

2016-05-25 Thread Vivek Goyal

Hi,

I am proposing the following updates to kdump maintainership. I have become
busy with other things and am not getting time to spend on kdump.

Removed Haren Myneni as he has not participated in kdump development for
a long time now.

Proposing adding the names of Dave and Baoquan as kdump maintainers as
they have been contributing to kdump for a long time now and they are in
a much better position to spend time on this than me.

Signed-off-by: Vivek Goyal <vgo...@redhat.com>
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9c567a4..c030267 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6189,8 +6189,9 @@ F:Documentation/kbuild/kconfig-language.txt
 F: scripts/kconfig/
 
 KDUMP
+M: Dave Young <dyo...@redhat.com>
+M: Baoquan He <b...@redhat.com>
 M: Vivek Goyal <vgo...@redhat.com>
-M: Haren Myneni <hb...@us.ibm.com>
 L: kexec@lists.infradead.org
 W: http://lse.sourceforge.net/kdump/
 S: Maintained
-- 
2.7.4



Re: [PATCH] kexec: Add --lite option

2015-10-22 Thread Vivek Goyal

On Thu, Oct 22, 2015 at 11:57:20AM -0700, Geoff Levand wrote:
> Hi Dave,
> 
> On Thu, 2015-10-22 at 11:17 +0800, Dave Young wrote:
> > On 10/21/15 at 04:12pm, Geoff Levand wrote:
> > > Add a new option --lite to kexec that allows for a fast reboot
> > > by avoiding the purgatory integrity checks.  This option is
> > > intended for use by kexec based bootloaders that load a new
> > > image and then immediately transfer control to it.
> > 
> > I think Vivek was rejecting this --lite since kdump needs the purgatory
> > integrity checks. Ccing him.
> 
> As stated, this is not intended for use by kdump.
> 
> This is an optional feature.  It does not remove the integrity
> checks, but provides the user a way to bypass them if they so
> desire.

Why would somebody want to bypass these checks?

> 
> > > It was reported that on some systems where purgatory is running
> > > without caches enabled the sha256 calculations would take several
> > > minutes.  For bootloaders that just load a new image and
> > > immediately jump into it the loss of the integrity check is worth
> > > the increase in boot speed.  Please consider.  
> > 
> > Pratyush reported the arm64 issue, he sent a patch to fix it with
> > enabling cache for purgatory. I think the patch can fix the problem.
> > Why not fix it? The fix is simple enough and it does not introduce
> > complicated logic.
> 
> This patch also is simple, and is architecture independent.  I see this
> feature as an improvement to kexec, not necessarily as a fix for that
> problem.

I am not sure why somebody would care whether segments are checked
during the transition or not.

Thanks
Vivek
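For reference, the check that --lite would skip works roughly like this: at load time kexec-tools records a SHA-256 digest over the loaded segments, and purgatory recomputes it just before jumping to the new kernel and compares the two. A minimal sketch of that recompute-and-compare step, with a `skip_check` flag modelling the proposed option. It uses 64-bit FNV-1a purely as a short stand-in for SHA-256 (FNV is not cryptographic). `struct region`, `digest_regions()` and `verify_regions()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one loaded segment to be verified. */
struct region {
	const unsigned char *start;
	size_t len;
};

/* 64-bit FNV-1a step; a stand-in for SHA-256, NOT a secure hash. */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *p, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Digest over all regions in order, as recorded at load time. */
uint64_t digest_regions(const struct region *r, size_t nr)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i;

	for (i = 0; i < nr; i++)
		h = fnv1a_update(h, r[i].start, r[i].len);
	return h;
}

/*
 * Purgatory-style check: recompute and compare with the recorded value.
 * Returns 0 if the segments are intact (or the check is skipped), -1 if
 * they changed. skip_check models the proposed --lite behaviour.
 */
int verify_regions(const struct region *r, size_t nr, uint64_t expected,
		   int skip_check)
{
	if (skip_check)
		return 0;
	return digest_regions(r, nr) == expected ? 0 : -1;
}
```

The cost is one pass over every loaded byte, which is why running it with caches disabled, as reported for arm64, can be slow.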


Re: [PATCH] kexec: Add --lite option

2015-10-22 Thread Vivek Goyal

On Thu, Oct 22, 2015 at 11:17:18AM +0800, Dave Young wrote:
> On 10/21/15 at 04:12pm, Geoff Levand wrote:
> > Add a new option --lite to kexec that allows for a fast reboot
> > by avoiding the purgatory integrity checks.  This option is
> > intended for use by kexec based bootloaders that load a new
> > image and then immediately transfer control to it.
> 
> I think Vivek was rejecting this --lite since kdump needs the purgatory
> integrity checks. Ccing him.

Right. Why we are trying to bypass sha256 hash verification of the loaded
segments at run time needs to be justified.

At least on x86, this integrity verification was fast and we did not
notice any significant delays in purgatory. In that case, an extra knob
like this is hard to justify.

Thanks
Vivek

> 
> > 
> > Signed-off-by: Geoff Levand 
> > ---
> > Hi Simon,
> > 
> > It was reported that on some systems where purgatory is running
> > without caches enabled the sha256 calculations would take several
> > minutes.  For bootloaders that just load a new image and
> > immediately jump into it the loss of the integrity check is worth
> > the increase in boot speed.  Please consider.  
> 
> Pratyush reported the arm64 issue, he sent a patch to fix it with
> enabling cache for purgatory. I think the patch can fix the problem.
> Why not fix it? The fix is simple enough and it does not introduce
> complicated logic.
> 
> > 
> > -Geoff
> > 
> >  kexec/kexec.8         |  3 +++
> >  kexec/kexec.c         | 19 +++++++++++++++++--
> >  kexec/kexec.h         |  4 ++++
> >  purgatory/purgatory.c |  3 ++-
> >  4 files changed, 26 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kexec/kexec.8 b/kexec/kexec.8
> > index 4d0c1d1..93ed588 100644
> > --- a/kexec/kexec.8
> > +++ b/kexec/kexec.8
> > @@ -126,6 +126,9 @@ in one call.
> >  Open a help file for
> >  .BR kexec .
> >  .TP
> > +.B \-i\ (\-\-lite)
> > +Fast reboot, no memory integrity checks.
> > +.TP
> >  .BI \-l\ (\-\-load) \ kernel
> >  Load the specified
> >  .I kernel
> > diff --git a/kexec/kexec.c b/kexec/kexec.c
> > index ff024f3..ebb1310 100644
> > --- a/kexec/kexec.c
> > +++ b/kexec/kexec.c
> > @@ -613,6 +613,15 @@ static void update_purgatory(struct kexec_info *info)
> > return;
> > }
> > arch_update_purgatory(info);
> > +
> > +   if (info->kexec_lite) {
> > +   unsigned int tmp = 1;
> > +
> > +   elf_rel_set_symbol(&info->rhdr, "kexec_lite", &tmp,
> > +   sizeof(tmp));
> > +   return;
> > +   }
> > +
> > memset(region, 0, sizeof(region));
> > sha256_starts();
> > /* Compute a hash of the loaded kernel */
> > @@ -652,7 +661,7 @@ static void update_purgatory(struct kexec_info *info)
> >   * Load the new kernel
> >   */
> >  static int my_load(const char *type, int fileind, int argc, char **argv,
> > -  unsigned long kexec_flags, void *entry)
> > +  unsigned long kexec_flags, int kexec_lite, void *entry)
> >  {
> > char *kernel;
> > char *kernel_buf;
> > @@ -665,6 +674,7 @@ static int my_load(const char *type, int fileind, int argc, char **argv,
> >  
> > memset(&info, 0, sizeof(info));
> > info.kexec_flags = kexec_flags;
> > +   info.kexec_lite = kexec_lite;
> >  
> > result = 0;
> > if (argc - fileind <= 0) {
> > @@ -914,6 +924,7 @@ void usage(void)
> >    " -v, --version        Print the version of kexec.\n"
> >    " -f, --force          Force an immediate kexec,\n"
> >    "                      don't call shutdown.\n"
> > +  " -i, --lite           Fast reboot, no memory integrity checks.\n"
> >    " -x, --no-ifdown      Don't bring down network interfaces.\n"
> >    " -y, --no-sync        Don't sync filesystems before kexec.\n"
> >    " -l, --load           Load the new kernel into the\n"
> > @@ -1173,6 +1184,7 @@ int main(int argc, char *argv[])
> > int do_unload = 0;
> > int do_reuse_initrd = 0;
> > int do_kexec_file_syscall = 0;
> > +   int do_lite = 0;
> > void *entry = 0;
> > char *type = 0;
> > char *endptr;
> > @@ -1314,6 +1326,9 @@ int main(int argc, char *argv[])
> > case OPT_KEXEC_FILE_SYSCALL:
> > /* We already parsed it. Nothing to do. */
> > break;
> > +   case OPT_LITE:
> > +   do_lite = 1;
> > +   break;
> > default:
> > break;
> > }
> > @@ -1374,7 +1389,7 @@ int main(int argc, char *argv[])
> >  kexec_file_flags);
> > else
> > result = my_load(type, fileind, argc, argv,
> > -   kexec_flags, entry);
> > +   kexec_flags, do_lite, entry);
> > }
> > /* Don't shutdown unless there is something to reboot to! */
> > if ((result == 0) && (do_shutdown || do_exec) &&

Re: [PATCH] kexec: Remove the unnecessary conditional judgement to simplify the code logic

2015-07-27 Thread Vivek Goyal

On Sat, Jun 06, 2015 at 02:14:12PM +0800, Minfei Huang wrote:
> From: Minfei Huang <mnfhu...@gmail.com>
> 
> Converting a PFN (Page Frame Number) to a struct page never fails, so
> we can simplify the code logic by doing the image->control_page assignment
> directly in the loop and removing the unnecessary conditional judgement.
> 
> Signed-off-by: Minfei Huang <mnfhu...@gmail.com>

Looks good to me.

Acked-by: Vivek Goyal <vgo...@redhat.com>

Thanks
Vivek

> ---
>  kernel/kexec.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 7a36fdc..4589899 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -796,11 +796,10 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
>  		/* If I don't overlap any segments I have found my hole! */
>  		if (i == image->nr_segments) {
>  			pages = pfn_to_page(hole_start >> PAGE_SHIFT);
> +			image->control_page = hole_end;
>  			break;
>  		}
>  	}
> -	if (pages)
> -		image->control_page = hole_end;
>  
>  	return pages;
>  }
> -- 
> 2.2.2
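The loop this patch touches looks for a control-page hole inside the crash kernel region that overlaps none of the loaded segments. A simplified userspace sketch of that search, assuming byte addresses, a power-of-two `size` and non-empty segments. `struct seg` and `find_hole()` are hypothetical stand-ins for `struct kexec_segment` and `kimage_alloc_crash_control_pages()`, and a single inclusive `limit` replaces the kernel's upper bounds. It advances the hole from `mend + 1`, which gives the same result as the quoted code when segment ends are page aligned. As in the patched code, the result is recorded directly at the point where the hole is found.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kexec_segment: bytes [mem, mem + memsz). */
struct seg {
	uint64_t mem;
	uint64_t memsz;		/* assumed > 0 */
};

/* Round x up to a multiple of size; size must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t size)
{
	return (x + (size - 1)) & ~(size - 1);
}

/*
 * Find the first size-aligned hole of 'size' bytes at or above 'start'
 * whose last byte is <= 'limit' and which overlaps none of the segments.
 * Returns 1 and stores the hole's start in *out, or 0 if none fits.
 */
int find_hole(uint64_t start, uint64_t limit, uint64_t size,
	      const struct seg *segs, size_t nr, uint64_t *out)
{
	uint64_t hole_start = align_up(start, size);
	uint64_t hole_end = hole_start + size - 1;

	while (hole_end <= limit) {
		size_t i;

		/* See if the candidate hole overlaps any segment. */
		for (i = 0; i < nr; i++) {
			uint64_t mstart = segs[i].mem;
			uint64_t mend = mstart + segs[i].memsz - 1;

			if (hole_end >= mstart && hole_start <= mend) {
				/* Move the hole past the end of this segment. */
				hole_start = align_up(mend + 1, size);
				hole_end = hole_start + size - 1;
				break;
			}
		}
		/* No overlap: record the result right here and stop. */
		if (i == nr) {
			*out = hole_start;
			return 1;
		}
	}
	return 0;
}
```

Each overlap moves the hole strictly forward, so the loop ends either with a hole or when the hole passes `limit`.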


Re: [PATCH V2 2/2] kexec: split kexec_load syscall from kexec core code

2015-07-21 Thread Vivek Goyal

On Mon, Jul 20, 2015 at 04:37:15PM +0800, dyo...@redhat.com wrote:
> Now there are two kexec load syscalls: kexec_load and kexec_file_load.
> kexec_file_load has already been split out into kernel/kexec_file.c.
> In this patch I split the kexec_load syscall code out into kernel/kexec.c.

Hi Dave,

Nice work. Thanks for doing this. I have a couple of minor comments.

- We might have to audit kernel/kexec_core.c. I think there are some
  functions in there which are used only by the old syscall and not by the
  new one. All that code should be in kernel/kexec.c. Only the code which is
  shared between the two syscalls should be in kernel/kexec_core.c.

  For example, I think kimage_alloc_init() is used by the old syscall only.
  The new syscall uses kimage_file_alloc_init().

[..]
> --- linux.orig/include/linux/kexec.h
> +++ linux/include/linux/kexec.h
> @@ -16,7 +16,7 @@
>  
>  #include <uapi/linux/kexec.h>
>  
> -#ifdef CONFIG_KEXEC
> +#ifdef CONFIG_KEXEC_CORE
>  #include <linux/list.h>
>  #include <linux/linkage.h>
>  #include <linux/compat.h>
> @@ -318,12 +318,18 @@ int crash_shrink_memory(unsigned long ne
>  size_t crash_get_memory_size(void);
>  void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
>  
> -#else /* !CONFIG_KEXEC */
> +#ifdef CONFIG_KEXEC
> +int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
> +		      unsigned long nr_segments,
> +		      struct kexec_segment __user *segments,
> +		      unsigned long flags);
> +#endif

I am wondering why this needs to be in kexec.h. Who needs this? Even if
somebody needs this, it should probably be outside of KEXEC_CORE.

#ifdef CONFIG_KEXEC
int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
#else
#endif

#ifdef CONFIG_KEXEC_CORE
.
..

Thanks
Vivek
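The guard layout under discussion often comes with a companion pattern in kernel headers: when an interface is only built for one configuration, the header declares it under that config option and provides a static inline stub otherwise, so callers need no #ifdef of their own. A generic sketch of that pattern with a hypothetical `demo_alloc_init()`. It is not the layout that was eventually merged, and returning -ENOSYS from the stub is just one common choice.

```c
#include <errno.h>

/*
 * Build-time switch. Left undefined here to model CONFIG_KEXEC=n; defining
 * it would require a real demo_alloc_init() definition elsewhere.
 */
/* #define CONFIG_KEXEC 1 */

#ifdef CONFIG_KEXEC
/* Provided by the kexec_load-only code when the option is enabled. */
int demo_alloc_init(unsigned long flags);
#else
/* Stub so callers compile unchanged when the option is disabled. */
static inline int demo_alloc_init(unsigned long flags)
{
	(void)flags;
	return -ENOSYS;
}
#endif
```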


Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal

On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote:
> On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> > dwal...@fifo99.com writes:
> > 
> > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> > > > Hidehiro Kawai <hidehiro.kawai...@hitachi.com> writes:
> > > > 
> > > > > You can call panic notifiers and kmsg dumpers before kdump by
> > > > > specifying crash_kexec_post_notifiers as a boot parameter.
> > > > > However, it doesn't make sense if kdump is not available.  In that
> > > > > case, disable crash_kexec_post_notifiers boot parameter so that
> > > > > you can't change the value of the parameter.
> > > > 
> > > > Nacked-by: Eric W. Biederman <ebied...@xmission.com>
> > > 
> > > I think it would make sense if he just replaced kdump with kexec.
> > 
> > It would be less insane, however it still makes no sense as without
> > kexec on panic support crash_kexec is a noop.  So the value of the
> > setting makes no difference.
> 
> Can you explain more, I don't really understand what you mean. Are you suggesting
> the whole crash_kexec_post_notifiers feature has no value ?

If CONFIG_KEXEC=n, then crash_kexec() is a nop. So it does not matter
whether crash_kexec() is called before panic notifiers or after.

IOW, what do you gain by disabling crash_kexec_post_notifiers in the
case of CONFIG_KEXEC=n?
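A small userspace model of that argument. It follows the "Before" panic() ordering quoted in Hidehiro's patch description, simplified (crash_kexec() takes no pt_regs here), and builds crash_kexec() as an empty stub as with CONFIG_KEXEC=n. The event trace comes out identical for both settings of the boot option, because the only thing the option moves is a call that does nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Model a build with CONFIG_KEXEC=n: leave the define commented out. */
/* #define CONFIG_KEXEC 1 */

#define MAX_EVENTS 8
static const char *events[MAX_EVENTS];
static int nr_events;

static void record(const char *ev)
{
	if (nr_events < MAX_EVENTS)
		events[nr_events++] = ev;
}

#ifdef CONFIG_KEXEC
static void crash_kexec(void) { record("crash_kexec"); }
#else
/* Empty stub, like the CONFIG_KEXEC=n version: nothing happens. */
static inline void crash_kexec(void) { }
#endif

/* Simplified panic() ordering controlled by crash_kexec_post_notifiers. */
static void panic_model(bool crash_kexec_post_notifiers)
{
	nr_events = 0;
	if (!crash_kexec_post_notifiers)
		crash_kexec();
	record("smp_send_stop");
	record("atomic_notifier_call_chain");
	record("kmsg_dump");
	if (crash_kexec_post_notifiers)
		crash_kexec();
}

/* True when both settings of the boot option produce the same trace. */
static bool option_makes_no_difference(void)
{
	const char *first[MAX_EVENTS];
	int n_first, i;

	panic_model(false);
	n_first = nr_events;
	memcpy(first, events, sizeof(events));

	panic_model(true);
	if (nr_events != n_first)
		return false;
	for (i = 0; i < nr_events; i++)
		if (strcmp(first[i], events[i]) != 0)
			return false;
	return true;
}
```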

Thanks
Vivek

Re: [PATCH 3/3] kexec: Change the timing of callbacks related to crash_kexec_post_notifiers boot option

2015-07-14 Thread Vivek Goyal

On Fri, Jul 10, 2015 at 08:33:31PM +0900, Hidehiro Kawai wrote:
 This patch fixes problems reported by Daniel Walker
 (https://lkml.org/lkml/2015/6/24/44), and also replaces the bug fix
 commits 5375b70 and f45d85f.
 
 If crash_kexec_post_notifiers boot option is specified,
 other cpus are stopped by smp_send_stop() before entering
 crash_kexec(), while usually machine_crash_shutdown() called by
 crash_kexec() does that.  This behavior change leads two problems.
 
  Problem 1:
  Some function in the crash_kexec() path depend on other cpus being
  still online.  If other cpus have been offlined already, they
  doesn't work properly.
 
   Example:
panic()
 crash_kexec()
  machine_crash_shutdown()
   octeon_generic_shutdown() // shutdown watchdog for ONLINE cpus
  machine_kexec()
 
  Problem 2:
  Most of architectures stop other cpus in the machine_crash_shutdown()
  path and save register information at the same time.  However, if
  smp_send_stop() is called before that, we can't save the register
  information.
 
 To solve these problems, this patch changes the timing of calling
 the callbacks instead of changing the timing of crash_kexec() if
 crash_kexec_post_notifiers boot option is specified.
 
  Before:
   if (!crash_kexec_post_notifiers)
   crash_kexec()
 
   smp_send_stop()
   atomic_notifier_call_chain()
   kmsg_dump()
 
   if (crash_kexec_post_notifiers)
   crash_kexec()
 
  After:
   crash_kexec()
   machine_crash_shutdown()
   if (crash_kexec_post_notifiers) {
   atomic_notifier_call_chain()
   kmsg_dump()
   }
   machine_kexec()
 
   smp_send_stop()
   if (!crash_kexec_post_notifiers) {
   atomic_notifier_call_chain()
   kmsg_dump()
   }
 

I think this new code flow looks bad. Now we are calling kmsg_dump()
and atomic_notifier_call_chain() from inside crash_kexec() as well
as from inside panic(). This is bad.

So the basic problem seems to be that cpus need to be stopped only once
(with or without panic notifiers). So why don't we look into designing a
function which stops cpus and saves register state first, and then does
the rest of the processing?

Something like:

stop_cpus_save_register_state();

if (!crash_kexec_post_notifiers)
	crash_kexec();

atomic_notifier_call_chain();
kmsg_dump();

Here crash_kexec() would have to be modified to assume that cpus have
already been stopped and their register state already saved.

IOW, is there a reason that we can't get rid of smp_send_stop() and
use the mechanism crash_kexec() is using to stop cpus after panic()?
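A userspace model of the flow proposed above, under stated assumptions: stop_cpus_save_register_state() is the hypothetical helper named in the pseudocode, crash_kexec_model() stands in for a modified crash_kexec() that no longer stops cpus itself, and the post-notifier crash_kexec() call at the end is kept from the existing panic() code (the pseudocode does not show it). What it checks is that cpus are stopped, and registers saved, exactly once and before anything else, whichever way the boot option is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_EVENTS 8
static const char *events[MAX_EVENTS];
static int nr_events;
static bool crash_image_loaded;	/* models a loaded kdump kernel */

static void record(const char *ev)
{
	if (nr_events < MAX_EVENTS)
		events[nr_events++] = ev;
}

/* Hypothetical helper from the proposal: stop cpus and save registers. */
static void stop_cpus_save_register_state(void)
{
	record("stop_cpus_save_register_state");
}

/*
 * Modified crash_kexec(): assumes cpus are already stopped and their
 * registers saved. Returns true if it "jumps" to the crash kernel; the
 * real function never returns in that case.
 */
static bool crash_kexec_model(void)
{
	if (!crash_image_loaded)
		return false;
	record("machine_kexec");
	return true;
}

static void panic_model(bool crash_kexec_post_notifiers)
{
	nr_events = 0;
	stop_cpus_save_register_state();	/* exactly once, always first */

	if (!crash_kexec_post_notifiers && crash_kexec_model())
		return;

	record("atomic_notifier_call_chain");
	record("kmsg_dump");

	/* Assumption: the existing post-notifier kdump call stays here. */
	if (crash_kexec_post_notifiers)
		crash_kexec_model();
}
```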

Thanks
Vivek

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal

On Tue, Jul 14, 2015 at 03:34:30PM +, dwal...@fifo99.com wrote:
 On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
  On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote:
   On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
dwal...@fifo99.com writes:

 On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
 Hidehiro Kawai hidehiro.kawai...@hitachi.com writes:
 
  You can call panic notifiers and kmsg dumpers before kdump by
  specifying crash_kexec_post_notifiers as a boot parameter.
  However, it doesn't make sense if kdump is not available.  In that
  case, disable crash_kexec_post_notifiers boot parameter so that
  you can't change the value of the parameter.
 
 Nacked-by: Eric W. Biederman ebied...@xmission.com

 I think it would make sense if he just replaced kdump with kexec.

It would be less insane, however it still makes no sense as without
kexec on panic support crash_kexec is a noop.  So the value of the
seeting makes no difference.
   
   Can you explain more, I don't really understand what you mean. Are you 
   suggesting
   the whole crash_kexec_post_notifiers feature has no value ?
  
  Daniel,
  
  BTW, why are you using crash_kexec_post_notifiers commandline? Why not
  without it?
 
 It was explained in the prior thread but to rehash, the notifiers are used to 
 do a switch
 over from the crashed machine to another redundant machine.

So why not detect the failure using polling, or issue notifications from
the second kernel?

IOW, expecting that a crashed machine will be able to deliver a
notification reliably is flawed to begin with, IMHO.

If a machine is failing, there is a high chance it can't deliver the
notification. Detecting that failure using some kind of polling mechanism
might be more reliable. It would also make the kdump mechanism more
reliable, since kdump would not have to run panic notifiers after the crash.
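A minimal sketch of the polling alternative, assuming the active machine sends a periodic heartbeat; struct peer_monitor, heartbeat_received() and peer_failed() are hypothetical names, not an existing kernel or userspace API. The standby side declares failure from silence alone, so the crashed kernel does not have to run any code for the switchover to happen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state kept by the standby machine about its peer. */
struct peer_monitor {
	unsigned long last_beat;	/* time the last heartbeat arrived */
	unsigned long timeout;		/* max silence before declaring failure */
};

/* Called whenever the active machine's periodic heartbeat arrives. */
static void heartbeat_received(struct peer_monitor *m, unsigned long now)
{
	m->last_beat = now;
}

/*
 * Polled by the standby side. A crashed peer is detected by its silence,
 * so it does not need to run panic notifiers to be noticed.
 * Assumes now >= last_beat (monotonic time source).
 */
static bool peer_failed(const struct peer_monitor *m, unsigned long now)
{
	return now - m->last_beat > m->timeout;
}
```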

Thanks
Vivek

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal

On Tue, Jul 14, 2015 at 03:48:33PM +, dwal...@fifo99.com wrote:
 On Tue, Jul 14, 2015 at 11:40:40AM -0400, Vivek Goyal wrote:
  On Tue, Jul 14, 2015 at 03:34:30PM +, dwal...@fifo99.com wrote:
   On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote:
 On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
  dwal...@fifo99.com writes:
  
   On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
   Hidehiro Kawai hidehiro.kawai...@hitachi.com writes:
   
You can call panic notifiers and kmsg dumpers before kdump by
specifying crash_kexec_post_notifiers as a boot parameter.
However, it doesn't make sense if kdump is not available.  In 
that
case, disable crash_kexec_post_notifiers boot parameter so 
that
you can't change the value of the parameter.
   
   Nacked-by: Eric W. Biederman ebied...@xmission.com
  
   I think it would make sense if he just replaced kdump with 
   kexec.
  
  It would be less insane, however it still makes no sense as without
  kexec on panic support crash_kexec is a noop.  So the value of the
  seeting makes no difference.
 
 Can you explain more, I don't really understand what you mean. Are 
 you suggesting
 the whole crash_kexec_post_notifiers feature has no value ?

Daniel,

BTW, why are you using crash_kexec_post_notifiers commandline? Why not
without it?
   
   It was explained in the prior thread but to rehash, the notifiers are 
   used to do a switch
   over from the crashed machine to another redundant machine.
  
  So why not detect failure using polling or issue notifications from second
  kernel.
  
  IOW, expecting that a crashed machine will be able to deliver notification
  reliably is falwed to begin with, IMHO.
 
 It's flawed to think you can kexec, but you still do it right ? I've not 
 gotten into
 the deep details of this switching process, but that's how this interface is 
 used.

Sure. But the deal here is that users of that interface know it can
sometimes be unreliable. In the absence of a more reliable mechanism, a
somewhat less reliable one is fine.

  
  If a machine is failing, there are high chance it can't deliver you the
  notification. Detecting that failure suing some kind of polling mechanism
  might be more reliable. And it will make even kdump mechanism more
  reliable so that it does not have to run panic notifiers after the crash.
 
 I think what your suggesting is that my company should change how it's 
 hardware works
 and that's not really an option for me. This isn't a simple thing like 
 checking over the
 network if the machine is down or not, this is way more complex hardware 
 design.

That means you are ready to live with an unreliable design. There might be
cases where the notifier does not run properly and you will not switch over
even though the OS has failed. I was just trying to nudge you toward a more
reliable mechanism.

Thanks
Vivek

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal

On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote:
 On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
  dwal...@fifo99.com writes:
  
   On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
   Hidehiro Kawai hidehiro.kawai...@hitachi.com writes:
   
You can call panic notifiers and kmsg dumpers before kdump by
specifying crash_kexec_post_notifiers as a boot parameter.
However, it doesn't make sense if kdump is not available.  In that
case, disable crash_kexec_post_notifiers boot parameter so that
you can't change the value of the parameter.
   
   Nacked-by: Eric W. Biederman ebied...@xmission.com
  
   I think it would make sense if he just replaced kdump with kexec.
  
  It would be less insane, however it still makes no sense as without
  kexec on panic support crash_kexec is a noop.  So the value of the
  seeting makes no difference.
 
 Can you explain more, I don't really understand what you mean. Are you 
 suggesting
 the whole crash_kexec_post_notifiers feature has no value ?

Daniel,

BTW, why are you using the crash_kexec_post_notifiers command-line option?
Why not run without it?

Thanks
Vivek

Re: [PATCH v4] kexec: Make a pair of map and unmap reserved pages when kdump fails to start

2015-07-14 Thread Vivek Goyal

On Fri, Jul 10, 2015 at 11:14:06AM +0200, Michael Holzheu wrote:

[..]
 What about the following patch:
 ---
 diff --git a/kernel/kexec.c b/kernel/kexec.c
 index 7a36fdc..7837c4e 100644
 --- a/kernel/kexec.c
 +++ b/kernel/kexec.c
 @@ -1236,10 +1236,68 @@ int kexec_load_disabled;
  
  static DEFINE_MUTEX(kexec_mutex);
  
 +static int __kexec_load(unsigned long entry, unsigned long nr_segments,

How about renaming the function to do_kexec_load()?

We also need to clean up the commit description. It should explain the
problem better and describe the solution this patch implements.

 + struct kexec_segment __user *segments,
 + unsigned long flags)
 +{
 + struct kimage **dest_image, *image;
 + unsigned long i;
 + int result;
 +
 + if (flags & KEXEC_ON_CRASH)
 + dest_image = &kexec_crash_image;
 + else
 + dest_image = &kexec_image;
 +
 + if (nr_segments == 0) {
 + /* Uninstall image */
 + kfree(xchg(dest_image, NULL));

This should be kimage_free(), as you already noted in a follow-up mail.

 + return 0;
 + }
 + if (flags & KEXEC_ON_CRASH) {
 + /*
 +  * Loading another kernel to switch to if this one
 +  * crashes.  Free any current crash dump kernel before
 +  * we corrupt it.
 +  */
 + kimage_free(xchg(kexec_crash_image, NULL));
 + }
 +
 + result = kimage_alloc_init(&image, entry, nr_segments, segments, flags);
 + if (result)
 + return result;
 +
 + if (flags & KEXEC_ON_CRASH)
 + crash_map_reserved_pages();
 +
 + if (flags & KEXEC_PRESERVE_CONTEXT)
 + image->preserve_context = 1;
 +
 + result = machine_kexec_prepare(image);
 + if (result)
 + goto failure_unmap_mem;
 +
 + for (i = 0; i < nr_segments; i++) {
 + result = kimage_load_segment(image, &image->segment[i]);
 + if (result)
 + goto failure_unmap_mem;
 + }
 +
 + kimage_terminate(image);
 +
 + /* Install the new kernel and uninstall the old */
 + image = xchg(dest_image, image);
 +
 +failure_unmap_mem:

I don't like the label failure_unmap_mem. We reach it in both the success
path and the failure path, so why not simply call it out?

 + if (flags & KEXEC_ON_CRASH)
 + crash_unmap_reserved_pages();
 + kimage_free(image);

Now kimage_free() is called with kexec_mutex held. Previously that was
not the case. I hope that's not a problem.

 + return result;
 +}
 +
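A userspace model of the single-exit shape being suggested: once crash_map_reserved_pages() has run, every return path, success or failure, passes through one out label that calls crash_unmap_reserved_pages(). The step() helper and fail_step knob are hypothetical test hooks standing in for machine_kexec_prepare() and kimage_load_segment(); kimage_alloc_init() is left out because it runs before anything is mapped, and do_kexec_load_model() borrows the do_kexec_load() name suggested above only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

static int maps, unmaps;
static int fail_step;	/* hypothetical test knob: 0 = succeed, N = fail at step N */

static void crash_map_reserved_pages(void)   { maps++; }
static void crash_unmap_reserved_pages(void) { unmaps++; }

/* Stand-in for one fallible loading step; fails only when selected. */
static int step(int n)
{
	return fail_step == n ? -1 : 0;
}

static int do_kexec_load_model(bool on_crash, unsigned long nr_segments)
{
	unsigned long i;
	int result;

	if (on_crash)
		crash_map_reserved_pages();

	result = step(1);			/* machine_kexec_prepare() */
	if (result)
		goto out;

	for (i = 0; i < nr_segments; i++) {
		result = step(2 + (int)i);	/* kimage_load_segment() */
		if (result)
			goto out;
	}
	/* kimage_terminate() and installing the image would go here. */
out:
	/* Reached on success and on failure, so map/unmap always pair up. */
	if (on_crash)
		crash_unmap_reserved_pages();
	return result;
}
```

Setting fail_step to each step in turn should leave maps equal to unmaps every time, which is the property the original patch was after.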

Thanks
Vivek

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal

On Tue, Jul 14, 2015 at 01:01:12PM -0500, Eric W. Biederman wrote:
 Vivek Goyal vgo...@redhat.com writes:
 
  On Tue, Jul 14, 2015 at 05:29:53PM +, dwal...@fifo99.com wrote:
 
  [..]
 If a machine is failing, there are high chance it can't deliver you 
 the
 notification. Detecting that failure suing some kind of polling 
 mechanism
 might be more reliable. And it will make even kdump mechanism more
 reliable so that it does not have to run panic notifiers after the 
 crash.

I think what your suggesting is that my company should change how 
it's hardware works
and that's not really an option for me. This isn't a simple thing 
like checking over the
network if the machine is down or not, this is way more complex 
hardware design.
   
That means you are ready to live with an unreliable design. There 
might be
cases where notifier does not get run properly and you will not do 
switch
despite the fact that OS has failed. I was just trying to nudge you in
a direction which could be more reliable mechanism.
   
   Sigh I see some deep confusion going on here.
   
   The panic notifiers are just that panic notifiers.  They have not been
   nor should they be tied to kexec.   If those notifiers force a switch
   over of between machines I fail to see why you would care if it was
   kexec or another panic situation that is forcing that switchover.
  
  Hidehiro isn't fixing the failover situation on my side, he's fixing 
  register
  information collection when crash_kexec_post_notifiers is used.
 
  Sure. Given that we have created this new parameter, let us fix it so that
  we can capture the other cpu register state in crash dump.
 
  I am little disappointed that it was not tested well when this parameter was
  introuced. We should have atleast tested it to the extent to see if there
  is proper cpu state present for all cpus in the crash dump.
 
  At that point of time it looked like a simple modification
  to allow panic notifiers before crash_kexec().
 
 Either that or we say no one cares enough, and it known broken so let's
 just revert the fool thing.

Masami, you introduced this option. Are you fine with the revert? Is it
really being used and tested?

 I honestly can't see how to support panic notifiers, before kexec.
 There is no way to tell what is being done and all of the pieces
 including smp_send_stop are known to be buggy.

Shouldn't we be able to replace smp_send_stop() with what crash_kexec() does
to stop the machine? If so, then I guess it should be fine. The parameter
description clearly says to specify it at your own risk, so we are not
making a strong support statement for a successful kdump after panic
notifiers. If it is fixable, we should fix it; otherwise the user needs
to deal with it.

Thanks
Vivek

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal

On Tue, Jul 14, 2015 at 05:29:53PM +, dwal...@fifo99.com wrote:

[..]
If a machine is failing, there are high chance it can't deliver you the
notification. Detecting that failure suing some kind of polling 
mechanism
might be more reliable. And it will make even kdump mechanism more
reliable so that it does not have to run panic notifiers after the 
crash.
   
   I think what your suggesting is that my company should change how it's 
   hardware works
   and that's not really an option for me. This isn't a simple thing like 
   checking over the
   network if the machine is down or not, this is way more complex hardware 
   design.
  
   That means you are ready to live with an unreliable design. There might be
   cases where notifier does not get run properly and you will not do switch
   despite the fact that OS has failed. I was just trying to nudge you in
   a direction which could be more reliable mechanism.
  
  Sigh I see some deep confusion going on here.
  
  The panic notifiers are just that panic notifiers.  They have not been
  nor should they be tied to kexec.   If those notifiers force a switch
  over of between machines I fail to see why you would care if it was
  kexec or another panic situation that is forcing that switchover.
 
 Hidehiro isn't fixing the failover situation on my side, he's fixing register
 information collection when crash_kexec_post_notifiers is used.

Sure. Given that we have created this new parameter, let us fix it so that
we can capture the other cpu register state in crash dump.

I am a little disappointed that it was not tested well when this parameter
was introduced. We should have at least tested whether proper cpu state is
present for all cpus in the crash dump.

At that point in time it looked like a simple modification
to allow panic notifiers before crash_kexec().

Thanks
Vivek

Re: [PATCH v3] kexec: Make a pair of map and unmap reserved pages when kdump fails to start

2015-07-07 Thread Vivek Goyal

On Thu, Jul 02, 2015 at 09:45:52AM +0800, Minfei Huang wrote:
 For some arch, kexec shall map the reserved pages, then use them, when
 we try to start the kdump service.
 
 Now kexec will never unmap the reserved pages, once it fails to continue
 starting the kdump service.
 
 Make a pair of reserved pages in kdump starting path, whatever kexec
 fails or not.
 
 Signed-off-by: Minfei Huang mnfhu...@gmail.com
 ---
 v2:
 - replace the failure label with fail_unmap_pages
 v1:
 - reconstruct the patch code
 ---
  kernel/kexec.c | 26 ++
  1 file changed, 14 insertions(+), 12 deletions(-)
 

Hi Minfei,

I am thinking of moving the kernel loading code into a separate function
to make things a little simpler. Right now it is confusing.

Can you please test the attached patch? I have only compile-tested it. It
primarily does what your patch does, but in a separate function, which
seems more readable.

Thanks
Vivek


---
 kernel/kexec.c |   90 +++--
 1 file changed, 56 insertions(+), 34 deletions(-)

Index: rhvgoyal-linux/kernel/kexec.c
===
--- rhvgoyal-linux.orig/kernel/kexec.c  2015-07-06 13:59:35.088129148 -0400
+++ rhvgoyal-linux/kernel/kexec.c   2015-07-07 17:14:23.593175644 -0400
@@ -1247,6 +1247,57 @@ int kexec_load_disabled;
 
 static DEFINE_MUTEX(kexec_mutex);
 
+static int __kexec_load(struct kimage **rimage, unsigned long entry,
+   unsigned long nr_segments,
+   struct kexec_segment __user * segments,
+   unsigned long flags)
+{
+   unsigned long i;
+   int result;
+   struct kimage *image;
+
+   if (flags & KEXEC_ON_CRASH) {
+   /*
+* Loading another kernel to switch to if this one
+* crashes.  Free any current crash dump kernel before
+* we corrupt it.
+*/
+
+   kimage_free(xchg(kexec_crash_image, NULL));
+   }
+
+   result = kimage_alloc_init(&image, entry, nr_segments, segments, flags);
+   if (result)
+   return result;
+
+   if (flags & KEXEC_ON_CRASH)
+   crash_map_reserved_pages();
+
+   if (flags & KEXEC_PRESERVE_CONTEXT)
+   image->preserve_context = 1;
+
+   result = machine_kexec_prepare(image);
+   if (result)
+   goto out;
+
+   for (i = 0; i < nr_segments; i++) {
+   result = kimage_load_segment(image, &image->segment[i]);
+   if (result)
+   goto out;
+   }
+
+   kimage_terminate(image);
+   *rimage = image;
+out:
+   if (flags & KEXEC_ON_CRASH)
+   crash_unmap_reserved_pages();
+
+   /* Free image if there was an error */
+   if (result)
+   kimage_free(image);
+   return result;
+}
+
 SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
struct kexec_segment __user *, segments, unsigned long, flags)
 {
@@ -1292,44 +1343,15 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
dest_image = &kexec_image;
if (flags & KEXEC_ON_CRASH)
dest_image = &kexec_crash_image;
-   if (nr_segments > 0) {
-   unsigned long i;
 
-   if (flags & KEXEC_ON_CRASH) {
-   /*
-* Loading another kernel to switch to if this one
-* crashes.  Free any current crash dump kernel before
-* we corrupt it.
-*/
-
-   kimage_free(xchg(kexec_crash_image, NULL));
-   result = kimage_alloc_init(&image, entry, nr_segments,
-  segments, flags);
-   crash_map_reserved_pages();
-   } else {
-   /* Loading another kernel to reboot into. */
-
-   result = kimage_alloc_init(&image, entry, nr_segments,
-  segments, flags);
-   }
-   if (result)
-   goto out;
-
-   if (flags & KEXEC_PRESERVE_CONTEXT)
-   image->preserve_context = 1;
-   result = machine_kexec_prepare(image);
+   /* Load new kernel */
+   if (nr_segments > 0) {
+   result = __kexec_load(&image, entry, nr_segments, segments,
+ flags);
if (result)
goto out;
-
-   for (i = 0; i < nr_segments; i++) {
-   result = kimage_load_segment(image, &image->segment[i]);
-   if (result)
-   goto out;
-   }
-   kimage_terminate(image);
-   if (flags & KEXEC_ON_CRASH)
-   crash_unmap_reserved_pages();
}
+

Re: kexec_load(2) bypasses signature verification

2015-06-25 Thread Vivek Goyal

On Thu, Jun 25, 2015 at 04:48:18PM +0800, Dave Young wrote:
 On 06/19/15 at 09:09am, Vivek Goyal wrote:
  On Fri, Jun 19, 2015 at 04:18:16PM +0800, Dave Young wrote:
 If we want to disable unsigned kernel loading at compile time, then we
 really need to work on decoupling CONFIG_KEXEC and CONFIG_FILE_KEXEC.
 Introducing another config option is not the way forward, IMHO.

Yes, let's do it in this way since everyone is fine with it.
   
   I will work on a patch if nobody else have interest or no time on it.
  
  Thanks Dave. Will be good if you can get this done.
 
 Vivek, I worked out some draft patches here:
 https://github.com/daveyoung/linux/commits/kexec-syscall-cleanup
 
 Basiclly I split kexec_file first, then add CONFIG_KEXEC_CORE kconfig option
 then split kexec_load code from general code.
 
 There's a lot of #ifdef CONFIG_KEXEC in kernel source, because CONFIG_KEXEC
 can be disabled so I changed all kernel general and x86 #ifdef to use
 CONFIG_KEXEC_CORE if necessary. For other arches dependent code with #ifdef
 I did not change anything other than the new Kconfig option. It will works
 because only x86 support KEXEC_FILE.
 
 Please take a look if you have time, if this is not what you want please let
 me know.
 
 I will have no time this week, only did building test, will do more test next
 week, if everything is ok I can send out the patches to list for review.

Hi Dave,

I have put a few comments on GitHub. Please have a look. Once you have
another version of the patches, I will take another look.

Thanks
Vivek

Re: kexec_load(2) bypasses signature verification

2015-06-19 Thread Vivek Goyal

On Fri, Jun 19, 2015 at 04:18:16PM +0800, Dave Young wrote:
   If we want to disable unsigned kernel loading at compile time, then we
   really need to work on decoupling CONFIG_KEXEC and CONFIG_FILE_KEXEC.
   Introducing another config option is not the way forward, IMHO.
  
  Yes, let's do it in this way since everyone is fine with it.
 
 I will work on a patch if nobody else have interest or no time on it.

Thanks Dave. Will be good if you can get this done.

Thanks
Vivek

Re: kexec_load(2) bypasses signature verification

2015-06-19 Thread Vivek Goyal

On Fri, Jun 19, 2015 at 03:04:31PM +0800, Dave Young wrote:
 On 06/16/15 at 09:47pm, Vivek Goyal wrote:
  On Tue, Jun 16, 2015 at 08:32:37PM -0500, Eric W. Biederman wrote:
   Vivek Goyal vgo...@redhat.com writes:
   
On Tue, Jun 16, 2015 at 02:38:31PM -0500, Eric W. Biederman wrote:

Adding Vivek as he is the one who implemented kexec_file_load.
I was hoping he would respond to this thread, and it looks like he
simply has not ever been Cc'd.

Theodore Ts'o ty...@mit.edu writes:

 On Mon, Jun 15, 2015 at 09:37:05AM -0400, Josh Boyer wrote:
 The bits that actually read Secure Boot state out of the UEFI
 variables, and apply protections to the machine to avoid compromise
 under the SB threat model.  Things like disabling the old kexec...

 I don't have any real interest in using Secure Boot, but I *am*
 interested in using CONFIG_KEXEC_VERIFY_SIG[1].  So perhaps we need 
 to
 have something similar to what we have with signed modules in terms 
 of
 CONFIG_MODULE_SIG_FORCE and module/sig_enforce, but for
 KEXEC_VERIFY_SIG.  This would mean creating a separate flag
 independent of the one Linus suggested for Secure Boot, but since we
 have one for signed modules, we do have precedent for this sort of
 thing.

My overall request with respect to kexec has been that we implement
things that make sense outside of the bizarre threat model of the Linux
folks who were talking about secure boot.

nI have not navigated the labyrinth of config options but having a way 
to
only boot signed things with kexec seems a completely sensible way to
operate in the context of signed images.

I don't know how much that will help given that actors with sufficient
resources have demonstrated the ability to steal private keys, but
assuming binary signing is an effective technique (or why else do it)
then having an option to limit kexec to only loading signed images 
seems
sensible.
   
I went through the mail chain on web and here are my thoughts.
   
- So yes, upstream does not have the logic which automatically disables
  the old syscall (kexec_load()) on secureboot systems. Distributions
  carry those patches.
   
- This KEXEC_VERIFY_SIG option only cotrols the behavior for
  kexec_file_load() syscall and is not meant to directly affect any
  behavior of old syscall (kexec_load()). I think I should have named
  it KEXEC_FILE_VERIFY_SIG. Though help text makes it clear.
  Verify kernel signature during kexec_file_load() syscall.
   
- I think disabling old system call if KEXEC_VERIFY_SIG() is set
  will break existing setup which use old system call by default, except
  the case of secureboot system. And old syscall path is well tested
  and new syscall might not be in a position to support all the corner
  cases, atleast as of now.
   
Ted, 
   
So looks like you are looking for a system/option where you just want to
always make use of kexec_file_load() and disable kexec_load(). This 
sounds
like you want a kernel where kexec_load() is compiled out and you want
only kexec_file_load() in.
   
Right now one can't do that becase kexec_file_load() depends on
CONFIG_KEXEC option.
   
I am wondering that how about making CONFIG_KEXEC_FILE_LOAD independent
of CONFIG_KEXEC. That way one can set CONFIG_KEXEC_VERIFY_SIG=y, and
only signed kernel can be kexeced on that system.
   
This should gel well with long term strategy of deprecating kexec_load()
at some point of time when kexec_file_load() is ready to completely
replace it.
   
   Interesting.
   
   I suspect that what we want is to have CONFIG_KEXEC for the core
   and additional CONFIG_KEXEC_LOAD option that covers that kexec_load call.
   
   That should make it trivially easy to disable the kexec_load system call
   in cases where people care.
  
  Or, we could create another option CONFIG_KEXEC_CORE/CONFIG_KEXEC_COMMON
  which will be automatically selected when either CONFIG_KEXEC or
  CONIG_KEXEC_FILE are selected.
  
  All common code can go under this option and rest can go under respective
  config options.
  
  That way, those who have CONFIG_KEXEC=y in old config files will not be
  broken. They don't have to learn about new options at all.
 
 Vivek, It is slight better for reusing old config file, but CONFIG_KEXEC_LOAD
 sounds better. Do we have to maintain the compability for kconfig?
 
 KEXEC_COMMON/KEXEC/KEXEC_FILE_LOAD is a little confusing. CONFIG_KEXEC
 should be the common kexec stuff naturally, it is strange to use CONFIG_KEXEC
 for only kexec_load syscall.

Hi Dave,

I think as a user I would like my old config file to work with a new
kernel. It is a good idea to keep old config options unless we have a
very good reason not to.

To me, the following should be reasonable.

CONFIG_KEXEC -- Enable old

Re: kexec_load(2) bypasses signature verification

2015-06-18 Thread Vivek Goyal

On Thu, Jun 18, 2015 at 10:02:09AM +0800, Dave Young wrote:

[..]
  Or simply add a new config option KEXEC_VERIFY_SIG_FORCE, so we can return
  error in kexec_load and print some error message.
 
 Just like below, does this work for you, Ted?
 
 ---
  arch/x86/Kconfig |7 +++
  kernel/kexec.c   |9 -
  2 files changed, 15 insertions(+), 1 deletion(-)
 
 --- linux.orig/arch/x86/Kconfig
 +++ linux/arch/x86/Kconfig
 @@ -1755,6 +1755,13 @@ config KEXEC_VERIFY_SIG
 verification for the corresponding kernel image type being
 loaded in order for this to work.
  
 +config KEXEC_VERIFY_SIG_FORCE
 + bool "Enforce kexec signature verifying"
 + depends on KEXEC_VERIFY_SIG
 + ---help---
 +   This option disable kexec_load() syscall, only kexec_file_load
 +   can be used.
 +


Hi Dave,

I think we might not need a new config option. A new config option makes
it a little confusing. KEXEC_VERIFY_SIG already implies KEXEC_VERIFY_SIG_FORCE
(for the new syscall). Now extending it to also mean that it should disable
the old syscall is confusing.

We already have a sysctl knob to disable kexec kernel loading. But that
knob disables it for both syscalls.

Maybe we can just introduce another command-line option, say
kexec_verify_sig_force, which works across both syscalls and denies
loading an unsigned kernel in the following two cases:

- Using the old syscall.
- Using the new syscall if the kernel was compiled with KEXEC_VERIFY_SIG=n.

This should be simple and get us going in the short term.

If we want to disable unsigned kernel loading at compile time, then we
really need to work on decoupling CONFIG_KEXEC and CONFIG_FILE_KEXEC.
Introducing another config option is not the way forward, IMHO.
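A sketch of the decision the proposed boot option would make, assuming it is named kexec_verify_sig_force as suggested above; the enum, kexec_load_permitted() and its parameters are hypothetical. A false return corresponds to the syscall failing with -EPERM, the same way the quoted kexec_load() check already handles kexec_load_disabled.

```c
#include <assert.h>
#include <stdbool.h>

enum kexec_syscall {
	SYSCALL_KEXEC_LOAD,		/* old kexec_load() */
	SYSCALL_KEXEC_FILE_LOAD,	/* new kexec_file_load() */
};

/*
 * Hypothetical policy for a kexec_verify_sig_force boot option.
 * verify_sig_built_in models CONFIG_KEXEC_VERIFY_SIG=y.
 * Returns false where the syscall should fail with -EPERM.
 */
static bool kexec_load_permitted(enum kexec_syscall sc,
				 bool verify_sig_force,
				 bool verify_sig_built_in)
{
	if (!verify_sig_force)
		return true;		/* current behavior, no extra check */
	if (sc == SYSCALL_KEXEC_LOAD)
		return false;		/* old syscall cannot verify a signature */
	return verify_sig_built_in;	/* new syscall only if it verifies */
}
```

When the function returns true for kexec_file_load() with verification built in, the existing signature check in that syscall still decides whether the image is accepted.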

Thanks
Vivek


  config KEXEC_BZIMAGE_VERIFY_SIG
   bool "Enable bzImage signature verification support"
   depends on KEXEC_VERIFY_SIG
 --- linux.orig/kernel/kexec.c
 +++ linux/kernel/kexec.c
 @@ -45,6 +45,12 @@
  #include <crypto/hash.h>
  #include <crypto/sha.h>
  
 +#ifdef CONFIG_KEXEC_VERIFY_SIG_FORCE
 +static bool kexec_verify_sig_force = true;
 +#else
 +static bool kexec_verify_sig_force;
 +#endif
 +
  /* Per cpu memory for storing cpu states in case of system crash. */
  note_buf_t __percpu *crash_notes;
  
 @@ -1243,7 +1249,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
   int result;
  
   /* We only trust the superuser with rebooting the system. */
 - if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
 + if (!capable(CAP_SYS_BOOT) || kexec_load_disabled
 + || kexec_verify_sig_force)
   return -EPERM;
  
   /*


Re: kexec_load(2) bypasses signature verification

2015-06-16 Thread Vivek Goyal

On Tue, Jun 16, 2015 at 02:38:31PM -0500, Eric W. Biederman wrote:
 
 Adding Vivek as he is the one who implemented kexec_file_load.
 I was hoping he would respond to this thread, and it looks like he
 simply has not ever been Cc'd.
 
 Theodore Ts'o ty...@mit.edu writes:
 
  On Mon, Jun 15, 2015 at 09:37:05AM -0400, Josh Boyer wrote:
  The bits that actually read Secure Boot state out of the UEFI
  variables, and apply protections to the machine to avoid compromise
  under the SB threat model.  Things like disabling the old kexec...
 
  I don't have any real interest in using Secure Boot, but I *am*
  interested in using CONFIG_KEXEC_VERIFY_SIG[1].  So perhaps we need to
  have something similar to what we have with signed modules in terms of
  CONFIG_MODULE_SIG_FORCE and module/sig_enforce, but for
  KEXEC_VERIFY_SIG.  This would mean creating a separate flag
  independent of the one Linus suggested for Secure Boot, but since we
  have one for signed modules, we do have precedent for this sort of
  thing.
 
 My overall request with respect to kexec has been that we implement
 things that make sense outside of the bizarre threat model of the Linux
 folks who were talking about secure boot.
 
 I have not navigated the labyrinth of config options but having a way to
 only boot signed things with kexec seems a completely sensible way to
 operate in the context of signed images.
 
 I don't know how much that will help given that actors with sufficient
 resources have demonstrated the ability to steal private keys, but
 assuming binary signing is an effective technique (or why else do it)
 then having an option to limit kexec to only loading signed images seems
 sensible.

I went through the mail chain on web and here are my thoughts.

- So yes, upstream does not have the logic which automatically disables
  the old syscall (kexec_load()) on secureboot systems. Distributions
  carry those patches.

- This KEXEC_VERIFY_SIG option only controls the behavior of the
  kexec_file_load() syscall and is not meant to directly affect any
  behavior of the old syscall (kexec_load()). I think I should have named
  it KEXEC_FILE_VERIFY_SIG, though the help text makes it clear:
  "Verify kernel signature during kexec_file_load() syscall."

- I think disabling the old system call if KEXEC_VERIFY_SIG is set
  will break existing setups which use the old system call by default,
  except on secureboot systems. The old syscall path is well tested,
  and the new syscall might not be in a position to support all the
  corner cases, at least as of now.

Ted, 

So it looks like you are looking for a system/option where you always
want to make use of kexec_file_load() and disable kexec_load(). It sounds
like you want a kernel where kexec_load() is compiled out and only
kexec_file_load() is compiled in.

Right now one can't do that because kexec_file_load() depends on the
CONFIG_KEXEC option.

I am wondering: how about making CONFIG_KEXEC_FILE_LOAD independent
of CONFIG_KEXEC? That way one can set CONFIG_KEXEC_VERIFY_SIG=y, and
only a signed kernel can be kexeced on that system.

This should gel well with the long-term strategy of deprecating
kexec_load() at some point, once kexec_file_load() is ready to
completely replace it.

Thanks
Vivek

Re: kexec_load(2) bypasses signature verification

2015-06-16 Thread Vivek Goyal

On Tue, Jun 16, 2015 at 08:32:37PM -0500, Eric W. Biederman wrote:
 Vivek Goyal vgo...@redhat.com writes:
 
  On Tue, Jun 16, 2015 at 02:38:31PM -0500, Eric W. Biederman wrote:
  
  Adding Vivek as he is the one who implemented kexec_file_load.
  I was hoping he would respond to this thread, and it looks like he
  simply has not ever been Cc'd.
  
  Theodore Ts'o ty...@mit.edu writes:
  
   On Mon, Jun 15, 2015 at 09:37:05AM -0400, Josh Boyer wrote:
   The bits that actually read Secure Boot state out of the UEFI
   variables, and apply protections to the machine to avoid compromise
   under the SB threat model.  Things like disabling the old kexec...
  
   I don't have any real interest in using Secure Boot, but I *am*
   interested in using CONFIG_KEXEC_VERIFY_SIG[1].  So perhaps we need to
   have something similar to what we have with signed modules in terms of
   CONFIG_MODULE_SIG_FORCE and module/sig_enforce, but for
   KEXEC_VERIFY_SIG.  This would mean creating a separate flag
   independent of the one Linus suggested for Secure Boot, but since we
   have one for signed modules, we do have precedent for this sort of
   thing.
  
  My overall request with respect to kexec has been that we implement
  things that make sense outside of the bizarre threat model of the Linux
  folks who were talking about secure boot.
  
  I have not navigated the labyrinth of config options but having a way to
  only boot signed things with kexec seems a completely sensible way to
  operate in the context of signed images.
  
  I don't know how much that will help given that actors with sufficient
  resources have demonstrated the ability to steal private keys, but
  assuming binary signing is an effective technique (or why else do it)
  then having an option to limit kexec to only loading signed images seems
  sensible.
 
  I went through the mail chain on web and here are my thoughts.
 
  - So yes, upstream does not have the logic which automatically disables
    the old syscall (kexec_load()) on secureboot systems. Distributions
    carry those patches.
 
  - This KEXEC_VERIFY_SIG option only controls the behavior of the
    kexec_file_load() syscall and is not meant to directly affect any
    behavior of the old syscall (kexec_load()). I think I should have named
    it KEXEC_FILE_VERIFY_SIG, though the help text makes it clear:
    "Verify kernel signature during kexec_file_load() syscall."
 
  - I think disabling the old system call if KEXEC_VERIFY_SIG is set
    will break existing setups which use the old system call by default,
    except on secureboot systems. The old syscall path is well tested,
    and the new syscall might not be in a position to support all the
    corner cases, at least as of now.
 
  Ted, 
 
  So it looks like you are looking for a system/option where you always
  want to make use of kexec_file_load() and disable kexec_load(). It sounds
  like you want a kernel where kexec_load() is compiled out and only
  kexec_file_load() is compiled in.
 
  Right now one can't do that because kexec_file_load() depends on the
  CONFIG_KEXEC option.
 
  I am wondering: how about making CONFIG_KEXEC_FILE_LOAD independent
  of CONFIG_KEXEC? That way one can set CONFIG_KEXEC_VERIFY_SIG=y, and
  only a signed kernel can be kexeced on that system.
 
  This should gel well with the long-term strategy of deprecating
  kexec_load() at some point, once kexec_file_load() is ready to
  completely replace it.
 
 Interesting.
 
 I suspect that what we want is to have CONFIG_KEXEC for the core
 and additional CONFIG_KEXEC_LOAD option that covers that kexec_load call.
 
 That should make it trivially easy to disable the kexec_load system call
 in cases where people care.

Or, we could create another option, CONFIG_KEXEC_CORE/CONFIG_KEXEC_COMMON,
which would be automatically selected when either CONFIG_KEXEC or
CONFIG_KEXEC_FILE is selected.

All common code can go under this option, and the rest can go under the
respective config options.

That way, those who have CONFIG_KEXEC=y in old config files will not be
broken. They don't have to learn about new options at all.
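To make the compatibility argument concrete, here is a small C model of the proposed split, assuming CONFIG_KEXEC_CORE is a hidden symbol selected only by CONFIG_KEXEC or CONFIG_KEXEC_FILE. The struct and function names are hypothetical, and Kconfig "select" semantics are only approximated here by a logical OR.

```c
#include <stdbool.h>

/* User-visible options, as they might appear in a .config file. */
struct kexec_config {
	bool kexec;		/* CONFIG_KEXEC: builds kexec_load() */
	bool kexec_file;	/* CONFIG_KEXEC_FILE: builds kexec_file_load() */
};

/* Hidden core option: selected whenever either syscall is enabled. */
static bool kexec_core_enabled(struct kexec_config c)
{
	return c.kexec || c.kexec_file;
}

/* Each syscall needs its own option; the core is pulled in automatically. */
static bool kexec_load_available(struct kexec_config c)
{
	return kexec_core_enabled(c) && c.kexec;
}

static bool kexec_file_load_available(struct kexec_config c)
{
	return kexec_core_enabled(c) && c.kexec_file;
}
```

In this model an old .config with only CONFIG_KEXEC=y still gets the core code and kexec_load(), while a configuration with only CONFIG_KEXEC_FILE=y gets kexec_file_load() without kexec_load().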

Thanks
Vivek

Re: [PATCH] prepend elfcorehdr instead of appending it to the crash-kernel command-line.

2015-05-14 Thread Vivek Goyal

On Wed, May 13, 2015 at 12:05:54PM +0200, KarimAllah Ahmed wrote:
 Any parameter passed after '--' in the kernel command-line will not be parsed
 by the kernel at all, instead it will be passed directly to init process.
 
 Currently the kernel appends elfcorehdr=paddr to the cmdline passed from kexec
 load, and if this command-line is used to pass parameters to init process this
 means that 'elfcorehdr' will not be parsed as a kernel parameter at all which
 will be a problem for vmcore subsystem since it will know nothing about the
 location of the ELF structure!
 
 Prepending 'elfcorehdr' instead of appending it fixes this problem since it
 ensures that it always comes before '--' and so it's always parsed as a kernel
 command-line parameter.
 
 Even with this patch things can still go wrong if 'CONFIG_CMDLINE' was also used
 to embed a command-line in the crash dump kernel and this command-line contains
 '--', since the current behavior of the kernel is to actually append the boot
 loader command-line to the embedded command-line.
 
 Signed-off-by: KarimAllah Ahmed karah...@amazon.de

Looks good to me. 

We might require a similar change in kexec-tools for the old system call?

Acked-by: Vivek Goyal vgo...@redhat.com

Thanks
Vivek

 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Ingo Molnar mi...@redhat.com
 Cc: H. Peter Anvin h...@zytor.com
 Cc: Andrew Morton a...@linux-foundation.org
 Cc: Vivek Goyal vgo...@redhat.com
 Cc: Haren Myneni hb...@us.ibm.com
 Cc: Eric Biederman ebied...@xmission.com
 ---
  arch/x86/kernel/kexec-bzimage64.c |   11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)
 
 diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
 index ca05f86..ca83f7ac 100644
 --- a/arch/x86/kernel/kexec-bzimage64.c
 +++ b/arch/x86/kernel/kexec-bzimage64.c
 @@ -72,15 +72,16 @@ static int setup_cmdline(struct kimage *image, struct boot_params *params,
unsigned long cmdline_len)
  {
   char *cmdline_ptr = ((char *)params) + cmdline_offset;
 - unsigned long cmdline_ptr_phys, len;
 + unsigned long cmdline_ptr_phys, len = 0;
   uint32_t cmdline_low_32, cmdline_ext_32;
  
 - memcpy(cmdline_ptr, cmdline, cmdline_len);
  if (image->type == KEXEC_TYPE_CRASH) {
 - len = sprintf(cmdline_ptr + cmdline_len - 1,
 -  " elfcorehdr=0x%lx", image->arch.elf_load_addr);
 - cmdline_len += len;
 + len = sprintf(cmdline_ptr,
 + "elfcorehdr=0x%lx ", image->arch.elf_load_addr);
   }
 + memcpy(cmdline_ptr + len, cmdline, cmdline_len);
 + cmdline_len += len;
 +
   cmdline_ptr[cmdline_len - 1] = '\0';
  
  pr_debug("Final command line is: %s\n", cmdline_ptr);
 -- 
 1.7.9.5
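A userspace C sketch of the patched ordering, assuming a plain char buffer stands in for the command-line area inside boot_params. The helper build_crash_cmdline() is hypothetical; like the patch, it prints elfcorehdr= first and copies the user command line after it, but it uses snprintf() and an explicit size check rather than relying on the caller having reserved enough room.

```c
#include <stdio.h>
#include <string.h>

/*
 * Prepend "elfcorehdr=0x<addr> " so that it always precedes any "--"
 * in the user command line (arguments after "--" go to init and are
 * not parsed by the kernel). Returns the length of the resulting
 * string, or -1 if buf is too small.
 */
static int build_crash_cmdline(char *buf, size_t size,
			       unsigned long elf_load_addr, const char *cmdline)
{
	size_t cmdline_len = strlen(cmdline) + 1;	/* include the NUL */
	int len = snprintf(buf, size, "elfcorehdr=0x%lx ", elf_load_addr);

	/* Refuse to truncate: a clipped command line could lose arguments. */
	if (len < 0 || (size_t)len + cmdline_len > size)
		return -1;

	/* The user command line, possibly containing "--", goes after the prefix. */
	memcpy(buf + len, cmdline, cmdline_len);
	return len + (int)cmdline_len - 1;	/* length without the NUL */
}
```

For the command line "root=/dev/sda1 -- single" and an ELF header at 0x1000, the buffer ends up as "elfcorehdr=0x1000 root=/dev/sda1 -- single", so elfcorehdr= is parsed by the kernel rather than passed to init.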

Re: Re: [PATCH v2] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-24 Thread Vivek Goyal

On Tue, Mar 24, 2015 at 08:11:29AM +0100, Ingo Molnar wrote:
 
 * Masami Hiramatsu masami.hiramatsu...@hitachi.com wrote:
 
  (2015/03/23 16:19), Ingo Molnar wrote:
   
   * Baoquan He b...@redhat.com wrote:
   
   CC more people ...
  
   On 03/07/15 at 01:31am, Hatayama, Daisuke/畑山 大輔 wrote:
   The commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45 introduced
   crash_kexec_post_notifiers kernel boot option, which toggles
   whether panic() calls crash_kexec() before panic_notifiers and dump
   kmsg or after.
  
   The problem is that the commit overlooks panic_on_oops kernel boot
   option. If it is enabled, crash_kexec() is called directly without
   going through panic() in oops path.
  
   To fix this issue, this patch adds a check to
   crash_kexec_post_notifiers in the condition of kexec_should_crash().
  
   Also, put a comment in kexec_should_crash() to explain not obvious
   things on this patch.
  
   Signed-off-by: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com
   Acked-by: Baoquan He b...@redhat.com
   Tested-by: Hidehiro Kawai hidehiro.kawai...@hitachi.com
   Reviewed-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com
   ---
include/linux/kernel.h |  3 +++
kernel/kexec.c | 11 +++
kernel/panic.c |  2 +-
3 files changed, 15 insertions(+), 1 deletion(-)
   
   This is hack upon hack, but why was this crap merged in the first 
   place?
   
   I see two problems just by cursory review:
   
   1)
   
   Firstly, the real bug in:
   
 f06e5153f4ae (kernel/panic.c: add crash_kexec_post_notifiers option 
   for kdump after panic_notifers)
   
   Was that crash_kexec() was called unconditionally after notifiers were 
   called, which should be fixed via the simple patch below (untested). 
   Looks much simpler than your fix.
  
  No, Daisuke's patch is not for that case. [...]
 
 Yet the actual bug is in that commit, 'crash_kexec_post_notifiers' was 
 clearly not a no-op in the default case, against expectations.

Hi Ingo,

I did a quick test, and in the default case crash_kexec() runs before the panic
notifiers. So it does look like crash_kexec_post_notifiers is a no-op
in the default case.

What am I missing?

Thanks
Vivek

Re: Re: [PATCH v2] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-24 Thread Vivek Goyal

On Tue, Mar 24, 2015 at 05:18:14PM +0100, Ingo Molnar wrote:
 
 * Vivek Goyal vgo...@redhat.com wrote:
 
   Yet the actual bug is in that commit, 'crash_kexec_post_notifiers' 
   was clearly not a no-op in the default case, against expectations.
  
  Hi Ingo,
  
  I did a quick test and in default case crash_kexec() runs before 
  panic notifiers. So it does look like crash_kexec_post_notifiers is 
  a no-op in default case.
  
  What am I missing.
 
 Well, look at f06e5153f4ae:
 
 diff --git a/kernel/panic.c b/kernel/panic.c
 index d02fa9fef46a..62e16cef9cc2 100644
 --- a/kernel/panic.c
 +++ b/kernel/panic.c
 @@ -32,6 +32,7 @@ static unsigned long tainted_mask;
  static int pause_on_oops;
  static int pause_on_oops_flag;
  static DEFINE_SPINLOCK(pause_on_oops_lock);
 +static bool crash_kexec_post_notifiers;
  
  int panic_timeout = CONFIG_PANIC_TIMEOUT;
  EXPORT_SYMBOL_GPL(panic_timeout);
 @@ -112,9 +113,11 @@ void panic(const char *fmt, ...)
   /*
* If we have crashed and we have a crash kernel loaded let it handle
* everything else.
 -  * Do we want to call this before we try to display a message?
 +  * If we want to run this after calling panic_notifiers, pass
 +  * the crash_kexec_post_notifiers option to the kernel.
*/
 - crash_kexec(NULL);
 + if (!crash_kexec_post_notifiers)
 + crash_kexec(NULL);
  
   /*
* Note smp_send_stop is the usual smp shutdown function, which
 @@ -131,6 +134,15 @@ void panic(const char *fmt, ...)
  
   kmsg_dump(KMSG_DUMP_PANIC);
  
 + /*
 +  * If you doubt kdump always works fine in any situation,
 +  * crash_kexec_post_notifiers offers you a chance to run
 +  * panic_notifiers and dumping kmsg before kdump.
 +  * Note: since some panic_notifiers can make crashed kernel
 +  * more unstable, it can increase risks of the kdump failure too.
 +  */
 + crash_kexec(NULL);
 +
   bust_spinlocks(0);
  
   if (!panic_blink)
 
 
 Without knowing what crash_kexec() does, the patch looks buggy: it 
 should preserve the old behavior by default, yet it will now execute a 
 second crash_kexec() after the kmsg_dump() line.
 
 So the invariant change would have been to do:
 
 - crash_kexec(NULL);
 + if (!crash_kexec_post_notifiers)
 + crash_kexec(NULL);
 
 ...
 
 + if (crash_kexec_post_notifiers)
 + crash_kexec(NULL);
 
 Which in the !crash_kexec_post_notifiers flag case reduces to:
 
   crash_kexec();
 
   ...
 
   /* NOP */
 
 I.e. to exactly what the kernel was doing without the patch 
 originally.
 
 Which is what my patch does. Nothing more, nothing less.

Ok, I get what you mean.

crash_kexec() does not return if a kdump kernel is loaded. If a kdump
kernel is not loaded, then crash_kexec() returns without doing anything.

I think that explains why leaving the second call to crash_kexec()
unconditional did not create problems. If a kdump kernel is loaded, the
first call never returns, so the second call is never reached; if none is
loaded, the second call does nothing and simply returns.

But anyway, we need your patch, as that's the right thing to do.
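A userspace C model of the reasoning above, assuming crash_kexec() is reduced to its two observable behaviors: it never returns when a kdump kernel is loaded, and it returns immediately otherwise. panic_model() and its parameters are hypothetical; guard_second_call selects between the unconditional second call from f06e5153f4ae and the guarded call in Ingo's patch.

```c
#include <stdbool.h>

/* Which crash_kexec() call site, if any, handed control to kdump. */
enum kdump_site { KDUMP_NONE, KDUMP_BEFORE_NOTIFIERS, KDUMP_AFTER_NOTIFIERS };

static enum kdump_site panic_model(bool kdump_loaded, bool post_notifiers,
				   bool guard_second_call, bool *notifiers_ran)
{
	*notifiers_ran = false;

	/* First call: if (!crash_kexec_post_notifiers) crash_kexec(NULL); */
	if (!post_notifiers && kdump_loaded)
		return KDUMP_BEFORE_NOTIFIERS;	/* never returns in the kernel */

	/* Panic notifiers and kmsg_dump(KMSG_DUMP_PANIC) run here. */
	*notifiers_ran = true;

	/* Second call: unconditional in f06e5153f4ae, guarded by Ingo's fix. */
	if ((!guard_second_call || post_notifiers) && kdump_loaded)
		return KDUMP_AFTER_NOTIFIERS;

	return KDUMP_NONE;	/* every crash_kexec() call was a no-op */
}
```

With guard_second_call set to false or true, the model returns the same result for every combination of kdump_loaded and post_notifiers. That matches the observation that the unconditional second call was harmless in practice, while the guarded form states the intent explicitly.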

Thanks
Vivek

Re: [PATCH v2] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-24 Thread Vivek Goyal

On Tue, Mar 24, 2015 at 05:27:10AM -0500, Eric W. Biederman wrote:
 Ingo Molnar mi...@kernel.org writes:
 
  * Masami Hiramatsu masami.hiramatsu...@hitachi.com wrote:
 
   
 f06e5153f4ae (kernel/panic.c: add crash_kexec_post_notifiers option 
   for kdump after panic_notifers)
   
   Was that crash_kexec() was called unconditionally after notifiers were 
   called, which should be fixed via the simple patch below (untested). 
   Looks much simpler than your fix.
  
  No, Daisuke's patch is not for that case. [...]
 
  Yet the actual bug is in that commit, 'crash_kexec_post_notifiers' was 
  clearly not a no-op in the default case, against expectations.
 
  So the first step should be to restore the original behavior (my 
  patch), then should any new tweaks be added.
 
 Honestly I think the proper fix is to simply revert f06e5153f4ae.
 
 It was clearly not properly tested by the people who wanted it because
 they came back quite a while later with additional bleh.
 
 I think this pretty much counts as hitting the code doesn't work let's
 remove it threshold.

IMHO, we should give users the flexibility of running panic notifiers before
crash_kexec(). Different people have been asking for it for the last 7-8
years, and it is a pretty small amount of code in the kernel, so no major
maintenance headache.

Agreed that this might be very unreliable, but if users want to shoot
themselves in the foot, it is their choice. This will not be the upstream
default, and I am hoping that distributions don't make it their default
either.

Thanks
Vivek

Re: [PATCH v2] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-23 Thread Vivek Goyal

On Mon, Mar 23, 2015 at 08:19:43AM +0100, Ingo Molnar wrote:
 
 * Baoquan He b...@redhat.com wrote:
 
  CC more people ...
  
  On 03/07/15 at 01:31am, Hatayama, Daisuke/畑山 大輔 wrote:
   The commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45 introduced
   crash_kexec_post_notifiers kernel boot option, which toggles
   whether panic() calls crash_kexec() before panic_notifiers and dump
   kmsg or after.
   
   The problem is that the commit overlooks panic_on_oops kernel boot
   option. If it is enabled, crash_kexec() is called directly without
   going through panic() in oops path.
   
   To fix this issue, this patch adds a check to
   crash_kexec_post_notifiers in the condition of kexec_should_crash().
   
   Also, put a comment in kexec_should_crash() to explain not obvious
   things on this patch.
   
   Signed-off-by: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com
   Acked-by: Baoquan He b...@redhat.com
   Tested-by: Hidehiro Kawai hidehiro.kawai...@hitachi.com
   Reviewed-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com
   ---
include/linux/kernel.h |  3 +++
kernel/kexec.c | 11 +++
kernel/panic.c |  2 +-
3 files changed, 15 insertions(+), 1 deletion(-)
 
 This is hack upon hack, but why was this crap merged in the first 
 place?
 
 I see two problems just by cursory review:
 
 1)
 
 Firstly, the real bug in:
 
   f06e5153f4ae (kernel/panic.c: add crash_kexec_post_notifiers option for 
 kdump after panic_notifers)
 
 Was that crash_kexec() was called unconditionally after notifiers were 
 called, which should be fixed via the simple patch below (untested). 
 Looks much simpler than your fix.
 

Hi Ingo,

Agreed. Your patch looks good.

 2)
 
 Secondly, and more importantly, the whole premise of commit 
 f06e5153f4ae is broken IMHO:
 
  This can help rare situations where kdump fails because of unstable
   crashed kernel or hardware failure (memory corruption on critical
   data/code)
 
 wtf?
 
 If the kernel crashed due to a kernel crash, then the kernel booting 
 up in whatever hardware state should be able to do a clean bootup. The 
 fix for those 'rare situations' should be to fix the real bug (for 
 example by making hardware driver init (or deinit) sequences more 
 robust), not to paper it over by ordering around crash-time sequences 
 ...
 
 If it crashed due to some hardware failure, there's literally an 
 infinite amount of failure modes that may or may not be impacted by 
 kexec crash-time handling ordering. We don't want to put a zillion 
 such flags into the kernel proper just to allow the perturbation of 
 the kernel.

I think one of the motivations behind this patch was the call to kmsg_dump().
Some vendors have wanted the capability to save kernel logs
to some NVRAM before the transition to the second kernel happens. Their
argument is that kdump does not succeed all the time, and if kdump does not
succeed, then at least they have something to work with (kernel logs
retrieved from the pstore interface).

Not that I fully agree with this, as a problem might happen while we try
to run panic_notifiers or kmsg_dump hooks, and we might never transition
into the kdump kernel.

And for literally years some developers have been pushing for allowing
panic notifiers to run before crash_kexec(). Eric Biederman has been
pushing back, saying it reduces the reliability of the kdump operation, so
this is not acceptable.

So, while it is very hacky, this command line option was introduced. It
allows overriding the default crash_kexec() behavior, and those who want
to do additional things (at their own risk) before the transition to the
second kernel can specify this parameter.

Thanks
Vivek

 
 diff --git a/kernel/panic.c b/kernel/panic.c
 index 8136ad76e5fd..774614f72cbd 100644
 --- a/kernel/panic.c
 +++ b/kernel/panic.c
 @@ -142,7 +142,8 @@ void panic(const char *fmt, ...)
* Note: since some panic_notifiers can make crashed kernel
* more unstable, it can increase risks of the kdump failure too.
*/
 - crash_kexec(NULL);
 + if (crash_kexec_post_notifiers)
 + crash_kexec(NULL);
  
   bust_spinlocks(0);
  

Re: [PATCH v2] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-23 Thread Vivek Goyal

On Mon, Mar 23, 2015 at 02:50:46PM +0100, Ingo Molnar wrote:
 
 * Vivek Goyal vgo...@redhat.com wrote:
 
  On Mon, Mar 23, 2015 at 08:19:43AM +0100, Ingo Molnar wrote:
   
   * Baoquan He b...@redhat.com wrote:
   
CC more people ...

On 03/07/15 at 01:31am, Hatayama, Daisuke/畑山 大輔 wrote:
 The commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45 introduced
 crash_kexec_post_notifiers kernel boot option, which toggles
 whether panic() calls crash_kexec() before panic_notifiers and dump
 kmsg or after.
 
 The problem is that the commit overlooks panic_on_oops kernel boot
 option. If it is enabled, crash_kexec() is called directly without
 going through panic() in oops path.
 
 To fix this issue, this patch adds a check to
 crash_kexec_post_notifiers in the condition of kexec_should_crash().
 
 Also, put a comment in kexec_should_crash() to explain not obvious
 things on this patch.
 
 Signed-off-by: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com
 Acked-by: Baoquan He b...@redhat.com
 Tested-by: Hidehiro Kawai hidehiro.kawai...@hitachi.com
 Reviewed-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com
 ---
  include/linux/kernel.h |  3 +++
  kernel/kexec.c | 11 +++
  kernel/panic.c |  2 +-
  3 files changed, 15 insertions(+), 1 deletion(-)
   
   This is hack upon hack, but why was this crap merged in the first 
   place?
   
   I see two problems just by cursory review:
   
   1)
   
   Firstly, the real bug in:
   
 f06e5153f4ae (kernel/panic.c: add crash_kexec_post_notifiers option 
   for kdump after panic_notifers)
   
   Was that crash_kexec() was called unconditionally after notifiers were 
   called, which should be fixed via the simple patch below (untested). 
   Looks much simpler than your fix.
   
  
  Hi Ingo,
  
  Agreed. Your patch looks good.
 
 In case you want that simpler fix and need my SOB:
 
   Signed-off-by: Ingo Molnar mi...@kernel.org
 
 (but I have not tested it.)

I will quickly test it.

So this is a general fix, but not a replacement for the fix in this patch?

Because the problem the original patch is trying to fix is that crash_kexec()
can be called from outside panic() too (from the oops path, when
kexec_should_crash() returns true), and in that case the panic notifiers will
not be called. So this patch is just trying to delay the call to
crash_kexec() to make it run much later.
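A simplified C sketch of the oops path the original patch targets, assuming a kdump kernel is loaded and reducing oops handling to a single decision. struct task_model stands in for in_interrupt(), p->pid and is_global_init(p); it and classify_oops() are hypothetical, and only the kexec_should_crash() condition follows the v2 patch in this thread.

```c
#include <stdbool.h>

/* Stand-ins for the state kexec_should_crash() inspects. */
struct task_model {
	bool in_interrupt;	/* in_interrupt() */
	int pid;		/* p->pid; 0 for the idle task */
	bool global_init;	/* is_global_init(p) */
};

static bool crash_kexec_post_notifiers;	/* boot option */
static bool panic_on_oops;		/* sysctl / boot option */

enum oops_result {
	OOPS_KDUMP_DIRECT,	/* crash_kexec() from the oops path, notifiers skipped */
	OOPS_VIA_PANIC,		/* panic() runs notifiers, then crash_kexec() */
	OOPS_TASK_KILLED,	/* no panic: only the faulting task exits */
};

/* The four conditions under which the oops path ends in panic(). */
static bool oops_will_panic(const struct task_model *p)
{
	return p->in_interrupt || !p->pid || p->global_init || panic_on_oops;
}

/* Same logic as kexec_should_crash() after the v2 patch. */
static bool kexec_should_crash(const struct task_model *p)
{
	if (crash_kexec_post_notifiers)
		return false;	/* leave crash_kexec() to panic() */
	return oops_will_panic(p);
}

static enum oops_result classify_oops(const struct task_model *p)
{
	if (kexec_should_crash(p))
		return OOPS_KDUMP_DIRECT;
	if (oops_will_panic(p))
		return OOPS_VIA_PANIC;
	return OOPS_TASK_KILLED;
}
```

Without the early return in kexec_should_crash(), an oops with panic_on_oops set would always take the OOPS_KDUMP_DIRECT route, so crash_kexec_post_notifiers would have no effect on that path.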

 
   Secondly, and more importantly, the whole premise of commit 
   f06e5153f4ae is broken IMHO:
   
This can help rare situations where kdump fails because of unstable
 crashed kernel or hardware failure (memory corruption on critical
 data/code)
   
   wtf?
   
   If the kernel crashed due to a kernel crash, then the kernel booting 
   up in whatever hardware state should be able to do a clean bootup. The 
   fix for those 'rare situations' should be to fix the real bug (for 
   example by making hardware driver init (or deinit) sequences more 
   robust), not to paper it over by ordering around crash-time sequences 
   ...
   
   If it crashed due to some hardware failure, there's literally an 
   infinite amount of failure modes that may or may not be impacted by 
   kexec crash-time handling ordering. We don't want to put a zillion 
   such flags into the kernel proper just to allow the perturbation of 
   the kernel.
  
  I think one of the motivations behind this patch was call to kmsg_dump().
  Some vendors have been wanting to have the capability to save kernel logs
  to some NVRAM before transition to second kernel happens. Their argument
  is that kdump does not succeed all the time and if kdump does not succeed
  then atleast they have something to work with (kernel logs retrieved
  from pstore interface).
 
 Doesn't pstore attach itself to printk itself? AFAICS it does:
 
  fs/pstore/platform.c:   register_console(pstore_console);
 
 so the printk log leading up to and including the crash should be 
 available, regardless of this patch. What am I missing?

That's a good point. I was not aware of it. I am Ccing Don Zickus as
he has spent some time on this in the past.

Masami, would you have thoughts on this? IIRC, one reason kmsg_dump()
was written was so that one could dump kernel messages to NVRAM. If one
could simply register pstore as a console, then how will kmsg_dump()
continue to be useful?

 
  Not that I agree fully with this as problem might happen while we 
  try to run panic_notifiers or kmsg_dump hooks and never transition 
  into kdump kernel.
 
 btw., this is the big problem with 'notifiers' in general: they are 
 opaque with barely any semantics defined, and a source of constant 
 confusion.

Agreed. That's the reason Eric never liked the idea of letting panic
notifiers run before crash_kexec().

 
  And it has been literally years since some developers have been 
  pushing for allowing to run panic notifiers before crash_kexec(). 
  Eric Biederman has been pushing back saying

Re: [PATCH v2] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-23 Thread Vivek Goyal

On Mon, Mar 23, 2015 at 08:19:43AM +0100, Ingo Molnar wrote:
 
 * Baoquan He b...@redhat.com wrote:
 
  CC more people ...
  
  On 03/07/15 at 01:31am, Hatayama, Daisuke/畑山 大輔 wrote:
   The commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45 introduced
   crash_kexec_post_notifiers kernel boot option, which toggles
   whether panic() calls crash_kexec() before panic_notifiers and dump
   kmsg or after.
   
   The problem is that the commit overlooks panic_on_oops kernel boot
   option. If it is enabled, crash_kexec() is called directly without
   going through panic() in oops path.
   
   To fix this issue, this patch adds a check to
   crash_kexec_post_notifiers in the condition of kexec_should_crash().
   
   Also, put a comment in kexec_should_crash() to explain not obvious
   things on this patch.
   
   Signed-off-by: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com
   Acked-by: Baoquan He b...@redhat.com
   Tested-by: Hidehiro Kawai hidehiro.kawai...@hitachi.com
   Reviewed-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com
   ---
include/linux/kernel.h |  3 +++
kernel/kexec.c | 11 +++
kernel/panic.c |  2 +-
3 files changed, 15 insertions(+), 1 deletion(-)
 
 This is hack upon hack, but why was this crap merged in the first 
 place?
 
 I see two problems just by cursory review:
 
 1)
 
 Firstly, the real bug in:
 
   f06e5153f4ae (kernel/panic.c: add crash_kexec_post_notifiers option for 
 kdump after panic_notifers)
 
 Was that crash_kexec() was called unconditionally after notifiers were 
 called, which should be fixed via the simple patch below (untested). 
 Looks much simpler than your fix.
 
 2)
 
 Secondly, and more importantly, the whole premise of commit 
 f06e5153f4ae is broken IMHO:
 
  This can help rare situations where kdump fails because of unstable
   crashed kernel or hardware failure (memory corruption on critical
   data/code)
 
 wtf?
 
 If the kernel crashed due to a kernel crash, then the kernel booting 
 up in whatever hardware state should be able to do a clean bootup. The 
 fix for those 'rare situations' should be to fix the real bug (for 
 example by making hardware driver init (or deinit) sequences more 
 robust), not to paper it over by ordering around crash-time sequences 
 ...
 
 If it crashed due to some hardware failure, there's literally an 
 infinite amount of failure modes that may or may not be impacted by 
 kexec crash-time handling ordering. We don't want to put a zillion 
 such flags into the kernel proper just to allow the perturbation of 
 the kernel.
 
 Thanks,
 
   Ingo
 

I quickly tested this patch to make sure I can still transition into the
second kernel when crash_kexec_post_notifiers is specified on the command
line. I have not tried running any notifiers. So:

Tested-by: Vivek Goyal vgo...@redhat.com
Acked-by: Vivek Goyal vgo...@redhat.com

This should be a general fix and not a replacement for the patch
in question in this mail thread. 

Thanks
Vivek

 diff --git a/kernel/panic.c b/kernel/panic.c
 index 8136ad76e5fd..774614f72cbd 100644
 --- a/kernel/panic.c
 +++ b/kernel/panic.c
 @@ -142,7 +142,8 @@ void panic(const char *fmt, ...)
* Note: since some panic_notifiers can make crashed kernel
* more unstable, it can increase risks of the kdump failure too.
*/
 - crash_kexec(NULL);
 + if (crash_kexec_post_notifiers)
 + crash_kexec(NULL);
  
   bust_spinlocks(0);
  

Re: [PATCH v2] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-06 Thread Vivek Goyal

On Sat, Mar 07, 2015 at 01:31:01AM +0900, Hatayama, Daisuke/畑山 大輔 wrote:
 The commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45 introduced
 crash_kexec_post_notifiers kernel boot option, which toggles
 whether panic() calls crash_kexec() before panic_notifiers and dump
 kmsg or after.
 
 The problem is that the commit overlooks panic_on_oops kernel boot
 option. If it is enabled, crash_kexec() is called directly without
 going through panic() in oops path.
 
 To fix this issue, this patch adds a check to
 crash_kexec_post_notifiers in the condition of kexec_should_crash().
 
 Also, put a comment in kexec_should_crash() to explain not obvious
 things on this patch.
 
 Signed-off-by: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com
 Acked-by: Baoquan He b...@redhat.com
 Tested-by: Hidehiro Kawai hidehiro.kawai...@hitachi.com
 Reviewed-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com

Looks good to me.

Acked-by: Vivek Goyal vgo...@redhat.com

Thanks
Vivek

 ---
  include/linux/kernel.h |  3 +++
  kernel/kexec.c | 11 +++
  kernel/panic.c |  2 +-
  3 files changed, 15 insertions(+), 1 deletion(-)
 
 diff --git a/include/linux/kernel.h b/include/linux/kernel.h
 index d6d630d..07483c7 100644
 --- a/include/linux/kernel.h
 +++ b/include/linux/kernel.h
 @@ -426,6 +426,9 @@ extern int panic_on_unrecovered_nmi;
  extern int panic_on_io_nmi;
  extern int panic_on_warn;
  extern int sysctl_panic_on_stackoverflow;
 +
 +extern bool crash_kexec_post_notifiers;
 +
  /*
   * Only to be used by arch init code. If the user over-wrote the default
   * CONFIG_PANIC_TIMEOUT, honor it.
 diff --git a/kernel/kexec.c b/kernel/kexec.c
 index 38c25b1..5bf6077 100644
 --- a/kernel/kexec.c
 +++ b/kernel/kexec.c
 @@ -84,6 +84,17 @@ struct resource crashk_low_res = {
 
  int kexec_should_crash(struct task_struct *p)
  {
 + /*
 +  * If crash_kexec_post_notifiers is enabled, don't run
 +  * crash_kexec() here yet, which must be run after panic
 +  * notifiers in panic().
 +  */
 + if (crash_kexec_post_notifiers)
 + return 0;
 + /*
 +  * There are 4 panic() calls in do_exit() path, each of which
 +  * corresponds to one of these 4 conditions.
 +  */
   if (in_interrupt() || !p->pid || is_global_init(p) || panic_on_oops)
   return 1;
   return 0;
 diff --git a/kernel/panic.c b/kernel/panic.c
 index 8136ad7..79ca912 100644
 --- a/kernel/panic.c
 +++ b/kernel/panic.c
 @@ -32,7 +32,7 @@ static unsigned long tainted_mask;
  static int pause_on_oops;
  static int pause_on_oops_flag;
  static DEFINE_SPINLOCK(pause_on_oops_lock);
 -static bool crash_kexec_post_notifiers;
 +bool crash_kexec_post_notifiers;
  int panic_on_warn __read_mostly;
 
  int panic_timeout = CONFIG_PANIC_TIMEOUT;
 -- 
 1.9.3
 


Re: [RESEND PATCH] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-05 Thread Vivek Goyal

On Wed, Mar 04, 2015 at 05:56:48PM +0900, HATAYAMA Daisuke wrote:
 The commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45 introduced
 crash_kexec_post_notifiers kernel boot option, which toggles
 whether panic() calls crash_kexec() before or after panic_notifiers
 and dump kmsg.
 
 The problem is that the commit overlooks panic_on_oops kernel boot
 option. If it is enabled, crash_kexec() is called directly without
 going through panic() in oops path.
 
 To fix this issue, this patch adds a check to
 crash_kexec_post_notifiers in the condition of kexec_should_crash().
 
 Signed-off-by: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com
 Acked-by: Baoquan He b...@redhat.com
 Tested-by: Hidehiro Kawai hidehiro.kawai...@hitachi.com
 ---
  include/linux/kernel.h | 3 +++
  kernel/kexec.c | 2 ++
  kernel/panic.c | 2 +-
  3 files changed, 6 insertions(+), 1 deletion(-)
 
 diff --git a/include/linux/kernel.h b/include/linux/kernel.h
 index 64ce58b..f47379f 100644
 --- a/include/linux/kernel.h
 +++ b/include/linux/kernel.h
 @@ -426,6 +426,9 @@ extern int panic_on_unrecovered_nmi;
  extern int panic_on_io_nmi;
  extern int panic_on_warn;
  extern int sysctl_panic_on_stackoverflow;
 +
 +extern bool crash_kexec_post_notifiers;
 +
  /*
   * Only to be used by arch init code. If the user over-wrote the default
   * CONFIG_PANIC_TIMEOUT, honor it.
 diff --git a/kernel/kexec.c b/kernel/kexec.c
 index 9a8a01a..0ecf252 100644
 --- a/kernel/kexec.c
 +++ b/kernel/kexec.c
 @@ -84,6 +84,8 @@ struct resource crashk_low_res = {
  
  int kexec_should_crash(struct task_struct *p)
  {
 + if (crash_kexec_post_notifiers)
 + return 0;

This is a little confusing. So if crash_kexec_post_notifiers is set but
panic_on_oops is not set, we will still return?

Should we do this only if panic_on_oops is set? IOW, how about the following:

	if (panic_on_oops && crash_kexec_post_notifiers)
		return 0;

And then also put a comment explaining the rationale.

Thanks
Vivek

   if (in_interrupt() || !p->pid || is_global_init(p) || panic_on_oops)
   return 1;
   return 0;
 diff --git a/kernel/panic.c b/kernel/panic.c
 index 4d8d6f9..6582546 100644
 --- a/kernel/panic.c
 +++ b/kernel/panic.c
 @@ -32,7 +32,7 @@ static unsigned long tainted_mask;
  static int pause_on_oops;
  static int pause_on_oops_flag;
  static DEFINE_SPINLOCK(pause_on_oops_lock);
 -static bool crash_kexec_post_notifiers;
 +bool crash_kexec_post_notifiers;
  int panic_on_warn __read_mostly;
  
  int panic_timeout = CONFIG_PANIC_TIMEOUT;
 -- 
 1.9.3
 


Re: [RESEND PATCH] kernel/panic/kexec: fix crash_kexec_post_notifiers option issue in oops path

2015-03-05 Thread Vivek Goyal

On Thu, Mar 05, 2015 at 05:19:30PM -0500, Vivek Goyal wrote:
 On Wed, Mar 04, 2015 at 05:56:48PM +0900, HATAYAMA Daisuke wrote:
  The commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45 introduced
  crash_kexec_post_notifiers kernel boot option, which toggles
  whether panic() calls crash_kexec() before or after panic_notifiers
  and dump kmsg.
  
  The problem is that the commit overlooks panic_on_oops kernel boot
  option. If it is enabled, crash_kexec() is called directly without
  going through panic() in oops path.
  
  To fix this issue, this patch adds a check to
  crash_kexec_post_notifiers in the condition of kexec_should_crash().
  
  Signed-off-by: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com
  Acked-by: Baoquan He b...@redhat.com
  Tested-by: Hidehiro Kawai hidehiro.kawai...@hitachi.com
  ---
   include/linux/kernel.h | 3 +++
   kernel/kexec.c | 2 ++
   kernel/panic.c | 2 +-
   3 files changed, 6 insertions(+), 1 deletion(-)
  
  diff --git a/include/linux/kernel.h b/include/linux/kernel.h
  index 64ce58b..f47379f 100644
  --- a/include/linux/kernel.h
  +++ b/include/linux/kernel.h
  @@ -426,6 +426,9 @@ extern int panic_on_unrecovered_nmi;
   extern int panic_on_io_nmi;
   extern int panic_on_warn;
   extern int sysctl_panic_on_stackoverflow;
  +
  +extern bool crash_kexec_post_notifiers;
  +
   /*
* Only to be used by arch init code. If the user over-wrote the default
* CONFIG_PANIC_TIMEOUT, honor it.
  diff --git a/kernel/kexec.c b/kernel/kexec.c
  index 9a8a01a..0ecf252 100644
  --- a/kernel/kexec.c
  +++ b/kernel/kexec.c
  @@ -84,6 +84,8 @@ struct resource crashk_low_res = {
   
   int kexec_should_crash(struct task_struct *p)
   {
  +   if (crash_kexec_post_notifiers)
  +   return 0;
 
 This is a little confusing. So if crash_kexec_post_notifiers is set but
 panic_on_oops is not set, we will still return?
 
 Should we do this only if panic_on_oops is set? IOW, how about the following:
 
   if (panic_on_oops && crash_kexec_post_notifiers)
   	return 0;
 
 And then also put a comment explaining the rationale.

OK, I went through the previous version of the patch and the discussion
there, which says that all 4 conditions lead to panic. So the code in the
patch should be fine.

Can you please at least put a comment here to explain it, as it was not
obvious. Just mention that all the checks below lead to panic, hence
if the user wants to run panic notifiers, then don't run crash_kexec() yet.
It will be run after the panic notifiers.

Thanks
Vivek

 
 Thanks
 Vivek
 
  if (in_interrupt() || !p->pid || is_global_init(p) || panic_on_oops)
  return 1;
  return 0;
  diff --git a/kernel/panic.c b/kernel/panic.c
  index 4d8d6f9..6582546 100644
  --- a/kernel/panic.c
  +++ b/kernel/panic.c
  @@ -32,7 +32,7 @@ static unsigned long tainted_mask;
   static int pause_on_oops;
   static int pause_on_oops_flag;
   static DEFINE_SPINLOCK(pause_on_oops_lock);
  -static bool crash_kexec_post_notifiers;
  +bool crash_kexec_post_notifiers;
   int panic_on_warn __read_mostly;
   
   int panic_timeout = CONFIG_PANIC_TIMEOUT;
  -- 
  1.9.3
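The difference between the patch and the variant suggested in review can be checked with a small user-space model. The struct and function names below are hypothetical. The four flags stand in for in_interrupt(), !p->pid, is_global_init(p) and panic_on_oops. Each of those four conditions ends in panic(). When crash_kexec_post_notifiers is set, panic() runs crash_kexec() after the notifiers. So returning 0 unconditionally defers the dump rather than losing it.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what kexec_should_crash() looks at. */
struct oops_ctx {
	bool in_interrupt;	/* models in_interrupt() */
	bool pid_zero;		/* models !p->pid (idle task) */
	bool global_init;	/* models is_global_init(p) */
	bool panic_on_oops;
	bool post_notifiers;	/* models crash_kexec_post_notifiers */
};

/* Each of these four conditions ends in a panic() call later on. */
static bool leads_to_panic(const struct oops_ctx *c)
{
	return c->in_interrupt || c->pid_zero || c->global_init ||
	       c->panic_on_oops;
}

/* Variant in the patch: defer to panic() whenever the option is set. */
static int should_crash_patch(const struct oops_ctx *c)
{
	if (c->post_notifiers)
		return 0;
	return leads_to_panic(c);
}

/* Variant suggested in review: defer only when panic_on_oops is also set. */
static int should_crash_review(const struct oops_ctx *c)
{
	if (c->panic_on_oops && c->post_notifiers)
		return 0;
	return leads_to_panic(c);
}
```

Take an oops in interrupt context with the option set. Here the review variant returns 1, so crash_kexec() would run from the oops path before any notifier. That is exactly what the option is meant to prevent. This is consistent with keeping the unconditional check.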
  


Re: Edited kexec_load(2) [kexec_file_load()] man page for review

2015-01-28 Thread Vivek Goyal

On Wed, Jan 28, 2015 at 09:04:38AM +0100, Michael Kerrisk (man-pages) wrote:

Hi Michael,

[..]
  * the number of bytes copied from userspace is min(bufsz, memsz)
  
  Yes. bufsz cannot be more than memsz. There is a check to validate
  this in the kernel.
  
  result = -EINVAL;
  for (i = 0; i < nr_segments; i++) {
      if (image->segment[i].bufsz > image->segment[i].memsz)
          return result;
  }
 
 Okay. So it's more precise to leave discussion of min(bufsz, memsz)
 out of the man page just to say: bufsz bytes are transferred;
 if bufsz < memsz, then the excess bytes in the target region are
 filled with zeros. Right?

Sounds good.

[..]
  Both mem and memsz need to be page aligned.
 
 And the error if not is EADDRNOTAVAIL, right?

Yes.
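The two rules confirmed here can be sketched as a user-space C function. Destination address and size must be page aligned (EADDRNOTAVAIL), and bufsz may not exceed memsz (EINVAL). `struct seg`, `check_segments()` and the fixed 4 KiB page size are assumptions for illustration only. The kernel's real checks live in sanity_check_segment_list() and also cover cases this sketch omits, such as overlapping segments.

```c
#include <errno.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical stand-in for struct kexec_segment, with mem as an integer. */
struct seg {
	size_t             bufsz;	/* bytes supplied from user space */
	unsigned long long mem;		/* destination physical address */
	size_t             memsz;	/* size of the destination region */
};

/* Returns 0 if all segments pass, otherwise a negative errno value. */
static int check_segments(const struct seg *segs, size_t nr)
{
	size_t i;

	/* mem and memsz must both be page aligned. */
	for (i = 0; i < nr; i++) {
		if ((segs[i].mem & (SKETCH_PAGE_SIZE - 1)) ||
		    (segs[i].memsz & (SKETCH_PAGE_SIZE - 1)))
			return -EADDRNOTAVAIL;
	}

	/* At most memsz bytes fit; bytes past bufsz are later zero-filled. */
	for (i = 0; i < nr; i++) {
		if (segs[i].bufsz > segs[i].memsz)
			return -EINVAL;
	}
	return 0;
}
```

A segment with bufsz 100 and memsz 4096 passes. The remaining 3996 bytes of its destination would be zero-filled.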

 
  And one further question. Other than the fact that they are used with 
  different system calls, what is the difference between KEXEC_ON_CRASH 
  and KEXEC_FILE_ON_CRASH?
  
  Right now I can't think of any other difference. They both tell respective
  system call that this kernel needs to be loaded in reserved memory region
  for crash kernel.
 
 Okay.
 
 I've made various adjustments to the page in the light of your comments 
 above. Thanks!

Thank you for following up on this and improving the kexec man page.

Thanks
Vivek


Re: [RFD] efi assisted kdump

2015-01-26 Thread Vivek Goyal

On Sat, Jan 24, 2015 at 09:26:37PM +0800, Dave Young wrote:
 Hi,
 
 Kdump currently has several limitations. For example, booting into the kdump
 kernel bypasses the device shutdown path, so device drivers have to reset
 their devices during initialization.
 
 * One such problem we encounter now is the IOMMU issue: the 1st kernel's
 in-flight DMA requests cause the 2nd kernel to hang. There has been some
 effort in this area; Zhenhua Li posted patches to resolve the Intel IOMMU
 issue, which are under review. But that is just one case; there may be
 other problems in the future.
 
 * Most machines on the market, especially desktops and laptops, have no
 serial console. A KMS-enabled kernel needs a DRM-layer driver for the
 framebuffer console. After a kernel crash we need a console to see the 2nd
 kernel's output, ideally a serial console, because we cannot switch back to
 VGA mode.
 
 ppc64 has a feature called firmware-assisted dump; see the documentation:
 Documentation/powerpc/firmware-assisted-dump.txt
 
 On UEFI machines I wonder if we can do something similar. The basic idea is
 to do the minimum on top of the original kdump process:
 
 kernel reserves crashkernel memory during early boot
  - user space (kdump service) saves the necessary information somewhere the
 second kernel can access at boot, i.e. EFI runtime variables or reserved memory
 * crashkernel memory range information
 * collect information and save ELF notes for 2nd kernel vmcore
 initialization

I think anything related to vmcore initialization can be in memory
somewhere and should not be a problem.

But we do have to think about anything that is currently passed to the
second kernel in bootparams or on the command line: how will it be
passed to the second kernel?

Thanks
Vivek


Re: Edited kexec_load(2) [kexec_file_load()] man page for review

2015-01-12 Thread Vivek Goyal

On Wed, Jan 07, 2015 at 10:17:56PM +0100, Michael Kerrisk (man-pages) wrote:

[..]
  .BR KEXEC_ON_CRASH " (since Linux 2.6.13)"
  Execute the new kernel automatically on a system crash.
  .\" FIXME Explain in more detail how KEXEC_ON_CRASH is actually used
 
 I wasn't expecting that you would respond to the FIXMEs that were 
 not labeled kexec_file_load, but I was hoping you might ;-). Thanks!
 I have a few additional questions to your nice notes.
 
  Upon boot, the first kernel reserves a chunk of contiguous memory (if the
  crashkernel= command line parameter is passed). This memory is
  used to load the crash kernel (the kernel which will be booted into
  if the first kernel crashes).
 

Hi Michael,

 Can I just confirm: is it in all cases only possible to use kexec_load() 
 and kexec_file_load() if the kernel was booted with the 'crashkernel'
 parameter set?

As of now, only kexec_load() and kexec_file_load() system calls can
make use of memory reserved by crashkernel= kernel parameter. And
this is used only if we are trying to load a crash kernel (KEXEC_ON_CRASH
flag specified).

 
  Location of this reserved memory is exported to user space through
  /proc/iomem file. 
 
 Is that export via an entry labeled "Crash kernel" in the
 /proc/iomem file?

Yes.
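The following C sketch shows how user space might pick the "Crash kernel" range out of a single /proc/iomem line. The helper name is hypothetical. It assumes the usual "start-end : name" layout with hexadecimal addresses and optional leading indentation. Reading the real file generally requires root.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one /proc/iomem line of the form
 * "<start>-<end> : <name>" and report whether it describes the
 * "Crash kernel" reservation. On success *start and *end receive
 * the inclusive address range.
 */
static bool parse_crash_kernel_line(const char *line,
				    unsigned long long *start,
				    unsigned long long *end)
{
	static const char label[] = "Crash kernel";
	int consumed = 0;

	/* %n records how far sscanf got, i.e. where the name begins. */
	if (sscanf(line, " %llx-%llx : %n", start, end, &consumed) != 2 ||
	    consumed == 0)
		return false;

	return strncmp(line + consumed, label, sizeof(label) - 1) == 0;
}
```

A caller would open /proc/iomem with fopen(), pass each line read by fgets() to this helper, and stop at the first match.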

 
  User space can parse it and prepare list of segments
  specifying this reserved memory as destination.
 
 I'm not quite clear on "specifying this reserved memory as destination".
 Is that done by specifying the address in the kexec_segment.mem fields?

You are absolutely right. User space can specify in the kexec_segment.mem
field the memory location where it expects a particular segment to
be loaded by the kernel.

 
  Once kernel sees the flag KEXEC_ON_CRASH, it makes sure that all the
  segments are destined for reserved memory otherwise kernel load operation
  fails.
 
 Could you point me to where this checking is done? Also, what is the
 error (errno) that occurs when the load operation fails? (I think the
 answers to these questions are at the start of kimage_alloc_init()
 and EADDRNOTAVAIL, but I'd like to confirm.)

This checking happens in sanity_check_segment_list() which is called
by kimage_alloc_init().

And yes, error code returned is -EADDRNOTAVAIL.
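The KEXEC_ON_CRASH rule described above amounts to a containment test. The C sketch below models it in user space. Every segment's range [mem, mem + memsz - 1] must fall inside the reserved range, or the load fails with -EADDRNOTAVAIL. The types and `check_crash_segments()` are hypothetical. Skipping zero-sized segments is also an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the reserved "Crash kernel" range (inclusive). */
struct region {
	unsigned long long start;
	unsigned long long end;
};

/* Hypothetical segment description: destination address and size only. */
struct crash_seg {
	unsigned long long mem;
	unsigned long long memsz;
};

/*
 * Model of the KEXEC_ON_CRASH rule: every segment must lie entirely in
 * the reserved region, otherwise the load fails with -EADDRNOTAVAIL.
 */
static int check_crash_segments(const struct crash_seg *segs, size_t nr,
				struct region crashk)
{
	for (size_t i = 0; i < nr; i++) {
		unsigned long long mstart, mend;

		if (segs[i].memsz == 0)
			continue;	/* sketch assumption: nothing to place */

		mstart = segs[i].mem;
		mend = mstart + segs[i].memsz - 1;
		/* mend < mstart catches address wrap-around. */
		if (mend < mstart || mstart < crashk.start || mend > crashk.end)
			return -EADDRNOTAVAIL;
	}
	return 0;
}
```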

 
  [..]
  struct kexec_segment {
      void   *buf;        /* Buffer in user space */
      size_t  bufsz;      /* Buffer length in user space */
      void   *mem;        /* Physical address of kernel */
      size_t  memsz;      /* Physical address length */
  };
  .fi
  .in
  .PP
  .\" FIXME Explain the details of how the kernel image defined by segments
  .\" is copied from the calling process into previously reserved memory.
  
  Kernel image defined by segments is copied into kernel either in regular
  memory 
 
 Could you clarify what you mean by "regular memory"?

I meant memory which is not reserved memory.

 
  or in reserved memory (if KEXEC_ON_CRASH is set). Kernel first
  copies the list of segments into kernel memory and then does various
  sanity checks on the segments. If everything looks fine, the kernel copies
  segment data to kernel memory.
  
  In case of normal kexec, segment data is loaded in any available memory
  and segment data is moved to final destination at the kexec reboot time.
 
 By "moved to final destination", do you mean moved from user space to the
 final kernel-space destination?

No. Segment data moves from user space to kernel space once the kexec_load()
call finishes successfully. But when the user reboots (kexec -e), the
kernel moves that segment data to its final location. The kernel cannot
place the segment at its final location at kexec_load() time, as that
memory is still in use by the running kernel. But once we are about
to reboot into the new kernel, we can overwrite the old kernel's memory.
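The two-phase behaviour of a normal kexec can be modelled with a toy byte array standing in for physical memory. This is a hypothetical sketch: `stage_segment()`, `relocate_segment()` and the 64-byte memory are invented for illustration. At load time the data is only copied into a staging buffer. At reboot time it is copied to its destination, and bytes past bufsz up to memsz are zero-filled.

```c
#include <stddef.h>
#include <string.h>

/* Toy "physical memory" size, purely for illustration. */
#define TOY_MEM_SIZE 64

struct staged_seg {
	unsigned char data[16];	/* copy taken at load time (kernel memory) */
	size_t bufsz;		/* bytes of real data in data[] */
	size_t mem;		/* final destination offset in toy memory */
	size_t memsz;		/* destination size; tail past bufsz is zeroed */
};

/* Load time: copy the user buffer into staging; destination untouched. */
static int stage_segment(struct staged_seg *s, const void *buf, size_t bufsz,
			 size_t mem, size_t memsz)
{
	if (bufsz > memsz || bufsz > sizeof(s->data) ||
	    mem > TOY_MEM_SIZE || memsz > TOY_MEM_SIZE - mem)
		return -1;
	memcpy(s->data, buf, bufsz);
	s->bufsz = bufsz;
	s->mem = mem;
	s->memsz = memsz;
	return 0;
}

/* Reboot time: the old kernel is going away, so overwrite its memory. */
static void relocate_segment(unsigned char *phys, const struct staged_seg *s)
{
	memcpy(phys + s->mem, s->data, s->bufsz);
	memset(phys + s->mem + s->bufsz, 0, s->memsz - s->bufsz);
}
```

For KEXEC_ON_CRASH the destination is reserved memory that the running kernel does not use. In this model, that would correspond to doing the relocate step at load time.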

 
  In case of kexec on panic (KEXEC_ON_CRASH flag set), segment data is
  directly loaded to reserved memory and after crash kexec simply jumps
 
 By "directly", I assume you mean at the time of the kexec_load() call,
 right?

Yes.

Thanks
Vivek



Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Vivek Goyal

On Mon, Jan 12, 2015 at 04:22:08PM +0100, Joerg Roedel wrote:
 On Mon, Jan 12, 2015 at 03:06:20PM +0800, Li, Zhen-Hua wrote:
  +
  +#ifdef CONFIG_CRASH_DUMP
  +
  +/*
  + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
  + *
  + * Fixes the crashdump kernel to deal with an active iommu and legacy
  + * DMA from the (old) panicked kernel in a manner similar to how legacy
  + * DMA is handled when no hardware iommu was in use by the old kernel --
  + * allow the legacy DMA to continue into its current buffers.
  + *
  + * In the crashdump kernel, this code:
  + * 1. skips disabling the IOMMU's translating of IO Virtual Addresses (IOVA).
  + * 2. Do not re-enable IOMMU's translating.
  + * 3. In kdump kernel, use the old root entry table.
  + * 4. Leaves the current translations in-place so that legacy DMA will
  + *continue to use its current buffers.
  + * 5. Allocates to the device drivers in the crashdump kernel
  + *portions of the iova address ranges that are different
  + *from the iova address ranges that were being used by the old kernel
  + *at the time of the panic.
  + *
  + */
 
 It looks like you are still copying the io-page-tables from the old
 kernel into the kdump kernel, is that right? With the approach that was
 proposed you only need to copy over the context entries 1-1. They are
 still pointing to the page-tables in the old kernels memory (which is
 just fine).

Kdump has the notion of a backup region, where certain parts of the old
kernel's memory can be moved to a different location (the first 640K on x86
as of now) so that the new kernel can make use of this memory.

So we will just have to make sure that no part of this old page table
falls into the backup region.

Thanks
Vivek


Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Vivek Goyal

On Mon, Jan 12, 2015 at 05:06:46PM +0100, Joerg Roedel wrote:
 On Mon, Jan 12, 2015 at 10:29:19AM -0500, Vivek Goyal wrote:
  Kdump has the notion of a backup region, where certain parts of the old
  kernel's memory can be moved to a different location (the first 640K on x86
  as of now) so that the new kernel can make use of this memory.
  
  So we will just have to make sure that no part of this old page table
  falls into the backup region.
 
 Uuh, looks like the 'iommu-with-kdump-issue' isn't complicated enough
 yet ;)
 Sadly, your above statement is true for all hardware-accessible data
 structures in IOMMU code. I am thinking about how we can solve this. Is there
 an easy way to allocate memory that is not in any backup region?

Hmm..., there does not seem to be any easy way to do this. In fact, as of
now, the kernel does not even know where the backup region is. All these
details are managed completely by user space (except with the new
kexec_file_load() syscall).

That means we are left with ugly options now.

- Define per-arch kexec backup regions in the kernel, export them to user
  space and let kexec-tools make use of that definition (instead of
  defining its own). That way memory allocation code in the kernel can look
  at this backup area and skip it for certain allocations.
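The option above can be illustrated with a hypothetical C sketch of an allocator that skips any candidate range overlapping a known backup region. The 640 KiB x86 region comes from the discussion. `struct prange`, `ranges_overlap()` and `pick_range()` are invented names. As noted, the kernel does not currently know where the backup region is, so this only models the proposal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Half-open physical range [start, end). */
struct prange {
	unsigned long long start;
	unsigned long long end;
};

/* Assumption for this sketch: the x86 backup region is the first 640 KiB. */
static const struct prange backup_region = { 0, 640ULL * 1024 };

static bool ranges_overlap(struct prange a, struct prange b)
{
	return a.start < b.end && b.start < a.end;
}

/*
 * Hypothetical allocator policy: return the first candidate range of at
 * least `size` bytes that does not overlap the backup region, or NULL.
 */
static const struct prange *pick_range(const struct prange *cands, size_t n,
				       unsigned long long size)
{
	for (size_t i = 0; i < n; i++) {
		if (cands[i].end - cands[i].start < size)
			continue;
		if (ranges_overlap(cands[i], backup_region))
			continue;
		return &cands[i];
	}
	return NULL;
}
```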

Thanks
Vivek


Re: [PATCH] kexec, remove panic_on_warn kernel parameter from kdump situations

2015-01-06 Thread Vivek Goyal

On Tue, Jan 06, 2015 at 04:05:00PM +0800, Dave Young wrote:
 On 01/05/15 at 08:54pm, Vivek Goyal wrote:
  On Tue, Jan 06, 2015 at 09:44:05AM +0800, Dave Young wrote:
   On 01/02/15 at 08:17am, Vivek Goyal wrote:
On Fri, Jan 02, 2015 at 08:07:20AM -0500, Prarit Bhargava wrote:
 
 
 On 01/02/2015 07:54 AM, Vivek Goyal wrote:
  On Tue, Dec 30, 2014 at 09:57:51AM -0500, Prarit Bhargava wrote:
  panic_on_warn kernel parameter will cause the kernel to panic when a
  WARN() is hit in the kernel.  This is not a good situation for the kdump
  kernel because then it would be possible for the kdump kernel to panic in
  a non-fatal WARN().
 
  This patch removes panic_on_warn as a kernel parameter for the kdump
  kernel.
 
  
  I think modifying kexec-tools is not best place for this. It probably is
  better to take care of this in distribution specific scripts.
  
  In the past we have learnt that it is best that kexec-tools does least
  amount of manipulation with command line.
 
 Well .. here's the question to think about: what does adding panic_on_warn to
 the kdump kernel get you?  AFAICT, nothing.

Let us consider a hypothetical situation. What if we have some buggy code
which will corrupt file system in certain situation and we detect that
situation and throw a warning.

In that case as a work around specifying panic_on_warn in kdump kernel
will make sense as we don't want to make further progress if we hit
the warning as it has potential to corrupt fs.

Again this is hypothetical but it can happen. So panic_on_warn might
still be useful in kdump kernel for some corner debugging cases.

That's why I think we should do it in distribution specific scripts
and that too only if user did not specify panic_on_warn for second
kernel explicitly.
   
   Thinking of user who use upstream kexec-tools instead of distribution toolset,
   In case kexec --reuse-cmdline, it will copy /proc/cmdline, but user will have
   no way to remove part of them.
   
   I do want to insist on removing 'panic_on_warn' in upstream kexec-tools, but
   we should give user an option to remove it. Something like:
   
   kexec --reuse-cmdline --remove-params=panic_on_warn will be good.
  
  If user is using --reuse-commandline at the same time does not want some
  of the parameters from command line, then don't use --reuse-commandline.
  
  This is overengineering. First provide an option to reuse the commandline
  and provide another option to selectively remove some parameters from that
  commandline.
  
  What's wrong with existing parameters of --command-line. This just allows
  user to specify whatever command line is suitable.
  
  So, no, we should not provide --remove-params. If existing command line
  does not work for new kernel, then user should not use
  --reuse-commandline option.
 
 Hmm, ok. So hope one who is use panic_on_warn in 1st kernel know what he is doing
 and do not simply copy the 1st kernel cmdline for 2nd kernel.

--reuse-commandline will work only for kexec case and not kdump case. And
in case of kexec, it is fine to use panic_on_warn in kexeced kernel. So 
there is no need to worry here.

Thanks
Vivek



Re: [PATCH] kexec, remove panic_on_warn kernel parameter from kdump situations

2015-01-06 Thread Vivek Goyal

On Tue, Jan 06, 2015 at 08:46:44PM +0800, Baoquan He wrote:
 On 01/06/15 at 04:05pm, Dave Young wrote:
  On 01/05/15 at 08:54pm, Vivek Goyal wrote:
   On Tue, Jan 06, 2015 at 09:44:05AM +0800, Dave Young wrote:
On 01/02/15 at 08:17am, Vivek Goyal wrote:
 On Fri, Jan 02, 2015 at 08:07:20AM -0500, Prarit Bhargava wrote:
  
  
  On 01/02/2015 07:54 AM, Vivek Goyal wrote:
   On Tue, Dec 30, 2014 at 09:57:51AM -0500, Prarit Bhargava wrote:
   panic_on_warn kernel parameter will cause the kernel to panic when a
   WARN() is hit in the kernel.  This is not a good situation for the kdump
   kernel because then it would be possible for the kdump kernel to panic in
   a non-fatal WARN().
  
   This patch removes panic_on_warn as a kernel parameter for the kdump
   kernel.
  
   
   I think modifying kexec-tools is not best place for this. It probably is
   better to take care of this in distribution specific scripts.
   
   In the past we have learnt that it is best that kexec-tools does least
   amount of manipulation with command line.
  
  Well .. here's the question to think about: what does adding panic_on_warn to
  the kdump kernel get you?  AFAICT, nothing.
 
 Let us consider a hypothetical situation. What if we have some buggy code
 which will corrupt file system in certain situation and we detect that
 situation and throw a warning.
 
 In that case as a work around specifying panic_on_warn in kdump kernel
 will make sense as we don't want to make further progress if we hit
 the warning as it has potential to corrupt fs.
 
 Again this is hypothetical but it can happen. So panic_on_warn might
 still be useful in kdump kernel for some corner debugging cases.
 
 That's why I think we should do it in distribution specific scripts
 and that too only if user did not specify panic_on_warn for second
 kernel explicitly.

Thinking of user who use upstream kexec-tools instead of distribution toolset,
In case kexec --reuse-cmdline, it will copy /proc/cmdline, but user will have
no way to remove part of them.

I do want to insist on removing 'panic_on_warn' in upstream kexec-tools, but
we should give user an option to remove it. Something like:

kexec --reuse-cmdline --remove-params=panic_on_warn will be good.
    
   If user is using --reuse-commandline at the same time does not want some
   of the parameters from command line, then don't use --reuse-commandline.
   
   This is overengineering. First provide an option to reuse the commandline
   and provide another option to selectively remove some parameters from that
   commandline.
   
   What's wrong with existing parameters of --command-line. This just allows
   user to specify whatever command line is suitable.
   
   So, no, we should not provide --remove-params. If existing command line
   does not work for new kernel, then user should not use
   --reuse-commandline option.
  
  Hmm, ok. So hope one who is use panic_on_warn in 1st kernel know what he is doing
  and do not simply copy the 1st kernel cmdline for 2nd kernel.
 
 I am fine with whichever place it is handled in, to some extent.
 This is truly a problem we need to consider. If the distribution in use
 doesn't handle it, a user running the latest upstream kernel will be
 surprised by this.

This is true for all kernel parameters. One distribution might decide
to use some kernel parameter by default during installation and another
might not. Is kexec-tools supposed to keep track of all kernel parameters?
It is just not possible.

That's why it is up to distributions to figure out what parameters work
for them and modify their scripts accordingly.

Thanks
Vivek


Re: [PATCH] kexec, remove panic_on_warn kernel parameter from kdump situations

2015-01-05 Thread Vivek Goyal

On Tue, Jan 06, 2015 at 09:44:05AM +0800, Dave Young wrote:
 On 01/02/15 at 08:17am, Vivek Goyal wrote:
  On Fri, Jan 02, 2015 at 08:07:20AM -0500, Prarit Bhargava wrote:
   
   
   On 01/02/2015 07:54 AM, Vivek Goyal wrote:
On Tue, Dec 30, 2014 at 09:57:51AM -0500, Prarit Bhargava wrote:
panic_on_warn kernel parameter will cause the kernel to panic when a
WARN() is hit in the kernel.  This is not a good situation for the kdump
kernel because then it would be possible for the kdump kernel to panic in
a non-fatal WARN().
    
This patch removes panic_on_warn as a kernel parameter for the kdump
kernel.
    

I think modifying kexec-tools is not best place for this. It probably
is better to take care of this in distribution specific scripts.

In the past we have learnt that it is best that kexec-tools does least
amount of manipulation with command line.
    
    Well .. here's the question to think about: what does adding panic_on_warn to
    the kdump kernel get you?  AFAICT, nothing.
  
  Let us consider a hypothetical situation. What if we have some buggy code
  which will corrupt file system in certain situation and we detect that
  situation and throw a warning. 
  
  In that case as a work around specifying panic_on_warn in kdump kernel
  will make sense as we don't want to make further progress if we hit
  the warning as it has potential to corrupt fs.
  
  Again this is hypothetical but it can happen. So panic_on_warn might
  still be useful in kdump kernel for some corner debugging cases.
  
  That's why I think we should do it in distribution specific scripts
  and that too only if user did not specify panic_on_warn for second
  kernel explicitly.
 
 Thinking of user who use upstream kexec-tools instead of distribution toolset,
 In case kexec --reuse-cmdline, it will copy /proc/cmdline, but user will have
 no way to remove part of them.
 
 I do want to insist on removing 'panic_on_warn' in upstream kexec-tools, but
 we should give user an option to remove it. Something like:
 
 kexec --reuse-cmdline --remove-params=panic_on_warn will be good.

If user is using --reuse-commandline at the same time does not want some
of the parameters from command line, then don't use --reuse-commandline.

This is overengineering. First provide an option to reuse the commandline
and provide another option to selectively remove some parameters from that
commandline.

What's wrong with existing parameters of --command-line. This just allows
user to specify whatever command line is suitable.

So, no, we should not provide --remove-params. If existing command line
does not work for new kernel, then user should not use
--reuse-commandline option.

Thanks
Vivek


Re: [PATCH] kexec, remove panic_on_warn kernel parameter from kdump situations

2015-01-02 Thread Vivek Goyal

On Tue, Dec 30, 2014 at 09:57:51AM -0500, Prarit Bhargava wrote:
 panic_on_warn kernel parameter will cause the kernel to panic when a
 WARN() is hit in the kernel.  This is not a good situation for the kdump
 kernel because then it would be possible for the kdump kernel to panic in
 a non-fatal WARN().
 
 This patch removes panic_on_warn as a kernel parameter for the kdump
 kernel.
 

I think modifying kexec-tools is not best place for this. It probably is better 
to take care of this in distribution specific scripts.

In the past we have learnt that it is best that kexec-tools does least
amount of manipulation with command line.

Thanks
Vivek

 Signed-off-by: Prarit Bhargava pra...@redhat.com
 Cc: Dave Young dyo...@redhat.com
 Cc: Vivek Goyal vgo...@redhat.com
 Cc: WANG Chao chaow...@redhat.com
 ---
  kexec/kexec.c |4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/kexec/kexec.c b/kexec/kexec.c
 index b088916..323cafb 100644
 --- a/kexec/kexec.c
 +++ b/kexec/kexec.c
 @@ -1048,8 +1048,10 @@ char *get_command_line(void)
   line[strlen(line) - 1] = '\0';
  
   remove_parameter(line, "BOOT_IMAGE");
  - if (kexec_flags & KEXEC_ON_CRASH)
  + if (kexec_flags & KEXEC_ON_CRASH) {
   remove_parameter(line, "crashkernel");
  + remove_parameter(line, "panic_on_warn");
 + }
  
   return line;
  }
 -- 
 1.7.9.3
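To make the operation under discussion concrete, here is a minimal C sketch that deletes `name` or `name=value` tokens from a command line in place. `drop_param()` is hypothetical and is not kexec-tools' remove_parameter(). It also ignores quoted values containing spaces, which a real implementation would need to handle.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: remove every whitespace-separated token that is
 * exactly `name` or starts with "name=", compacting `line` in place with
 * single spaces between the remaining tokens.
 */
static void drop_param(char *line, const char *name)
{
	size_t len = strlen(name);
	char *src = line;
	char *dst = line;
	bool first = true;

	while (*src) {
		/* Skip separators before the next token. */
		while (*src && isspace((unsigned char)*src))
			src++;
		if (!*src)
			break;

		/* Find the end of this token. */
		char *tok = src;
		while (*src && !isspace((unsigned char)*src))
			src++;
		size_t toklen = (size_t)(src - tok);

		/* Match "name" or "name=value", but not e.g. "namex=1". */
		if (toklen >= len && strncmp(tok, name, len) == 0 &&
		    (toklen == len || tok[len] == '='))
			continue;

		/* Keep the token; dst never passes tok, so memmove is safe. */
		if (!first)
			*dst++ = ' ';
		memmove(dst, tok, toklen);
		dst += toklen;
		first = false;
	}
	*dst = '\0';
}
```

For example, dropping "panic_on_warn" and "crashkernel" from "ro quiet panic_on_warn=1 crashkernel=256M root=/dev/sda1" leaves "ro quiet root=/dev/sda1".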


Re: [PATCH] kexec, remove panic_on_warn kernel parameter from kdump situations

2015-01-02 Thread Vivek Goyal

On Fri, Jan 02, 2015 at 08:07:20AM -0500, Prarit Bhargava wrote:
 
 
 On 01/02/2015 07:54 AM, Vivek Goyal wrote:
  On Tue, Dec 30, 2014 at 09:57:51AM -0500, Prarit Bhargava wrote:
  panic_on_warn kernel parameter will cause the kernel to panic when a
  WARN() is hit in the kernel.  This is not a good situation for the kdump
  kernel because then it would be possible for the kdump kernel to panic in
  a non-fatal WARN().
 
  This patch removes panic_on_warn as a kernel parameter for the kdump
  kernel.
 
  
  I think modifying kexec-tools is not best place for this. It probably is 
  better to take care of this in distribution specific scripts.
  
  In the past we have learnt that it is best that kexec-tools does least
  amount of manipulation with command line.
 
 Well .. here's the question to think about: what does adding panic_on_warn to
 the kdump kernel get you?  AFAICT, nothing.

Let us consider a hypothetical situation. What if we have some buggy code
which will corrupt file system in certain situation and we detect that
situation and throw a warning. 

In that case as a work around specifying panic_on_warn in kdump kernel
will make sense as we don't want to make further progress if we hit
the warning as it has potential to corrupt fs.

Again this is hypothetical but it can happen. So panic_on_warn might
still be useful in kdump kernel for some corner debugging cases.

That's why I think we should do it in distribution specific scripts
and that too only if user did not specify panic_on_warn for second
kernel explicitly.

 
 If panic_on_warn is specified, the only thing that will happen is that kdump
 will fail (which is always bad IMO).  There is no real difference in the stack
 trace between the WARN() and panic situations so there is no information loss.
 
 So I disagree -- we should never specify panic_on_warn on kdump kernel.

I am saying: do it in distribution-specific scripts, not in
kexec-tools.

Thanks
Vivek

Re: [PATCH v2] kdump, vmcoreinfo: report actual value of phys_base

2015-01-02 Thread Vivek Goyal

On Mon, Dec 15, 2014 at 03:11:20PM -0800, Andrew Morton wrote:
 
 (cc trimmed a bit)
 
 On Thu, 13 Nov 2014 11:30:11 +0900 (JST) HATAYAMA Daisuke 
 d.hatay...@jp.fujitsu.com wrote:
 
  Currently, VMCOREINFO note information reports the virtual address of
  phys_base that is assigned to symbol phys_base. But this doesn't make
  sense because to refer to phys_base, it's necessary to get the value
  of phys_base itself we are now about to refer to.
 
 Folks, could we please get a bit of reviewing, acking or nacking for
 this one?

To me this patch looks more like a hack, or a quick fix to get dump filtering
working when the dump is generated using virsh-dump-like tools. I feel there
should be a discussion on the proper way to do this, though I don't have
very specific ideas at this point in time.

Thanks
Vivek

Re: [PATCH] kexec: Fix a typo in comment

2015-01-02 Thread Vivek Goyal

On Fri, Jan 02, 2015 at 12:48:51PM -0600, Eric W. Biederman wrote:
 Alexander Kuleshov kuleshovm...@gmail.com writes:
 
  Signed-off-by: Alexander Kuleshov kuleshovm...@gmail.com
 Acked-by: Eric W. Biederman ebied...@xmission.com

[ CC akpm ]

Simple fix.

Acked-by: Vivek Goyal vgo...@redhat.com

Thanks
Vivek

 
  ---
   kernel/kexec.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/kernel/kexec.c b/kernel/kexec.c
  index 9a8a01a..75a8b7e 100644
  --- a/kernel/kexec.c
  +++ b/kernel/kexec.c
  @@ -444,7 +444,7 @@ arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
   }
   
   /*
  - * Free up memory used by kernel, initrd, and comand line. This is temporary
  + * Free up memory used by kernel, initrd, and command line. This is temporary
    * memory allocation which is not needed any more after these buffers have
    * been loaded into separate segments and have been copied elsewhere.
    */

Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

2014-11-17 Thread Vivek Goyal

On Fri, Nov 14, 2014 at 10:31:33AM +0900, HATAYAMA Daisuke wrote:
 From: Vivek Goyal vgo...@redhat.com
 Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in 
 VMCOREINFO
 Date: Thu, 13 Nov 2014 09:25:48 -0500

  On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:

  (2014/11/13 17:06), Petr Tesarik wrote:
  On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
  HATAYAMA Daisuke d.hatay...@jp.fujitsu.com wrote:

  From: Vivek Goyal vgo...@redhat.com
  Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in 
  VMCOREINFO
  Date: Wed, 12 Nov 2014 17:12:05 -0500

  On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
  Currently, VMCOREINFO note information reports the virtual address of
  phys_base that is assigned to symbol phys_base. But this doesn't make
  sense because to refer to value of the phys_base, it's necessary to
  get the value of phys_base itself we are now about to refer to.

  Hi Hatayama,

  /proc/vmcore ELF headers have virtual address information and using
  that you should be able to read actual value of phys_base. gdb deals
  with virtual addresses all the time and can read value of any symbol
  using those headers.

  So I am not sure what's the need for exporting actual value of
  phys_base.

  Sorry, my logic in the patch description was wrong. For /proc/vmcore,
  there's enough information for makedumpfile to get phys_base. It's
  correct. The problem here is that other crash dump mechanisms that run
  outside Linux kernel independently don't have information to get
  phys_base.

  Yes, but these mechanisms won't be able to read VMCOREINFO either, will
  they?

  I don't intend such sophisticated function only by VMCOREINFO.
  Search vmcore for VMCOREINFO using strings + grep before opening it by 
  crash.
  I intend that only here.

  I think this is very crude and not proper way to get to vmcoreinfo. Can

 I agree it's crude, but it's useful enough for my usecase.

  you give more context. What are those mechanisms and what are you trying
  to do.

 I after all write the same thing in the patch description... I mean
 qemu dump, xendump (and other hypervisor dumps), firmware dumps
 implemented on each vendor system for the crash dump mechanism.

vmcoreinfo is exported by the kdump mechanism (/proc/vmcore). These other
dump mechanisms need to figure out how to export the relevant
information; it is not right to try to put more info in vmcoreinfo.

Don't try to write kernel data structures in such a way that
somebody can scan them later. In an external dump mechanism there
is no notion of a vmcoreinfo ELF note. So these mechanisms need to
come up with their own way to query some basic information about
the kernel and export it appropriately.

Also, relying on two mechanisms unnecessarily introduces extra
complexity. I think you should give users a choice so that they
can configure one or the other. If you think that firmware dump
mechanisms are more reliable, just use those. In fact, when a crash
happens, the OS should call into some firmware hook to trigger the
dump, and along with that hook one should be able to pass the
relevant info.

Thanks
Vivek

Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

2014-11-13 Thread Vivek Goyal

On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:

 (2014/11/13 17:06), Petr Tesarik wrote:
 On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
 HATAYAMA Daisuke d.hatay...@jp.fujitsu.com wrote:

 From: Vivek Goyal vgo...@redhat.com
 Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in 
 VMCOREINFO
 Date: Wed, 12 Nov 2014 17:12:05 -0500

 On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
 Currently, VMCOREINFO note information reports the virtual address of
 phys_base that is assigned to symbol phys_base. But this doesn't make
 sense because to refer to value of the phys_base, it's necessary to
 get the value of phys_base itself we are now about to refer to.

 Hi Hatayama,

 /proc/vmcore ELF headers have virtual address information and using
 that you should be able to read actual value of phys_base. gdb deals
 with virtual addresses all the time and can read value of any symbol
 using those headers.

 So I am not sure what's the need for exporting actual value of
 phys_base.

 Sorry, my logic in the patch description was wrong. For /proc/vmcore,
 there's enough information for makedumpfile to get phys_base. It's
 correct. The problem here is that other crash dump mechanisms that run
 outside Linux kernel independently don't have information to get
 phys_base.

 Yes, but these mechanisms won't be able to read VMCOREINFO either, will
 they?

 I don't intend such sophisticated function only by VMCOREINFO.
 Search vmcore for VMCOREINFO using strings + grep before opening it by crash.
 I intend that only here.

I think this is very crude and not proper way to get to vmcoreinfo. Can
you give more context. What are those mechanisms and what are you trying
to do.
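For context, the "strings + grep" approach under discussion amounts to locating the VMCOREINFO text in a raw dump and pulling out one KEY=value line. A minimal sketch of the parsing half, assuming the kernel's SYMBOL(name)=<hex address> line format; the helper name is hypothetical and the address in the test data is made up. Note that SYMBOL(phys_base) yields the variable's address, not its value, which is the gap the patch tries to close.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find a line "KEY=value" in a VMCOREINFO text blob
 * (for example KEY = "SYMBOL(phys_base)") and parse value as hexadecimal.
 * Returns 0 on success, -1 if the key is absent or the value is not hex.
 */
int vmcoreinfo_lookup_hex(const char *blob, const char *key,
                          unsigned long long *val)
{
    size_t klen = strlen(key);
    const char *p = blob;

    if (klen == 0)
        return -1;

    while ((p = strstr(p, key)) != NULL) {
        /* Only accept a match at the start of a line followed by '='. */
        if ((p == blob || p[-1] == '\n') && p[klen] == '=') {
            const char *num = p + klen + 1;
            char *end;
            unsigned long long v = strtoull(num, &end, 16);
            if (end == num)
                return -1;
            *val = v;
            return 0;
        }
        p += klen;
    }
    return -1;
}
```

The line-start check matters: without it, a lookup for a short key could match inside a longer one.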

Thanks
Vivek

Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

2014-11-12 Thread Vivek Goyal

On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
 Currently, VMCOREINFO note information reports the virtual address of
 phys_base that is assigned to symbol phys_base. But this doesn't make
 sense because to refer to value of the phys_base, it's necessary to
 get the value of phys_base itself we are now about to refer to.
 

Hi Hatayama,

/proc/vmcore ELF headers have virtual address information and using
that you should be able to read actual value of phys_base. gdb deals
with virtual addresses all the time and can read value of any symbol
using those headers.

So I am not sure what's the need for exporting actual value of
phys_base.
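A minimal sketch of what such a reader does with those headers, assuming a 64-bit ELF core: find the PT_LOAD segment whose virtual range covers the symbol's address (for example the address of phys_base taken from vmlinux), then read the bytes at the corresponding file offset. The function name is hypothetical; Elf64_Phdr and PT_LOAD come from <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Translate a kernel virtual address into a file offset in an ELF core file
 * such as /proc/vmcore, using its PT_LOAD program headers. Returns 0 and
 * stores the offset on success, -1 if no PT_LOAD segment has file-backed
 * bytes at vaddr (i.e. within p_filesz of p_vaddr).
 */
int vaddr_to_offset(const Elf64_Phdr *phdrs, size_t nphdrs,
                    uint64_t vaddr, uint64_t *offset)
{
    for (size_t i = 0; i < nphdrs; i++) {
        const Elf64_Phdr *ph = &phdrs[i];

        if (ph->p_type != PT_LOAD)
            continue;
        if (vaddr >= ph->p_vaddr && vaddr - ph->p_vaddr < ph->p_filesz) {
            *offset = ph->p_offset + (vaddr - ph->p_vaddr);
            return 0;
        }
    }
    return -1;
}
```

A reader would then pread() eight bytes at that offset to obtain the value of phys_base.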

Thanks
Vivek

Re: Edited kexec_load(2) [kexec_file_load()] man page for review

2014-11-11 Thread Vivek Goyal

On Sun, Nov 09, 2014 at 08:17:49PM +0100, Michael Kerrisk (man-pages) wrote:
 Hello Vivek (and all),
 
 Thanks for the kexec_file_load() patch [for the kexec_load(2) man page]
 that you sent quite some time ago. I have merged it and done some
 substantial editing as well. Could you please take a look at the 
 draft below, and check that the kexec_file_load() material is okay.
 Please could you especially pay attention to the pieces marked
 FIXME(kexec_file_load), since those are pieces about which I
 had questions or doubts.
 

Hi Michael,

Thanks for editing this man page. I have some thoughts inline.

[..]
 .B #include <linux/kexec.h>
 
 .BI "long kexec_load(unsigned long " entry ", unsigned long " nr_segments ","
 .BI "struct kexec_segment *" segments \
 ", unsigned long " flags ");"
 
 .\" FIXME(kexec_file_load):
 .\" Why are the return types of kexec_load() and kexec_file_load()
 .\" different?
 .BI "int kexec_file_load(int " kernel_fd ", int " initrd_fd ","

I think this is ignorance on my part. It probably should be long, as that is
what SYSCALL_DEFINE() seems to expand to:

asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));
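Since there is no glibc wrapper, callers go through syscall(2), which itself returns long. A minimal sketch; kexec_file_load_raw() is a hypothetical name, and it falls back to failing with ENOSYS when the installed headers do not define SYS_kexec_file_load.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Invoke kexec_file_load() directly. The long return type matches both
 * syscall(2) and what SYSCALL_DEFINE5() produces in the kernel.
 */
long kexec_file_load_raw(int kernel_fd, int initrd_fd,
                         unsigned long cmdline_len, const char *cmdline,
                         unsigned long flags)
{
#ifdef SYS_kexec_file_load
    return syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
                   cmdline_len, cmdline, flags);
#else
    /* Headers lack the syscall number: report it as unimplemented. */
    (void)kernel_fd; (void)initrd_fd; (void)cmdline_len;
    (void)cmdline; (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}
```

A real load needs CAP_SYS_BOOT and open file descriptors for the kernel (and optionally initrd) images; cmdline_len counts the terminating NUL of cmdline.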


 .br
 .BI "unsigned long " cmdline_len \
 ", const char *" cmdline ","
 .BI "unsigned long " flags ");"
 
 .fi
 .IR Note :
 There are no glibc wrappers for these system calls; see NOTES.
 .SH DESCRIPTION
 The
 .BR kexec_load ()
 system call loads a new kernel that can be executed later by
 .BR reboot (2).
 .PP
 The
 .I flags
 argument is a bit mask that controls the operation of the call.
 The following values can be specified in
 .IR flags :
 .TP
 .BR KEXEC_ON_CRASH " (since Linux 2.6.13)"
 Execute the new kernel automatically on a system crash.
 .\" FIXME Explain in more detail how KEXEC_ON_CRASH is actually used

Upon boot, the first kernel reserves a chunk of contiguous memory (if the
crashkernel= command line parameter is passed). This memory is
used to load the crash kernel (the kernel which will be booted into
if the first kernel crashes).

The location of this reserved memory is exported to user space through the
/proc/iomem file. User space can parse it and prepare a list of segments
specifying this reserved memory as the destination.
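A minimal sketch of the parsing step, assuming the usual "start-end : name" layout of /proc/iomem with the reservation labelled "Crash kernel". The function name is hypothetical; on newer kernels unprivileged readers may see zeroed addresses in /proc/iomem, so real tools read it as root.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem line such as "  2b000000-3affffff : Crash kernel".
 * Returns 1 and fills *start / *end (inclusive) if the line describes the
 * crash kernel reservation, 0 otherwise.
 */
int parse_crash_kernel_line(const char *line,
                            unsigned long long *start,
                            unsigned long long *end)
{
    unsigned long long s, e;
    int name_pos = 0;

    /* Leading whitespace encodes nesting; the leading ' ' skips it. */
    if (sscanf(line, " %llx-%llx : %n", &s, &e, &name_pos) != 2 ||
        name_pos == 0)
        return 0;
    if (strncmp(line + name_pos, "Crash kernel", 12) != 0)
        return 0;
    *start = s;
    *end = e;
    return 1;
}
```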

Once the kernel sees the KEXEC_ON_CRASH flag, it makes sure that all the
segments are destined for the reserved memory; otherwise the load operation
fails.
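The check amounts to a range test per segment. A simplified sketch, not the kernel's code: it uses a hypothetical struct with only the destination fields (mem, memsz) and an inclusive reserved range such as the one read from /proc/iomem.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a segment's destination range. */
struct seg_dest {
    unsigned long mem;    /* physical destination start */
    size_t memsz;         /* destination length in bytes */
};

/*
 * Return 1 if every non-empty segment lies entirely within the inclusive
 * reserved range [res_start, res_end], 0 otherwise (load rejected).
 */
int segments_in_reserved(const struct seg_dest *segs, size_t n,
                         unsigned long res_start, unsigned long res_end)
{
    for (size_t i = 0; i < n; i++) {
        unsigned long mstart = segs[i].mem;
        unsigned long mend;

        if (segs[i].memsz == 0)
            continue;
        mend = mstart + segs[i].memsz - 1;  /* inclusive end */
        if (mend < mstart)                  /* wrapped past the top */
            return 0;
        if (mstart < res_start || mend > res_end)
            return 0;
    }
    return 1;
}
```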

[..]
 struct kexec_segment {
     void   *buf;     /* Buffer in user space */
     size_t  bufsz;   /* Buffer length in user space */
     void   *mem;     /* Physical address of kernel */
     size_t  memsz;   /* Physical address length */
 };
 .fi
 .in
 .PP
 .\" FIXME Explain the details of how the kernel image defined by segments
 .\" is copied from the calling process into previously reserved memory.

The kernel image defined by the segments is copied into the kernel, either
into regular memory or into reserved memory (if KEXEC_ON_CRASH is set). The
kernel first copies the list of segments into kernel memory and then does
various sanity checks on the segments. If everything looks fine, the kernel
copies the segment data into kernel memory.

In the case of normal kexec, segment data is loaded into any available memory
and is moved to its final destination at kexec reboot time.

In the case of kexec on panic (KEXEC_ON_CRASH flag set), segment data is
loaded directly into the reserved memory, and after a crash kexec simply
jumps to the starting point.
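Per segment, the copy itself is simple: bufsz bytes come from user space and the rest of memsz is zero-filled, which is why a segment with bufsz larger than memsz is rejected. A minimal sketch with a hypothetical load_segment() helper on plain buffers; in the kernel, dest would be either temporary pages (normal kexec) or the reserved region itself (KEXEC_ON_CRASH).

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy one segment into its destination: the first bufsz bytes come from
 * the user buffer, the remaining memsz - bufsz bytes are zeroed.
 * Returns 0 on success, -1 if bufsz > memsz (an invalid segment).
 */
int load_segment(unsigned char *dest, const unsigned char *buf,
                 size_t bufsz, size_t memsz)
{
    if (bufsz > memsz)
        return -1;
    memcpy(dest, buf, bufsz);
    memset(dest + bufsz, 0, memsz - bufsz);
    return 0;
}
```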

[..]
 .\" FIXME(kexec_file_load):
 .\" Is the following rationale accurate? Does it need expanding?
 The
 .BR kexec_file_load ()
 .\" See also http://lwn.net/Articles/603116/
 system call was added to provide support for systems
 where kexec loading should be restricted to
 only kernels that are signed.

Yes, this rationale looks good.

 
 The
 .BR kexec_load ()
 system call is available only if the kernel was configured with
 .BR CONFIG_KEXEC .
 The
 .BR kexec_file_load ()
 system call is available only if the kernel was configured with
 .BR CONFIG_KEXEC_FILE .
 .\" FIXME(kexec_file_load):
 .\" Does kexec_file_load() need any other CONFIG_* options to be defined?

Yes, it requires some other config options too.

depends on KEXEC
depends on X86_64
depends on CRYPTO=y
depends on CRYPTO_SHA256=y

CONFIG_KEXEC_VERIFY_SIG=y
CONFIG_KEXEC_BZIMAGE_VERIFY_SIG=y
CONFIG_SIGNED_PE_FILE_VERIFICATION=y
CONFIG_PKCS7_MESSAGE_PARSER=y
CONFIG_X509_CERTIFICATE_PARSER=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y

So the dependency list seems pretty long. I am not sure how many of these
we should specify in the man page.

Thanks
Vivek

Re: [PATCH v8] kernel, add panic_on_warn

2014-11-06 Thread Vivek Goyal

On Thu, Nov 06, 2014 at 01:57:36PM -0800, David Rientjes wrote:

[..]
 You see that doing
 
   if (panic_on_warn) {
   panic_on_warn = 0;
   panic(...);
   }
 
 is racy, I hope.  If two threads WARN() at the same time, then there's 
 nothing preventing a double panic() because WARN() itself is not 
 serialized against anything.  So both the current comment and your 
 suggested revision comment are bogus.

panic() is serialized on panic_lock. So I guess it is fine to hit WARN()
on multiple cpus. Do you see an issue there?
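A user-space analogy of that serialization (not kernel code): whichever caller sets the flag first runs the panic body, and later callers back off, roughly as panic() does with a trylock on panic_lock. The names are hypothetical, and C11 atomics plus POSIX threads stand in for the kernel primitives.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag panic_in_progress = ATOMIC_FLAG_INIT;
static atomic_int panics_run;

static void fake_panic(void)
{
    /* test_and_set returns the previous value: true means we lost. */
    if (atomic_flag_test_and_set(&panic_in_progress))
        return;              /* the kernel would stop this CPU instead */
    atomic_fetch_add(&panics_run, 1);
}

static void *warn_thread(void *arg)
{
    (void)arg;
    fake_panic();            /* stands in for WARN() with panic_on_warn set */
    return NULL;
}

/* Fire up to 64 concurrent "WARN()s"; return how many reached the body. */
int simulate_concurrent_warns(int n)
{
    pthread_t tids[64];
    int created = 0;

    if (n > 64)
        n = 64;
    atomic_flag_clear(&panic_in_progress);
    atomic_store(&panics_run, 0);
    for (int i = 0; i < n; i++)
        if (pthread_create(&tids[created], NULL, warn_thread, NULL) == 0)
            created++;
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&panics_run);
}
```

However many threads race, exactly one gets through, which is the property being relied on here.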

Thanks
Vivek

Re: [PATCH] kernel, add panic_on_warn

2014-11-03 Thread Vivek Goyal

On Mon, Nov 03, 2014 at 08:32:42AM -0500, Prarit Bhargava wrote:
 
 
 On 10/30/2014 09:58 PM, Hedi Berriche wrote:
  On Thu, Oct 30, 2014 at 17:06 Prarit Bhargava wrote:
  | There have been several times where I have had to rebuild a kernel to
  | cause a panic when hitting a WARN() in the code in order to get a crash
  | dump from a system.  Sometimes this is easy to do, other times (such as
  | in the case of a remote admin) it is not trivial to send new images to the
  | user.
  | 
  | A much easier method would be a switch to change the WARN() over to a
  | panic.  This makes debugging easier in that I can now test the actual
  | image the WARN() was seen on and I do not have to engage in remote
  | debugging.
  
  Do we want to leave it to userspace[1] to ensure panic_on_warn is out
  of the way when the kdump kernel boots? or would a self-contained
  approach be preferable, i.e. test whether we're a kdump kernel
  before bothering with panic_on_warn?
 
 Hmm ... this is a good point.  Vivek, do you have a preference?  I'm willing to
 code it either way.  I should be able to put in an is_kdump_kernel() check
 without any problems but I'm not sure if that is the right thing to do here.
 

I think it will make sense to modify user space scripts to get rid of
panic_on_warn for the kdump kernel.

Thanks
Vivek

Re: [PATCH v6] kernel, add panic_on_warn

2014-11-03 Thread Vivek Goyal

On Mon, Nov 03, 2014 at 09:32:23AM -0500, Prarit Bhargava wrote:

[..]
 +
 +static int __init panic_on_warn_setup(char *s)
 +{
 + /* Enabling this on a kdump kernel could cause a bogus panic. */
 + if (!is_kdump_kernel())
 + panic_on_warn = 1;

I think it would be better if we leave it to user space to remove
panic_on_warn from the command line of the second kernel.

Thanks
Vivek

Re: [PATCH V4] kernel, add bug_on_warn

2014-10-28 Thread Vivek Goyal

On Tue, Oct 28, 2014 at 05:44:25AM -0700, Andi Kleen wrote:
   I suppose ... but that would mean I would have to explain to an end user the
   elaborate process of enabling kdb, inserting a break point, etc.  The whole
   purpose of this is to let an end user panic on WARN() easily.
   
   Asking an end user to enable kdb is magnitudes worse than asking them to
   recompile a kernel.
  
  Agreed. Asking a customer to setup and run kdb and put breakpoints is much
  more pain than simply asking to reboot kernel with a command line option.
 
 If you have a command line option to execute kdb commands you still
 would only have a command line option, just a slightly longer one.
 
 kdb=on, bp warn_slowpath_common sr c, go 

So does this already work, or is the proposal to make something like this
work with kdb?

What about enabling it after boot, using a /sys file for that?

Thanks
Vivek

Re: [resend Patch v3 1/2] kaslr: check if kernel location is changed

2014-10-15 Thread Vivek Goyal

On Wed, Oct 15, 2014 at 11:37:01AM +0800, Baoquan He wrote:
 On 10/14/14 at 08:49am, Vivek Goyal wrote:
  On Mon, Oct 13, 2014 at 01:22:42PM -0400, Vivek Goyal wrote:
   On Mon, Oct 13, 2014 at 08:43:00AM -0700, H. Peter Anvin wrote:
On 10/13/2014 08:19 AM, Vivek Goyal wrote:

 This really shouldn't have happened this way on x86-64.  It has to happen
 this way on i386, but I worry that this may be a serious misdesign in kaslr
 on x86-64.  I'm also wondering if there is any other fallout of this?

 I agree. On x86_64, we should stick to previous design and this new
 logic of performing relocations does not sound very clean and makes
 things very confusing.

 I am wondering that why couldn't we simply adjust page tables in case of
 kaslr on x86_64, instead of performing relocations.
 
 Well, IIUC, if virtual addresses are shifted w.r.t what virtual address
 kernel was compiled for, then relocation will have to be done.
 
 So question will be if physical address shift is enough for kaslr or
 virtual address shift is necessary.
 

I would assume that without a virtual address shift kaslr is pretty darn
pointless.  Without the physical address shift the 1:1 map can be used,
and again, kaslr becomes pointless.  However, there is absolutely no
reason why they should be coupled.  They can, in fact, be independently
randomized.
   
   Agreed. On x86_64, we should be able to randomize virtual address space
   and physical address space independently. And in that case whole of
   the physical memory should be available for a possible location for
   kernel. (As opposed to a small limit (I guess 1GB) now)
 
 It can be done to randomize virtual address space and physical address
 space independently. But limited by the 2G of kernel text mapping and
 module mapping virtual address space, virtual address can be randomized
 in (0x1000000, 1G) range. While physical address can be randomized in
 (0x1000000, 4G) according to the identity mapping of normal kernel. Then
 phys_base still stores a relative value, a different offset than before.
 
 This can be easily implemented. One thing is still there's a limit for
 physical addr randomization, only below 4G. So I am wondering if we can
 extend the identity mapping to complete mapping of 48 bit, using 1G page
 frame. This can make physical addr be randomized to anywhere.

I am wondering where this 4G limit comes from. Without kaslr we are
already able to load kernels much higher than 4G. Wouldn't that suggest
that we should be able to pick any random address above 4G too?
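Picking a randomized, aligned load address within a range does not depend on what the range is, so the selection step itself needs no 4G ceiling; in Baoquan's description the limit comes from the early identity mapping. A minimal sketch, assuming a power-of-two alignment and a caller-supplied random value; pick_slot() is a hypothetical helper, and the virtual and physical addresses would each get their own call with their own range.

```c
#include <stdint.h>

/*
 * Pick a load address for an image of `size` bytes, aligned to `align`
 * (a power of two), uniformly among the aligned slots that fit entirely
 * in [min, max). `rnd` is a caller-supplied random number.
 * Returns 0 if the arguments are invalid or no slot fits.
 */
uint64_t pick_slot(uint64_t min, uint64_t max, uint64_t size,
                   uint64_t align, uint64_t rnd)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return 0;

    uint64_t first = (min + align - 1) & ~(align - 1);  /* round up */
    if (first >= max || max - first < size)
        return 0;

    uint64_t slots = (max - first - size) / align + 1;
    return first + (rnd % slots) * align;
}
```

For example, a 32MB image with 2MB alignment in [16MB, 1G) has 489 candidate slots; the same call with a range reaching past 4G works unchanged.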

Thanks
Vivek

Re: [resend Patch v3 1/2] kaslr: check if kernel location is changed

2014-10-14 Thread Vivek Goyal

On Mon, Oct 13, 2014 at 01:22:42PM -0400, Vivek Goyal wrote:
 On Mon, Oct 13, 2014 at 08:43:00AM -0700, H. Peter Anvin wrote:
  On 10/13/2014 08:19 AM, Vivek Goyal wrote:
  
   This really shouldn't have happened this way on x86-64.  It has to 
   happen
   this way on i386, but I worry that this may be a serious misdesign in 
   kaslr
   on x86-64.  I'm also wondering if there is any other fallout of this?
  
   I agree. On x86_64, we should stick to previous design and this new
   logic of performing relocations does not sound very clean and makes
   things very confusing.
  
   I am wondering that why couldn't we simply adjust page tables in case of
   kaslr on x86_64, instead of performing relocations.
   
   Well, IIUC, if virtual addresses are shifted w.r.t what virtual address
   kernel was compiled for, then relocation will have to be done.
   
   So question will be if physical address shift is enough for kaslr or
   virtual address shift is necessary.
   
  
  I would assume that without a virtual address shift kaslr is pretty darn
  pointless.  Without the physical address shift the 1:1 map can be used,
  and again, kaslr becomes pointless.  However, there is absolutely no
  reason why they should be coupled.  They can, in fact, be independently
  randomized.
 
 Agreed. On x86_64, we should be able to randomize virtual address space
 and physical address space independently. And in that case whole of
 the physical memory should be available for a possible location for
 kernel. (As opposed to a small limit (I guess 1GB) now)

Hi Peter,

So what do we do about this issue in the short term to make kexec work? Even
if we go for the above solution, to make kexec work we will have to pass
nokaslr, as we don't want the kernel to move around in the physical address
space; it might stomp over the ELF headers we have stored.

If you don't like the current patch, should we just disable relocations on
x86_64 if nokaslr is passed on the command line? That way the kernel will not
be moved in either the physical or the virtual address space.

Thanks
Vivek

Re: [resend Patch v3 1/2] kaslr: check if kernel location is changed

2014-10-13 Thread Vivek Goyal

On Sat, Oct 11, 2014 at 03:34:29AM -0700, H. Peter Anvin wrote:
 On 10/10/2014 08:14 PM, Baoquan He wrote:
 On 10/08/14 at 03:27pm, Vivek Goyal wrote:
 On Wed, Oct 08, 2014 at 08:09:59AM -0700, H. Peter Anvin wrote:
 
 Sorry... this makes no sense.
 
 For x86-64, there is no direct connection between the physical and
 virtual address spaces that the kernel runs in...
 
 I am sorry I did not understand this one. I thought that initial
 relocatable kernel implementation did not have any direct connection
 between virtual and physical address. One could load kernel anywhere
 and kernel virtual address will not change and we will just adjust
 page tables to map virtual address to right physical address.
 
 Now handle_relocation() stuff seems to introduce a close coupling
 between physical and virtual address. So if kernel shifts by 16MB
 in physical address space, then it will shift by equal amount
 in virtual address space. So there seems to be a direct connection
 between virtual and physical address space in this case.
 
 Yeah, it's exactly as Vivek said.
 
 Before kaslr was introduced, x86_64 kernel can be put anywhere, and
 always _text is 0xffffffff81000000. Meanwhile phys_base contains the
 offset between the compiled addr (namely 0x1000000) and kernel loaded
 addr. After kaslr implementation was added, as long as kernel loaded
 addr is different from 0x1000000, it will call handle_relocations(). The
 offset now is added onto each symbols including _text and phys_base
 becomes 0.
 
 It's clearly showing that by checking /proc/kallsyms and value of
 phys_base.
 
 
 This really shouldn't have happened this way on x86-64.  It has to happen
 this way on i386, but I worry that this may be a serious misdesign in kaslr
 on x86-64.  I'm also wondering if there is any other fallout of this?

I agree. On x86_64, we should stick to the previous design; this new
logic of performing relocations does not sound very clean and makes
things very confusing.

I am wondering why we couldn't simply adjust the page tables in the case of
kaslr on x86_64, instead of performing relocations.
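For reference, the pre-kaslr x86_64 scheme keeps kernel virtual addresses fixed and records the physical load offset in phys_base, so translating a kernel-text address takes one subtraction and one addition (the same relation the kernel uses for symbol addresses). A minimal sketch; the helper name is hypothetical and the constant is the usual x86_64 __START_KERNEL_map value.

```c
#include <stdint.h>

/* x86_64 kernel text mapping base (__START_KERNEL_map). */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * With only page tables adjusted (no relocation processing), the kernel
 * keeps its compiled virtual addresses and phys_base absorbs the
 * difference between the load address and the compiled address:
 *   phys = virt - __START_KERNEL_map + phys_base
 */
uint64_t kernel_text_virt_to_phys(uint64_t virt, uint64_t phys_base)
{
    return virt - START_KERNEL_MAP + phys_base;
}
```

With a kernel compiled for 16MB and loaded at 0x42e000000, phys_base would be 0x42d000000, and _text at 0xffffffff81000000 maps to 0x42e000000 without touching any relocation.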

Thanks
Vivek

Re: [resend Patch v3 1/2] kaslr: check if kernel location is changed

2014-10-13 Thread Vivek Goyal

On Mon, Oct 13, 2014 at 08:52:57AM -0400, Vivek Goyal wrote:
 On Sat, Oct 11, 2014 at 03:34:29AM -0700, H. Peter Anvin wrote:
  On 10/10/2014 08:14 PM, Baoquan He wrote:
  On 10/08/14 at 03:27pm, Vivek Goyal wrote:
  On Wed, Oct 08, 2014 at 08:09:59AM -0700, H. Peter Anvin wrote:
  
  Sorry... this makes no sense.
  
  For x86-64, there is no direct connection between the physical and
  virtual address spaces that the kernel runs in...
  
  I am sorry I did not understand this one. I thought that initial
  relocatable kernel implementation did not have any direct connection
  between virtual and physical address. One could load kernel anywhere
  and kernel virtual address will not change and we will just adjust
  page tables to map virtual address to right physical address.
  
  Now handle_relocation() stuff seems to introduce a close coupling
  between physical and virtual address. So if kernel shifts by 16MB
  in physical address space, then it will shift by equal amount
  in virtual address space. So there seems to be a direct connection
  between virtual and physical address space in this case.
  
  Yeah, it's exactly as Vivek said.
  
  Before kaslr was introduced, x86_64 kernel can be put anywhere, and
  always _text is 0xffffffff81000000. Meanwhile phys_base contains the
  offset between the compiled addr (namely 0x1000000) and kernel loaded
  addr. After kaslr implementation was added, as long as kernel loaded
  addr is different from 0x1000000, it will call handle_relocations(). The
  offset now is added onto each symbols including _text and phys_base
  becomes 0.
  
  It's clearly showing that by checking /proc/kallsyms and value of
  phys_base.
  
  
  This really shouldn't have happened this way on x86-64.  It has to happen
  this way on i386, but I worry that this may be a serious misdesign in kaslr
  on x86-64.  I'm also wondering if there is any other fallout of this?
 
 I agree. On x86_64, we should stick to previous design and this new
 logic of performing relocations does not sound very clean and makes
 things very confusing.
 
 I am wondering that why couldn't we simply adjust page tables in case of
 kaslr on x86_64, instead of performing relocations.

Well, IIUC, if virtual addresses are shifted w.r.t. the virtual address the
kernel was compiled for, then relocation will have to be done.

So the question is whether a physical address shift is enough for kaslr, or
a virtual address shift is necessary.

Thanks
Vivek

Re: [resend Patch v3 1/2] kaslr: check if kernel location is changed

2014-10-13 Thread Vivek Goyal

On Mon, Oct 13, 2014 at 08:43:00AM -0700, H. Peter Anvin wrote:
 On 10/13/2014 08:19 AM, Vivek Goyal wrote:
 
  This really shouldn't have happened this way on x86-64.  It has to happen
  this way on i386, but I worry that this may be a serious misdesign in 
  kaslr
  on x86-64.  I'm also wondering if there is any other fallout of this?
 
  I agree. On x86_64, we should stick to previous design and this new
  logic of performing relocations does not sound very clean and makes
  things very confusing.
 
  I am wondering that why couldn't we simply adjust page tables in case of
  kaslr on x86_64, instead of performing relocations.
  
  Well, IIUC, if virtual addresses are shifted w.r.t what virtual address
  kernel was compiled for, then relocation will have to be done.
  
  So question will be if physical address shift is enough for kaslr or
  virtual address shift is necessary.
  
 
 I would assume that without a virtual address shift kaslr is pretty darn
 pointless.  Without the physical address shift the 1:1 map can be used,
 and again, kaslr becomes pointless.  However, there is absolutely no
 reason why they should be coupled.  They can, in fact, be independently
 randomized.

Agreed. On x86_64, we should be able to randomize the virtual address space
and the physical address space independently. And in that case the whole of
physical memory should be available as a possible location for the
kernel (as opposed to the small limit, I guess 1GB, now).

Thanks
Vivek

Re: [resend Patch v3 1/2] kaslr: check if kernel location is changed

2014-10-08 Thread Vivek Goyal

On Wed, Oct 08, 2014 at 08:09:59AM -0700, H. Peter Anvin wrote:
 On 10/01/2014 06:52 AM, Vivek Goyal wrote:
  
  Hi Peter,
  
  I think there is some confusion. I will try to clarify.
  
  If we have 32bit signed overflow, we will not have a functional kernel.
  And that's the message we get when we try to kexec with
  CONFIG_RANDOMIZE_BASE=y.
  
 
 And how does that happen?

I compile a kernel for physical address 16MB (CONFIG_PHYSICAL_START=0x1000000),
and kexec loads this kernel at a physical address above 16GB
(0x000000042e000000).

When we boot into the second kernel, it tries to perform relocations and fails
as follows. I have printed a bunch of variables in handle_relocations(), so
referring to the code will help.

min_addr=0x000000042e000000 (Physical address where the decompressed kernel
 is loaded).

delta=0x000000042d000000  (Difference between load and compile addr).

map=0x00000004ad000000   (map = delta - __START_KERNEL_map)

Now we start processing 32bit relocations and read the first reloc.

extended=0x81e819c2 (extended = *reloc)

We add map to it, and the new value of extended is:

extended=0x2ee819c2 (extended += map)

Now we convert this to the actual 64bit address at which the relocation
needs to be applied, and the ptr value is:

ptr = 0x2ee819c2 (ptr = (unsigned long)extended;)

And this address is outside the range where the kernel is actually loaded
(0x000000042e000000). So we end up with a wrong address to patch, and hence
the following check fails.

if (ptr < min_addr || ptr > max_addr)
	error("32-bit relocation outside of kernel!\n");
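The numbers above can be reproduced outside the kernel. A minimal sketch, assuming (as the printed values suggest) that extended is a 32-bit int and that __START_KERNEL_map is 0xffffffff80000000; the helper names are hypothetical. The 32-bit variant truncates the sum before widening it, as storing it in an int does, while doing the addition in 64 bits lands inside the loaded image.

```c
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL  /* x86_64 __START_KERNEL_map */

/* Sign-extend a 32-bit value to 64 bits without signed-overflow tricks. */
static uint64_t sext32(uint32_t v)
{
    return (v & 0x80000000u) ? (0xffffffff00000000ULL | v) : (uint64_t)v;
}

/*
 * Mimics `int extended = *reloc; extended += map; ptr = (unsigned long)
 * extended;`: the sum is truncated to 32 bits before it is widened, so
 * the upper bits of map are lost.
 */
uint64_t reloc_target_32bit(uint32_t reloc, uint64_t map)
{
    uint32_t sum = reloc + (uint32_t)map;   /* wraps modulo 2^32 */
    return sext32(sum);
}

/* Same computation with a 64-bit intermediate: sign-extend, then add. */
uint64_t reloc_target_64bit(uint32_t reloc, uint64_t map)
{
    return sext32(reloc) + map;             /* wraps modulo 2^64 */
}
```

With delta = 0x42d000000, map comes out as 0x4ad000000; the 32-bit path yields 0x2ee819c2, below min_addr, while the 64-bit path yields 0x42ee819c2, inside the image.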


 
  **
  [  340.709078] kexec: Starting new kernel
  early console in decompress_kernel
  KASLR disabled by default...
  
  Decompressing Linux... Parsing ELF...
  
  Performing relocations...
  32-bit relocation outside of kernel!
  
  
   -- System halted
  *
  
  We realized that kexec tries to load kernel at higher physical addresses
  and that can lead to problems.
  
  Currently for x86_64, handle_relocations() will perform relocations if
  kernel is not loaded at LOAD_PHYSICAL_ADDR. I think this does not work for
  all the cases and kernel can not be loaded anywhere in the physical address
  space. And that's why we run into issues with kexec.
  
  My understanding is that we introduce handle_relocations() i386 style
  because of RANDOMIZE_BASE. If that's the case, one possible solution
  is to perform relocations only if randomize base logic has chosen
  a different load location for kernel than where boot loader loaded
  it. Otherwise don't do anything.
  
  In case of kexec/kdump, we will pass nokaslr to second kernel forcing
  it to do nothing and let the kernel run where it was loaded by bootloader.
  And in that case handle_relocations() should not do any relocations and
  that should allow kernel to be loaded anywhere in physical memory on
  x86_64.
  
 
 Sorry... this makes no sense.
 
 For x86-64, there is no direct connection between the physical and
 virtual address spaces that the kernel runs in...

I am sorry, I did not understand this one. I thought that the initial
relocatable kernel implementation did not have any direct connection
between virtual and physical addresses. One could load the kernel anywhere,
the kernel virtual address would not change, and we would just adjust the
page tables to map the virtual address to the right physical address.

Now the handle_relocations() stuff seems to introduce a close coupling
between physical and virtual addresses. So if the kernel shifts by 16MB
in the physical address space, then it will shift by an equal amount
in the virtual address space. So there seems to be a direct connection
between the virtual and physical address spaces in this case.
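To make the coupling concrete: if the same delta is added to both the physical load address and the kernel's virtual addresses, a large enough shift pushes the virtual addresses out of the range that 32-bit sign-extended relocations can encode. On x86_64 the kernel text lives in the top 2 GiB, starting at 0xffffffff80000000. The sketch below is illustrative only, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shift a kernel virtual address by the same delta as the physical shift. */
static uint64_t shifted_vaddr(uint64_t vaddr, uint64_t delta)
{
	return vaddr + delta;	/* unsigned arithmetic wraps modulo 2^64 */
}

/*
 * A 32-bit signed relocation can only represent addresses whose 64-bit
 * value is the sign extension of a 32-bit value, i.e. the lowest 2 GiB
 * or the top 2 GiB of the address space.
 */
static bool fits_sign_extended32(uint64_t vaddr)
{
	int64_t v = (int64_t)vaddr;

	return v >= INT32_MIN && v <= INT32_MAX;
}
```

A 16MB shift keeps an address such as 0xffffffff81000000 in the top 2 GiB. A 4 GiB shift wraps it to 0x81000000, which no longer fits. This is the "delta bigger than what 32-bit signed relocations can handle" case.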

What am I missing?

Thanks
Vivek

Re: [PATCH] kexec: Remove unnecessary KERN_ERR from kexec.c

2014-10-07 Thread Vivek Goyal

On Tue, Oct 07, 2014 at 12:54:58PM +0900, Masanari Iida wrote:
> This patch removes the unnecessary KERN_ERR from pr_err() within kexec.c.
>
> Signed-off-by: Masanari Iida <standby2...@gmail.com>

[cc akpm]

Thanks for the fix. 

Acked-by: Vivek Goyal <vgo...@redhat.com>

Vivek

> ---
>  kernel/kexec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 2abf9f6..9a8a01a 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -600,7 +600,7 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd,
>  	if (!kexec_on_panic) {
>  		image->swap_page = kimage_alloc_control_pages(image, 0);
>  		if (!image->swap_page) {
> -			pr_err(KERN_ERR "Could not allocate swap buffer\n");
> +			pr_err("Could not allocate swap buffer\n");
>  			goto out_free_control_pages;
>  		}
>  	}
> --
> 2.1.1.273.g97b8860

Re: [PATCH 0/7] arm64 kexec kernel patches V3

2014-10-07 Thread Vivek Goyal

On Thu, Oct 02, 2014 at 03:59:55PM -0700, Geoff Levand wrote:
> Hi Vivek,
>
> On Thu, 2014-10-02 at 15:08 -0400, Vivek Goyal wrote:
> > On Tue, Sep 30, 2014 at 02:27:56PM -0700, Geoff Levand wrote:
> > > For a running system you can check the device tree:
> > >
> > >   cat /proc/device-tree/cpus/cpu\@0/enable-method | hexdump -C
> > >
> >
> > So the system I have supports the spin-table method for CPU bring-up. How
> > do I test your patches with that system? Are there any patches on your
> > spin-table branch which can make it work?
>
> If possible, check if there is a firmware update that supports PSCI.
>
> My spin-table patches are now out of date, and fixing those up is
> now low priority.

So is the PSCI method for CPU bring-up more popular than the
spin-table one?

 
> I modified kexec-tools to only issue a message, but accept a device
> tree that does not have the new cpu-return-addr property that is
> needed for kexec on spin-table systems.  Since the spin-table stuff
> is only for managing secondary CPUs, this change should allow you to
> test kexec with a 1st stage kernel built with CONFIG_SMP=n.
>
> Since the secondary CPUs will have never left the spin-table, you
> should be able to kexec re-boot into an SMP kernel, but you will
> not be able to do a successful kexec re-boot from there.

OK, I can compile the kernel with CONFIG_SMP=y but use maxcpus=1 for the
first kernel, and hopefully that works.

Thanks
Vivek

Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-10-07 Thread Vivek Goyal

On Tue, Oct 07, 2014 at 01:12:57PM -0700, Geoff Levand wrote:
> Hi Vivek,
>
> On Tue, 2014-10-07 at 14:45 -0400, Vivek Goyal wrote:
> > On Tue, Oct 07, 2014 at 11:42:00AM -0700, Geoff Levand wrote:
> > > Adding purgatory code to arm64 is low priority, and I currently
> > > have no plan to do that. Users are asking for kdump, and proper
> > > UEFI support, so that is what I will work towards.
> >
> > I think having purgatory enabled is very important here, as in the kernel
> > you are hardcoding that one of the segments is the DTB and doing all the
> > magic tricks w.r.t. putting a magic number.
>
> I don't argue that having purgatory code could be useful, but as of
> now, enabling the other features is what I'll work towards.
>
> Regarding the device tree magic number, I'm wondering if you missed
> that the device tree has a header, and that header has a magic
> number. See here:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/scripts/dtc/libfdt/fdt.h#n6

The problem is that if you put code in the kernel once which does something
that purgatory ought to do, you will never be able to remove it, for
backward-compatibility reasons: older versions of kexec-tools will
continue to rely on it. Also, how would the kernel know that purgatory
now takes care of this and the kernel does not have to worry about it?
So it is a good idea to integrate purgatory support from the very
beginning.

Also, verifying checksums of the loaded segments before jumping to the new
kernel is a must from a feature point of view.

Thanks
Vivek

Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-10-02 Thread Vivek Goyal

On Thu, Oct 02, 2014 at 11:26:25AM +0100, Mark Rutland wrote:
> On Wed, Oct 01, 2014 at 08:22:45PM +0100, Vivek Goyal wrote:
> > On Wed, Oct 01, 2014 at 07:03:04PM +0100, Mark Rutland wrote:
> >
> > [..]
> > > I assume we'd have the first kernel perform the required cache
> > > maintenance.
> > >
> >
> > Hi Mark,
> >
> > I am wondering what kind of cache management is required here. What kind
> > of dcaches are present on arm64?
>
> In ARMv8 there's a hierarchy of quasi-PIPT D-caches; they generally
> behave like (and can be maintained as if) they are PIPT but might not
> actually be PIPT. There may be a system level cache between the
> architected cache hierarchy and memory (that should respect cache
> maintenance by VA).
>
> The MT_NORMAL attributes are such that most memory the kernel maps will
> have write-back read/write allocate attributes. So cache maintenance is
> required to ensure that data is cleaned from the D-caches out to the PoC
> (the point in the memory system at which non-cacheable accesses can see
> the same data), such that the CPU can see the images rather than stale
> data once translation is disabled.
>
> > I see that Geoff's patches flush dcaches for
> > certain kexec stored pages using __flush_dcache_area()
> > (in kexec_list_flush_cb()).
> >
> > arch/arm64/include/asm/cacheflush.h says the following:
> >
> >  *  __flush_dcache_area(kaddr, size)
> >  *
> >  *  Ensure that the data held in page is written back.
> >  *  - kaddr  - page address
> >  *  - size   - region size
> >
> > So it looks like we are trying to write back anything which we will access
> > after switching off the MMU. If that's the case, I have two questions.
> >
> > - Why do we need to write back that cacheline? After switching off the MMU,
> >   will we not access the same cacheline? I thought caches are VIPT and the
> >   tag will still remain the same (but I might easily be wrong here).
>
> As I mention above, the initial cache flush by VA is to ensure that the
> data is visible to the CPU once translation is disabled. I'm not sure I
> follow your reasoning.

I was assuming that even after we disable translations, the CPU will still
read data from the dcache if it is available there. It looks like you are
saying that once translation is disabled, data will be read from memory,
hence it is important to flush out the dcache before disabling translation.
Did I understand it right?

Thanks
Vivek

Re: [PATCH 0/7] arm64 kexec kernel patches V3

2014-10-02 Thread Vivek Goyal

On Tue, Sep 30, 2014 at 02:27:56PM -0700, Geoff Levand wrote:
> Hi Vivek,
>
> On Tue, 2014-09-30 at 16:29 -0400, Vivek Goyal wrote:
> > On Thu, Sep 25, 2014 at 12:23:26AM +, Geoff Levand wrote:
> >
> > [..]
> > > To load a second stage kernel and execute a kexec re-boot on arm64 my
> > > patches to kexec-tools [2], which have not yet been merged upstream,
> > > are needed.
> > >
> > > This series does not include some re-work of the spin-table CPU enable
> > > method that is needed to support it,
> >
> > How do I figure out if my system has the spin-table enable method or the
> > PSCI enable method? Can one change it? I wanted to test your patches.
>
> The enable method is a function the firmware/bootloader provides.
> Multiple methods may be supported.  The boot-wrapper-aarch64
> build defaults to spin-table, but has the configure option --enable-psci.
>
> For a running system you can check the device tree:
>
>   cat /proc/device-tree/cpus/cpu\@0/enable-method | hexdump -C
>

Hi Geoff,

So the system I have supports the spin-table method for CPU bring-up. How
do I test your patches with that system? Are there any patches on your
spin-table branch which can make it work?

Thanks
Vivek
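The `cat ... | hexdump -C` check quoted above can also be done from a small C program. Device tree string properties are exposed as one or more NUL-terminated strings, so the sketch below searches that list for a given method name. The helper names are hypothetical, and the path assumes CPU 0's node is named cpu@0, as in the command above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return true if 'method' is one of the NUL-separated strings in buf. */
static bool has_enable_method(const char *buf, size_t len, const char *method)
{
	size_t off = 0;

	while (off < len) {
		const char *s = buf + off;
		const char *nul = memchr(s, '\0', len - off);
		size_t n = nul ? (size_t)(nul - s) : len - off;

		/* Match whole strings only, not prefixes. */
		if (strlen(method) == n && memcmp(s, method, n) == 0)
			return true;
		off += n + 1;	/* skip the string and its terminator */
	}
	return false;
}

/* Read CPU 0's enable-method property; returns bytes read, 0 on error. */
size_t read_enable_method(char *buf, size_t size)
{
	FILE *f = fopen("/proc/device-tree/cpus/cpu@0/enable-method", "rb");
	size_t n;

	if (!f)
		return 0;
	n = fread(buf, 1, size, f);
	fclose(f);
	return n;
}
```

On a PSCI system, `has_enable_method(buf, read_enable_method(buf, sizeof(buf)), "psci")` would be expected to return true. On a spin-table system like the one described above, "spin-table" would match instead.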

Re: [resend Patch v3 1/2] kaslr: check if kernel location is changed

2014-10-01 Thread Vivek Goyal

On Tue, Sep 30, 2014 at 02:21:05PM -0700, H. Peter Anvin wrote:
> On 09/30/2014 12:08 AM, Baoquan He wrote:
> > Function handle_relocations() is used to do the relocation handling
> > for i686 and for KASLR on x86_64. For 32-bit, relocation handling is
> > mandatory. For x86_64, it should be done only when KASLR is enabled and
> > a random kernel location is chosen successfully. However, the previous
> > implementation only compared the kernel load address with
> > LOAD_PHYSICAL_ADDR, the address the kernel was compiled to run at. This
> > can cause the system to misbehave in a few conditions, such as when the
> > delta between the load address and the compiled address is bigger than
> > what 32-bit signed relocations can handle. There is also a limitation
> > that the delta can't be too big, otherwise kernel text virtual addresses
> > will overflow into the module address space.
> >
> > So this patch checks, on x86_64, whether the kernel location changed
> > after choose_kernel_location(). Only on x86_64, and only if the kernel
> > location changed, do we consider that a random KASLR location was
> > chosen; only then is relocation handling needed.
> >
> > Signed-off-by: Baoquan He <b...@redhat.com>
> > Acked-by: Vivek Goyal <vgo...@redhat.com>
> > Acked-by: Kees Cook <keesc...@chromium.org>
> > Tested-by: Thomas D. <whi...@whissi.de>
> > Cc: <sta...@vger.kernel.org>
>
> Could you clarify under what conditions we may end up with 32-bit signed
> overflow, and yet have a functional kernel?


Hi Peter,

I think there is some confusion. I will try to clarify.

If we have a 32-bit signed overflow, we will not have a functional kernel.
And that's the message we get when we try to kexec with
CONFIG_RANDOMIZE_BASE=y:

**
[  340.709078] kexec: Starting new kernel
early console in decompress_kernel
KASLR disabled by default...

Decompressing Linux... Parsing ELF...

Performing relocations...
32-bit relocation outside of kernel!


 -- System halted
*

We realized that kexec tries to load the kernel at higher physical addresses,
and that can lead to problems.

Currently, for x86_64, handle_relocations() performs relocations if the
kernel is not loaded at LOAD_PHYSICAL_ADDR. I think this does not work in
all cases, and the kernel cannot be loaded just anywhere in the physical
address space. That's why we run into issues with kexec.

My understanding is that we introduced i386-style handle_relocations()
because of RANDOMIZE_BASE. If that's the case, one possible solution is
to perform relocations only if the randomize-base logic has chosen a
different load location for the kernel than where the boot loader loaded
it, and to do nothing otherwise.

In the kexec/kdump case, we will pass nokaslr to the second kernel, forcing
it to do nothing and letting the kernel run where it was loaded by the boot
loader. In that case handle_relocations() should not do any relocations,
and that should allow the kernel to be loaded anywhere in physical memory
on x86_64.
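The proposal above amounts to a simple decision rule. Here is a minimal sketch of that rule. It assumes the decompressor knows both where the kernel was loaded and where the KASLR logic decided to put it. need_relocations() is a hypothetical name, not the function in the patch. The 32-bit case always relocates, as the patch description states.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical decision helper for the proposal above.
 * - 32-bit: relocation handling is always performed.
 * - x86_64: relocate only if KASLR chose a location different from the
 *   one the boot loader (or kexec) loaded the kernel at. With nokaslr the
 *   two are equal, so the kernel simply runs where it was loaded.
 */
static bool need_relocations(bool is_x86_64, uint64_t loaded_at,
			     uint64_t chosen)
{
	if (!is_x86_64)
		return true;
	return chosen != loaded_at;
}
```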

Thanks
Vivek

Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-10-01 Thread Vivek Goyal

On Tue, Sep 30, 2014 at 12:54:37PM -0700, Geoff Levand wrote:

[..]
> > > +{
> > > +	switch (flag) {
> > > +	case IND_INDIRECTION:
> > > +	case IND_SOURCE:
> > > +		__flush_dcache_area(addr, PAGE_SIZE);
> > > +		break;
> >
> > So what does __flush_dcache_area() do? Flush data caches. IIUC, addr
> > is a virtual address at this point in time. While copying pages and
> > walking through the list, I am assuming you have switched off page
> > tables and you are in some kind of 1:1 physical mode. So how does
> > flushing data caches for a virtual address help? I guess we
> > are not even accessing that virtual address now.
>
> __flush_dcache_area(), and the underlying aarch64 civac instruction
> operate on virtual addresses.  Here we are still running with the
> MMU on and the identity mapping has not yet been enabled.  This is
> the sequence:
>
>   flush dcache -> turn off MMU, dcache -> access memory (PoC) directly

Sorry, I don't understand why we need to flush the dcache for source
and indirection page addresses. Some information here would help.

Thanks
Vivek

Re: [PATCH 0/7] arm64 kexec kernel patches V3

2014-10-01 Thread Vivek Goyal

On Thu, Sep 25, 2014 at 12:23:26AM +, Geoff Levand wrote:
> Hi All,
>
> This series adds the core support for kexec re-boots on arm64.  I have tested
> with the ARM VE fast model using various kernel config options for both the
> first and second stage kernels.

Hi Geoff,

Does this patch series work with kexec on UEFI machines?

Thanks
Vivek


Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-10-01 Thread Vivek Goyal

On Wed, Oct 01, 2014 at 05:16:21PM +0100, Mark Rutland wrote:

[..]
> > > So this implementation makes passing a dtb mandatory. So it will not
> > > work with ACPI?
> >
> > I have not yet considered ACPI.  It will most likely need to have
> > something done differently.  Secure boot will also need something
> > different, and I expect it will use your new kexec_file_load().
>
> A DTB is mandatory for arm64, and is used to pass the command line,
> (optionally) initrd, and other parameters, even if it doesn't contain HW
> description. In the EFI case the EFI stub will create a trivial DTB if
> necessary, and the kernel will detect any ACPI tables via UEFI, so the
> DTB should be sufficient for ACPI.
>
> I'm still rather unhappy about the mechanism by which the DTB is passed
> by userspace and detected by the kernel, as I'd prefer that the user
> explicitly stated which segment they wanted to pass to the (Linux)
> kernel, but that would require reworking the kexec syscall to allow
> per-segment info/flags.

Yep, in this case it would have been nice if there were per-segment
flags to identify the type of a segment, but unfortunately we don't have
them. So in the absence of that, I think putting 4 bytes of DTB magic at
the beginning of the segment should work (though it is not ideal).
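For reference, the 4-byte magic under discussion is the flattened device tree header magic, 0xd00dfeed, stored big-endian at the start of the blob (OF_DT_HEADER in the patch). Below is a minimal, hypothetical userspace version of the segment test. It also illustrates the weakness Mark raises: any segment that happens to start with these bytes would be taken for a DTB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedU	/* flattened device tree header magic */

/* Read the first 4 bytes as a big-endian value and compare to the magic. */
static bool looks_like_dtb(const unsigned char *seg, size_t len)
{
	uint32_t magic;

	if (len < 4)
		return false;
	magic = ((uint32_t)seg[0] << 24) | ((uint32_t)seg[1] << 16) |
		((uint32_t)seg[2] << 8) | (uint32_t)seg[3];
	return magic == FDT_MAGIC;
}
```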

 
> To me it seems that for all the talk of kexec allowing arbitrary kernels
> to be booted it's really just a linux->linux reboot bridge. Does anyone
> use kexec to boot something that isn't Linux?

>
> > > Where is the dtb present? How is it passed to the first kernel? Can it
> > > still be around in memory so that the second kernel can access it?
> >
> > The user space program (kexec-tools, etc.) passes a dtb.  That dtb
> > could be a copy of the current one, or a new one specified by
> > the user.
> >
> > > I mean, in the ACPI world on x86, all the ACPI info is still present and
> > > the second kernel can access it without it being explicitly passed to
> > > the second kernel in memory. Can something similar happen for the dtb?
>
> Any ACPI tables should remain, given they'll be reserved in the UEFI
> memory map. The second kernel can find them as the first kernel did, via
> UEFI tables, which it will find via the DTB.
>
> For the DTB, reusing the original DTB is a possibility. From what I
> recall, Grant seemed to prefer re-packing the existing tree as this
> would allow for state destroyed at boot to be corrected for.
>
> Regardless, being able to pass a DTB from userspace is a useful option
> (especially for the Linux-as-a-bootloader approach that's been mentioned
> a lot). That doesn't work for the secureboot case without a new syscall
> as we can't pass a signed DTB (or any other additional objects other
> than an initrd) to kexec_file_load, but disallowing the user to pass a
> new DTB in that case seems reasonable.

Yes, kexec_file_load() will not allow passing anything except the kernel,
initrd and command line. So the syscall implementation will have to reuse
the existing DTB and pass it to the second kernel.

If there are concerns w.r.t. the state of the DTB, which can be destroyed
during boot, I guess we will have to store a copy of the DTB somewhere early
during boot, and kexec can access that original copy at kernel load time.

Thanks
Vivek



 
> Mark.

Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-10-01 Thread Vivek Goyal

On Wed, Oct 01, 2014 at 05:16:21PM +0100, Mark Rutland wrote:

[..]
> I'm still rather unhappy about the mechanism by which the DTB is passed
> by userspace and detected by the kernel, as I'd prefer that the user
> explicitly stated which segment they wanted to pass to the (Linux)
> kernel, but that would require reworking the kexec syscall to allow
> per-segment info/flags.

Why does the running kernel need to know about the dtb segment? I see the
following:

	ldr x0, kexec_dtb_addr

IIUC, we are loading this address in x0. Can't we do something similar
in user space with purgatory? I mean, first jump to purgatory (code
compiled in user space but running privileged), and that code takes care
of loading x0 with the right dtb addr and then jumps to the final kernel.

IOW, I am not able to understand why the kernel implementation needs
to know which segment is the dtb.

Thanks
Vivek

Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-10-01 Thread Vivek Goyal

On Wed, Oct 01, 2014 at 07:03:04PM +0100, Mark Rutland wrote:
> On Wed, Oct 01, 2014 at 06:47:14PM +0100, Vivek Goyal wrote:
> > On Wed, Oct 01, 2014 at 05:16:21PM +0100, Mark Rutland wrote:
> >
> > [..]
> > > I'm still rather unhappy about the mechanism by which the DTB is passed
> > > by userspace and detected by the kernel, as I'd prefer that the user
> > > explicitly stated which segment they wanted to pass to the (Linux)
> > > kernel, but that would require reworking the kexec syscall to allow
> > > per-segment info/flags.
> >
> > Why does the running kernel need to know about the dtb segment? I see the
> > following:
> >
> > 	ldr x0, kexec_dtb_addr
> >
> > IIUC, we are loading this address in x0. Can't we do something similar
> > in user space with purgatory? I mean, first jump to purgatory (code
> > compiled in user space but running privileged), and that code takes care
> > of loading x0 with the right dtb addr and then jumps to the final kernel.
>
> I believe the fundamental issue here is a lack of a userspace-provided
> purgatory.
>
> I agree that userspace purgatory code could set this up. That would
> address my concerns w.r.t. detecting the DTB kernel-side, as there would
> be no need. It would also address my concerns with booting OSs other
> than Linux, as the purgatory code could do whatever was appropriate for
> whatever OS image was loaded.
>
> So in my view, a userspace-provided purgatory that set up the state the
> next kernel expected would be preferable. That could be as simple as
> setting up the registers and branching -- I assume we'd have the first
> kernel perform the required cache maintenance.

Apart from setting various registers, we also verify the sha256 checksums
of loaded segments in purgatory to make sure the segments are not
corrupted. On x86, we also take care of backing up the first 640KB of
memory into a reserved area in the kdump case.

So other arches are already doing all this in purgatory. It would be nice
if arm64 stuck to that convention too.

First kernel --> purgatory --> second kernel.
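The checksum step mentioned above can be sketched as follows. This is a simplified model under stated assumptions. struct seg and verify_segments() are hypothetical. 64-bit FNV-1a stands in for SHA-256 only to keep the example short: it is not a cryptographic hash and is not what purgatory uses. The expected digests are assumed to have been recorded when the segments were loaded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one loaded segment. */
struct seg {
	const unsigned char *buf;
	size_t len;
};

/* Stand-in digest: 64-bit FNV-1a (NOT SHA-256, not cryptographic). */
static uint64_t digest(const unsigned char *p, size_t n)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */

	while (n--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;		/* FNV-1a prime */
	}
	return h;
}

/*
 * Recompute each segment's digest and compare it with the value recorded
 * at load time. On any mismatch, purgatory should not jump to the second
 * kernel.
 */
static bool verify_segments(const struct seg *segs, const uint64_t *expected,
			    size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (digest(segs[i].buf, segs[i].len) != expected[i])
			return false;
	return true;
}
```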

Thanks
Vivek

Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-10-01 Thread Vivek Goyal

On Wed, Oct 01, 2014 at 07:19:59PM +0100, Mark Rutland wrote:
> On Wed, Oct 01, 2014 at 07:09:09PM +0100, Vivek Goyal wrote:
> > On Wed, Oct 01, 2014 at 07:03:04PM +0100, Mark Rutland wrote:
> > > On Wed, Oct 01, 2014 at 06:47:14PM +0100, Vivek Goyal wrote:
> > > > On Wed, Oct 01, 2014 at 05:16:21PM +0100, Mark Rutland wrote:
> > > >
> > > > [..]
> > > > > I'm still rather unhappy about the mechanism by which the DTB is
> > > > > passed by userspace and detected by the kernel, as I'd prefer that
> > > > > the user explicitly stated which segment they wanted to pass to the
> > > > > (Linux) kernel, but that would require reworking the kexec syscall
> > > > > to allow per-segment info/flags.
> > > >
> > > > Why does the running kernel need to know about the dtb segment? I see
> > > > the following:
> > > >
> > > > 	ldr x0, kexec_dtb_addr
> > > >
> > > > IIUC, we are loading this address in x0. Can't we do something similar
> > > > in user space with purgatory? I mean, first jump to purgatory (code
> > > > compiled in user space but running privileged), and that code takes
> > > > care of loading x0 with the right dtb addr and then jumps to the final
> > > > kernel.
> > >
> > > I believe the fundamental issue here is a lack of a userspace-provided
> > > purgatory.
> > >
> > > I agree that userspace purgatory code could set this up. That would
> > > address my concerns w.r.t. detecting the DTB kernel-side, as there would
> > > be no need. It would also address my concerns with booting OSs other
> > > than Linux, as the purgatory code could do whatever was appropriate for
> > > whatever OS image was loaded.
> > >
> > > So in my view, a userspace-provided purgatory that set up the state the
> > > next kernel expected would be preferable. That could be as simple as
> > > setting up the registers and branching -- I assume we'd have the first
> > > kernel perform the required cache maintenance.
> >
> > Apart from setting various registers, we also verify the sha256 checksums
> > of loaded segments in purgatory to make sure the segments are not
> > corrupted. On x86, we also take care of backing up the first 640KB of
> > memory into a reserved area in the kdump case.
>
> I was under the (possibly mistaken) impression that for kdump the second
> kernel lived and ran at a high address so as to preserve memory in use
> by the first kernel. Is the first 640KiB special on x86, or does
> it have some kdump-specific use?

Use of the first 640KB by the second kernel is x86-specific. It was
introduced long back, and I am not sure whether this requirement still
exists today. Things have just been working, and nobody has bothered to
look into optimizing it further.

The kdump kernel does run from reserved memory. This memory is reserved
very early during boot so that the first kernel does not end up using it.
So it does not matter whether that memory is reserved high or low: the
first kernel is not going to use it because it is reserved, hence the
memory contents of the first kernel will be preserved.

Thanks
Vivek

Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-10-01 Thread Vivek Goyal

On Wed, Oct 01, 2014 at 07:03:04PM +0100, Mark Rutland wrote:

[..]
> I assume we'd have the first kernel perform the required cache maintenance.
>

Hi Mark,

I am wondering what kind of cache management is required here. What kind of
dcaches are present on arm64? I see that Geoff's patches flush dcaches for
certain kexec stored pages using __flush_dcache_area()
(in kexec_list_flush_cb()).

arch/arm64/include/asm/cacheflush.h says the following:

 *  __flush_dcache_area(kaddr, size)
 *
 *  Ensure that the data held in page is written back.
 *  - kaddr  - page address
 *  - size   - region size

So it looks like we are trying to write back anything which we will access
after switching off the MMU. If that's the case, I have two questions.

- Why do we need to write back that cacheline? After switching off the MMU,
  will we not access the same cacheline? I thought caches are VIPT and the
  tag will still remain the same (but I might easily be wrong here).

- Even if we have to flush that cacheline, for kexec pages I guess it
  should be done at kernel load time and not at the time of transition
  into the new kernel; that seems too late. Once the kernel has been
  loaded, we don't overwrite these pages anymore, so a dcache flush at
  that time should be good.

Thanks
Vivek

Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-09-30 Thread Vivek Goyal

On Thu, Sep 25, 2014 at 12:23:27AM +, Geoff Levand wrote:

[..]
> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> new file mode 100644
> index 000..22d185c
> --- /dev/null
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -0,0 +1,183 @@
> +/*
> + * kexec for arm64
> + *
> + * Copyright (C) Linaro.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/kexec.h>
> +#include <linux/of_fdt.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +
> +#include <asm/cacheflush.h>
> +#include <asm/system_misc.h>
> +
> +/* Global variables for the relocate_kernel routine. */
> +
> +extern const unsigned char relocate_new_kernel[];
> +extern const unsigned long relocate_new_kernel_size;
> +extern unsigned long kexec_dtb_addr;
> +extern unsigned long kexec_kimage_head;
> +extern unsigned long kexec_kimage_start;
> +
> +/**
> + * kexec_list_walk - Helper to walk the kimage page list.
> + */
> +
> +static void kexec_list_walk(void *ctx, unsigned long kimage_head,
> +	void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
> +{
> +	void *dest;
> +	unsigned long *entry;

Hi Geoff,

I see only one user of this function, kexec_list_flush_cb(). So why
not directly embed the needed logic in kexec_list_flush_cb() instead of
implementing a generic function? It would be simpler, as you seem to
be flushing the dcache only for SOURCE and IND pages, and the rest you
can simply ignore.
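The suggestion above can be sketched as a single-purpose walk that only flushes IND_SOURCE and IND_INDIRECTION pages. This is a hypothetical userspace model, not the arm64 code. The flag values follow <linux/kexec.h>, but kexec_list_flush() and flush_page() are made-up names, and the flush is replaced by a counter. Entries hold plain pointers (so no phys_to_virt()), and their low 4 bits are free only because the pages are assumed to be 16-byte aligned.

```c
#include <stddef.h>

/* kimage entry flags, as defined in <linux/kexec.h>. */
#define IND_DESTINATION	0x1UL
#define IND_INDIRECTION	0x2UL
#define IND_DONE	0x4UL
#define IND_SOURCE	0x8UL
#define IND_FLAGS (IND_DESTINATION | IND_INDIRECTION | IND_DONE | IND_SOURCE)

/* Stand-in for __flush_dcache_area(addr, PAGE_SIZE): just counts calls. */
static size_t flushed;

static void flush_page(void *addr)
{
	(void)addr;
	flushed++;
}

/* Walk the entry list, flushing only the pages the relocation code reads. */
static void kexec_list_flush(unsigned long *entry)
{
	for (;;) {
		unsigned long flag = *entry & IND_FLAGS;
		void *addr = (void *)(*entry & ~IND_FLAGS);

		switch (flag) {
		case IND_INDIRECTION:
			flush_page(addr);	/* the next page of entries */
			entry = addr;		/* continue walking in that page */
			continue;
		case IND_SOURCE:
			flush_page(addr);	/* a page of the new image */
			break;
		case IND_DONE:
			return;
		default:			/* IND_DESTINATION: nothing to flush */
			break;
		}
		entry++;
	}
}
```

Since the destination address is never used for flushing, this version also has no need for a dest parameter.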

> +
> +	for (entry = &kimage_head, dest = NULL; ; entry++) {
> +		unsigned int flag = *entry &
> +			(IND_DESTINATION | IND_INDIRECTION | IND_DONE |
> +			IND_SOURCE);
> +		void *addr = phys_to_virt(*entry & PAGE_MASK);
> +
> +		switch (flag) {
> +		case IND_INDIRECTION:
> +			entry = (unsigned long *)addr - 1;
> +			cb(ctx, flag, addr, NULL);
> +			break;
> +		case IND_DESTINATION:
> +			dest = addr;
> +			cb(ctx, flag, addr, NULL);
> +			break;
> +		case IND_SOURCE:
> +			cb(ctx, flag, addr, dest);
> +			dest += PAGE_SIZE;
> +			break;
> +		case IND_DONE:
> +			cb(ctx, flag , NULL, NULL);
> +			return;
> +		default:
> +			break;
> +		}
> +	}
> +}
> +
> +/**
> + * kexec_is_dtb - Helper routine to check the device tree header signature.
> + */
> +
> +static bool kexec_is_dtb(const void *dtb)
> +{
> +	__be32 magic;
> +
> +	return get_user(magic, (__be32 *)dtb) ? false :
> +		(be32_to_cpu(magic) == OF_DT_HEADER);
> +}
> +
> +/**
> + * kexec_find_dtb_seg - Helper routine to find the dtb segment.
> + */
> +
> +static const struct kexec_segment *kexec_find_dtb_seg(
> +	const struct kimage *image)
> +{
> +	int i;
> +
> +	for (i = 0; i < image->nr_segments; i++) {
> +		if (kexec_is_dtb(image->segment[i].buf))
> +			return &image->segment[i];
> +	}
> +
> +	return NULL;
> +}

So this implementation makes passing a dtb mandatory. So it will not work
with ACPI?

Where is the dtb present? How is it passed to the first kernel? Can it still
be around in memory so that the second kernel can access it?

I mean, in the ACPI world on x86, all the ACPI info is still present and the
second kernel can access it without it being explicitly passed to the second
kernel in memory. Can something similar happen for the dtb?

[..]
> +/**
> + * kexec_list_flush_cb - Callback to flush the kimage list to PoC.
> + */
> +
> +static void kexec_list_flush_cb(void *ctx , unsigned int flag,
> +	void *addr, void *dest)
  ^^^

Nobody seems to be making use of dest. So why introduce it?

> +{
> +	switch (flag) {
> +	case IND_INDIRECTION:
> +	case IND_SOURCE:
> +		__flush_dcache_area(addr, PAGE_SIZE);
> +		break;

So what does __flush_dcache_area() do? Flush data caches. IIUC, addr
is a virtual address at this point in time. While copying pages and
walking through the list, I am assuming you have switched off page
tables and you are in some kind of 1:1 physical mode. So how does
flushing data caches for a virtual address help? I guess we
are not even accessing that virtual address now.
>
[..]
> --- /dev/null
> +++ b/arch/arm64/kernel/relocate_kernel.S
> @@ -0,0 +1,183 @@
> +/*
> + * kexec for arm64
> + *
> + * Copyright (C) Linaro.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <asm/assembler.h>
> +#include <asm/kexec.h>
> +#include <asm/memory.h>
> +#include <asm/page.h>
> +#include <asm/proc-macros.S>
> +
> +/* The list entry flags. */
> +
> +#define IND_DESTINATION_BIT 0
> +#define

Re: [PATCH 0/7] arm64 kexec kernel patches V3

2014-09-30 Thread Vivek Goyal

On Thu, Sep 25, 2014 at 12:23:26AM +, Geoff Levand wrote:

[..]
> To load a second stage kernel and execute a kexec re-boot on arm64 my
> patches to kexec-tools [2], which have not yet been merged upstream, are
> needed.
>
> This series does not include some re-work of the spin-table CPU enable
> method that is needed to support it,

Hi Geoff,

How do I figure out if my system has the spin-table enable method or the
PSCI enable method? Can one change it? I wanted to test your patches.

Thanks
Vivek

Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-09-25 Thread Vivek Goyal

On Thu, Sep 25, 2014 at 12:23:27AM +, Geoff Levand wrote:
[..]
> +void machine_kexec(struct kimage *image)
> +{
> +	phys_addr_t reboot_code_buffer_phys;
> +	void *reboot_code_buffer;
> +
> +	BUG_ON(num_online_cpus() > 1);
> +
> +	kexec_kimage_head = image->head;
> +
> +	reboot_code_buffer_phys = page_to_phys(image->control_code_page);
> +	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
> +
> +	/*
> +	 * Copy relocate_new_kernel to the reboot_code_buffer for use
> +	 * after the kernel is shut down.
> +	 */
> +
> +	memcpy(reboot_code_buffer, relocate_new_kernel,
> +		relocate_new_kernel_size);
> +
> +	/* Flush the reboot_code_buffer in preparation for its execution. */
> +
> +	__flush_dcache_area(reboot_code_buffer, relocate_new_kernel_size);
> +
> +	/* Flush the kimage list. */
> +
> +	kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
> +
> +	pr_info("Bye!\n");
> +
> +	/* Disable all DAIF exceptions. */
> +
> +	asm volatile ("msr daifset, #0xf" : : : "memory");
> +
> +	soft_restart(reboot_code_buffer_phys);

So what is the soft_restart() functionality on arm64?

It looks like it switches to identity-mapped page tables, and that seems
to be the reason that you are not preparing identity-mapped page
tables in the kexec code. I am wondering how you make sure that, once
kexec is swapping pages (putting the new kernel's pages at their
destination), these identity-mapped pages will not be overwritten?

I am assuming that you are jumping to purgatory with paging enabled
and all of memory identity mapped.

I am also curious to know what different entry points the arm64
kernel image supports and which one you are using by default.

Thanks
Vivek


Re: [PATCH 6/7] arm64/kexec: Add core kexec support

2014-09-25 Thread Vivek Goyal

On Thu, Sep 25, 2014 at 12:02:51PM -0700, Geoff Levand wrote:
 Hi Vivek,
 
 On Thu, 2014-09-25 at 14:28 -0400, Vivek Goyal wrote:
  On Thu, Sep 25, 2014 at 12:23:27AM +, Geoff Levand wrote:
  [..]
   +void machine_kexec(struct kimage *image)
   +{
   + phys_addr_t reboot_code_buffer_phys;
   + void *reboot_code_buffer;
   +
   + BUG_ON(num_online_cpus()  1);
   +
   + kexec_kimage_head = image-head;
   +
   + reboot_code_buffer_phys = page_to_phys(image-control_code_page);
   + reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
   +
   + /*
   +  * Copy relocate_new_kernel to the reboot_code_buffer for use
   +  * after the kernel is shut down.
   +  */
   +
   + memcpy(reboot_code_buffer, relocate_new_kernel,
   + relocate_new_kernel_size);
   +
   + /* Flush the reboot_code_buffer in preparation for its execution. */
   +
   + __flush_dcache_area(reboot_code_buffer, relocate_new_kernel_size);
   +
   + /* Flush the kimage list. */
   +
   + kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
   +
   + pr_info("Bye!\n");
   +
   + /* Disable all DAIF exceptions. */
   +
   + asm volatile ("msr daifset, #0xf" : : : "memory");
   +
   + soft_restart(reboot_code_buffer_phys);
  
  So what does soft_restart() do on arm64?
 
 soft_restart() basically turns off the MMU and data caches, then jumps
 to the address passed to it, reboot_code_buffer_phys here.
  
  Looks like it switches to identity-mapped page tables, and that seems
  to be the reason that you are not preparing identity-mapped page
  tables in kexec code. I am wondering: how do you make sure that, once
  kexec is swapping pages (putting the new kernel's pages at their destination),
  these identity-mapped pages will not be overwritten?
  
  I am assuming that you are jumping to purgatory with paging enabled
  and the whole of memory identity mapped.
 
 The identity map is just used to turn off the MMU.  soft_restart() is in
 that identity mapping, and once it shuts off the MMU it jumps to the
 physical address of relocate_kernel, which uses physical addressing to
 do the copy.

Hi Geoff,

Ok, thanks. I think it would be nice if this explanation appeared in the
code somewhere as a comment.

Being able to turn off the MMU seems to have simplified things.
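For readers new to this path, the copy that relocate_new_kernel performs after soft_restart() has turned the MMU off can be pictured with the C sketch below. It is a userspace model, not the arm64 assembly from the patch: ordinary pointers stand in for physical addresses, sketch_relocate() and the SKETCH_* names are hypothetical, and the IND_* values are assumed to match the generic kimage entry encoding (flags kept in the low bits of page-aligned addresses).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(uintptr_t)(SKETCH_PAGE_SIZE - 1))

/* Entry flags, assumed to match the generic kimage entry encoding. */
#define IND_DESTINATION 0x1UL
#define IND_INDIRECTION 0x2UL
#define IND_DONE        0x4UL
#define IND_SOURCE      0x8UL

typedef uintptr_t kimage_entry_t;   /* local stand-in for the kernel typedef */

/*
 * Walk a kimage-style entry list and copy each IND_SOURCE page to the
 * current destination, advancing the destination one page per copy.
 * Only addresses stored in the list are used; no page tables are consulted.
 */
static void sketch_relocate(kimage_entry_t head)
{
    const kimage_entry_t *next = NULL;  /* cursor in current indirection page */
    unsigned char *dest = NULL;         /* where the next source page goes */
    kimage_entry_t entry = head;

    /* In this model the list always starts with an indirection entry. */
    assert(entry & IND_INDIRECTION);
    for (;;) {
        void *addr = (void *)(entry & SKETCH_PAGE_MASK);

        if (entry & IND_DONE)
            break;
        if (entry & IND_INDIRECTION) {
            next = addr;                /* continue in another page of entries */
        } else if (entry & IND_DESTINATION) {
            dest = addr;                /* following sources are copied here */
        } else if (entry & IND_SOURCE) {
            assert(dest != NULL);
            memcpy(dest, addr, SKETCH_PAGE_SIZE);
            dest += SKETCH_PAGE_SIZE;
        }
        entry = *next++;
    }
}
```

Because every entry carries the address it refers to, a loop like this only needs flat addressing, which is why running it with the MMU off (as Geoff describes) avoids having to build separate identity-mapped page tables in the kexec code.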

 
  I am also curious to know what different entry points the arm64
  kernel image supports and which one you are using by default.
 
 The arm64 kernel has a single entry, the start of the image.  See
 Documentation/arm64/booting.txt.

I will go through it.

Thanks
Vivek

 
 -Geoff
 

Re: [RFC] autokdump - automated kdump testsuite

2014-09-23 Thread Vivek Goyal

On Mon, Sep 22, 2014 at 10:53:55PM -0400, CAI Qian wrote:

[..]
  Why not simply let the respective service on the host do this job and
  have the test only make sure that the kdump service is running? It feels
  a little out of place for a test to generate a custom initramfs.
 Because not every distro will have a kdump service like Fedora.

So which distro does not have a service? Do we know this, or are we
assuming that distributions don't have a service to load/unload the
kdump kernel?

[..]
  makedumpfile will reduce the vmcore file size to a few hundred megabytes
  on most systems. Especially since this is just a test, the system
  will be lightly loaded and the vmcore will be small after filtering.
 It will probably actually have test cases that heavily load the memory
 before dumping.

Your original proposal does not take care of this case either. A test
case could use memory heavily, and if the user does not have sufficient
memory to save the core, so be it. That test will fail.

Thanks
Vivek

Re: [RFC] autokdump - automated kdump testsuite

2014-09-22 Thread Vivek Goyal

On Mon, Sep 22, 2014 at 09:00:00AM -0400, CAI Qian wrote:

 - Original Message -
  From: Vivek Goyal <vgo...@redhat.com>
  To: CAI Qian <caiq...@redhat.com>
  Cc: linux-kernel <linux-ker...@vger.kernel.org>, ltp-list
  <ltp-l...@lists.sourceforge.net>, crash-utility
  <crash-util...@redhat.com>, kexec <kexec@lists.infradead.org>,
  kexec kdump redhat mailing list <kexec-kdump-l...@redhat.com>
  Sent: Friday, September 19, 2014 9:22:36 PM
  Subject: Re: [RFC] autokdump - automated kdump testsuite

  On Fri, Sep 19, 2014 at 05:52:25AM -0400, CAI Qian wrote:
   I plan to release an automated kdump testsuite that will be

  So will this be a standalone test suite? Can it be merged with
  something that already exists, say, LTP?
 Yes, it is likely to be standalone. It won't make use of the LTP
 API, and the LTP kdump test suite is outdated, so there is no
 benefit to continue working over there.

So why make it standalone and not replace the old LTP kdump test suite
with this new one?

   focus on testing the kernel and the crash utility. It should work
   for all major distros since it will use no distro-specific
   stuff, and it will also support different arches including x86, ARM,
   PPC64 and s390x.

   It does the following:
   1) check if there is memory reserved for kdump. If not,
  reserve the memory and reboot the system.
   2) once the system is back, load the kexec-on-panic kernel and
  prepare a separate initramfs that includes the needed
  modules to load a local filesystem and the necessary utilities

  So will you write logic to prepare a custom initramfs, or will you rely
  on dracut or some other utility for that?
 I'll probably prepare a custom initramfs for the sake of simplicity.

Well, preparing a custom initramfs will become very tricky. We used
to do that and eventually switched to dracut.

Why not simply let the respective service on the host do this job and
have the test only make sure that the kdump service is running? It feels
a little out of place for a test to generate a custom initramfs.

  in order to analyse /proc/vmcore in the 2nd kernel.
   3) trigger the system crash using methods like sysrq-c, NMI,
  and panic_on_hung_task etc.
   4) in the 2nd kernel, mount a filesystem and use the crash
  utility to analyse /proc/vmcore. Then, gather the analysis
  logs, serial console output, dmesg etc. into the filesystem.

  Why not save the core, boot back into the first kernel, and then analyze it?

  Trying to work directly with /proc/vmcore does not test makedumpfile,
  which everybody uses. It will also require more memory to be reserved
  and packing crash and the debug vmlinux into the initramfs.
 The additional memory for vmlinux and the crash utility is predictable
 and manageable, so it can just ask for 256M of memory to be reserved
 before running the program. On the other hand, it is not usually feasible
 to require that the systems under test have more available disk space
 than memory.

makedumpfile will reduce the vmcore file size to a few hundred megabytes
on most systems. Especially since this is just a test, the system
will be lightly loaded and the vmcore will be small after filtering.

If there is not enough space, the test fails, period. I don't think there
is any need to try to circumvent that and run crash in the initramfs.
And in the process we don't test makedumpfile, which is a very important
component of this whole process.

IMHO, just rely on systemctl start kdump to generate and load the custom
initramfs, save the filtered vmcore to the root fs by default, and analyze
the vmcore after reboot. That will keep things simple.
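If the test only needs to confirm that crash memory is reserved (step 1 of the proposal) and that the service has actually loaded a crash kernel, one possible approach is to read the kexec_crash_size and kexec_crash_loaded attributes under /sys/kernel. The sketch below assumes that sysfs interface; the helper names are hypothetical, and the paths are passed in as parameters so the logic can be exercised against ordinary files.

```c
#include <assert.h>
#include <stdio.h>

/* sysfs attributes this sketch assumes are available */
#define KEXEC_CRASH_SIZE_PATH   "/sys/kernel/kexec_crash_size"
#define KEXEC_CRASH_LOADED_PATH "/sys/kernel/kexec_crash_loaded"

/* Read one unsigned decimal value from a file; 0 on success, -1 on error. */
static int read_ulong_file(const char *path, unsigned long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* 1 if a nonzero crash kernel reservation is reported, 0 if none, -1 on error. */
static int crash_memory_reserved(const char *size_path)
{
    unsigned long size;

    if (read_ulong_file(size_path, &size) != 0)
        return -1;
    return size > 0;
}

/* 1 if a crash (kdump) kernel is loaded, 0 if not, -1 on error. */
static int crash_kernel_loaded(const char *loaded_path)
{
    unsigned long loaded;

    if (read_ulong_file(loaded_path, &loaded) != 0)
        return -1;
    return loaded != 0;
}
```

A harness could call crash_memory_reserved(KEXEC_CRASH_SIZE_PATH) and, on 0, add a crashkernel= reservation and reboot; after the kdump service has started, crash_kernel_loaded(KEXEC_CRASH_LOADED_PATH) returning 1 is a cheap check that the distro service did its job.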

Thanks
Vivek

Re: [PATCH] Put each per-cpu kdump ELF notes into a single page

2014-09-12 Thread Vivek Goyal

On Fri, Sep 12, 2014 at 12:15:37AM +0200, Petr Tesarik wrote:
 On Thu, 11 Sep 2014 17:16:37 -0400
 Vivek Goyal vgo...@redhat.com wrote:
 
  On Thu, Sep 11, 2014 at 10:43:30PM +0200, Petr Tesarik wrote:
   On Thu, 11 Sep 2014 16:01:10 -0400
   Vivek Goyal vgo...@redhat.com wrote:
   
On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote:
 On architectures that use percpu-vm, the percpu region is not guaranteed
 to be contiguous in physical space.

Petr,

Which are those arches?
   
   All except nommu. Actually, percpu-km will be used instead even on MMU
   if SMP is disabled, but since SMP is pretty standard now, I guess the
   vast majority of all kernels out there are affected. ;-)
  
  Hi Petr,
  
  To make sure I understand it correctly I will just summarize what you
  said.
  
  alloc_percpu() code does not guarantee that an object will be on physically
  contiguous pages if the object crosses a page boundary. That's why we are
  forcing the object to be allocated with an alignment equal to its size
  rounded up to the nearest power of two; that way the object will always be
  on the same page (as long as the object is not bigger than a page).
  
  Is that a fair summary?
 
 Yes. I might add a note on why physically contiguous memory is needed
 here, but maybe it's obvious to anyone dealing with kdump.

I think adding a couple of lines to explain why physically contiguous notes
are needed is a good idea. It will not be obvious to anybody new to kdump.
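One way to see why contiguity matters: the kdump kernel locates each CPU's notes by a physical address and length, then walks them header by header, trusting each header's size fields. The C sketch below is a simplified, hypothetical walker (count_elf64_notes() is not the fs/proc/vmcore.c code), assuming the glibc <elf.h> definition of Elf64_Nhdr and the usual 4-byte padding of note names and descriptors. If the bytes past a page boundary come from an unrelated page, the next header decoded is garbage, typically an impossibly large size.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Note names and descriptors are padded to 4-byte boundaries. */
static uint64_t note_align4(uint64_t x)
{
    return (x + 3) & ~(uint64_t)3;
}

/*
 * Count the notes in a buffer that is assumed to be contiguous.
 * Returns -1 if a header claims more bytes than the buffer holds.
 */
static int count_elf64_notes(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL || len == 0);
    while (len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nhdr;
        uint64_t body;

        memcpy(&nhdr, buf + off, sizeof(nhdr));   /* buffer may be unaligned */
        if (nhdr.n_namesz == 0)
            break;                                /* treat empty name as end */
        body = note_align4(nhdr.n_namesz) + note_align4(nhdr.n_descsz);
        if (body > (uint64_t)(len - off - sizeof(nhdr)))
            return -1;                            /* size field is bogus */
        off += sizeof(nhdr) + (size_t)body;
        count++;
    }
    return count;
}
```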

Thanks
Vivek

Re: [Patch v3 1/2] kaslr: check if kernel location is changed

2014-09-12 Thread Vivek Goyal

On Fri, Sep 12, 2014 at 11:22:44PM +0800, Baoquan He wrote:
 Function handle_relocations() is used to do the relocation handling
 for i686 and for kaslr on x86_64. For 32 bit the relocation handling is
 mandatory. For x86_64, relocation handling should be done only when kaslr
 is enabled and a random kernel location is chosen successfully. However,
 the previous implementation only compared the kernel loading address with
 LOAD_PHYSICAL_ADDR, where the kernel was compiled to run. This could cause
 the system to misbehave in a few conditions, e.g. when the delta between
 the load address and the compiled address is bigger than what 32bit signed
 relocations can handle. There is also a limitation that the delta can't be
 too big, otherwise kernel text virtual addresses will overflow into the
 module address space.
 
 So this patch checks, on x86_64, whether the kernel location has changed
 after choose_kernel_location(). Only if we are on x86_64 and the kernel
 location has changed do we consider a random kaslr location to have been
 chosen, and only then is relocation handling needed.
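The decision this changelog describes, and the 32-bit limit it mentions, can be modelled in a few lines of C. This is a hypothetical, simplified sketch and not the code in arch/x86/boot/compressed/misc.c: needs_relocation() mirrors the "always on 32 bit, only if the location changed on x86_64" rule, and apply_reloc32s() treats every relocation as a sign-extended 32-bit field at a known offset in the image, adding the load delta and failing if the result no longer fits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32 bit always relocates; x86_64 only if kaslr moved the output buffer. */
static bool needs_relocation(bool is_x86_64, const void *output_orig,
                             const void *output)
{
    return !is_x86_64 || output_orig != output;
}

/*
 * Add delta to each signed 32-bit field listed in reloc_offsets.
 * Returns false if an offset is out of range or a result no longer fits
 * in 32 bits; fields before the failing one are already adjusted.
 */
static bool apply_reloc32s(unsigned char *image, size_t image_len,
                           const size_t *reloc_offsets, size_t nrelocs,
                           int64_t delta)
{
    assert(reloc_offsets != NULL || nrelocs == 0);
    /* Any |delta| >= 2^32 pushes every 32-bit value out of range. */
    if (delta <= -((int64_t)1 << 32) || delta >= ((int64_t)1 << 32))
        return false;
    for (size_t i = 0; i < nrelocs; i++) {
        size_t off = reloc_offsets[i];
        int32_t field;
        int64_t moved;

        if (off > image_len || image_len - off < sizeof(field))
            return false;                        /* outside the image */
        memcpy(&field, image + off, sizeof(field));
        moved = (int64_t)field + delta;
        if (moved < INT32_MIN || moved > INT32_MAX)
            return false;                        /* delta too large */
        field = (int32_t)moved;
        memcpy(image + off, &field, sizeof(field));
    }
    return true;
}
```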
 
 Signed-off-by: Baoquan He b...@redhat.com

I think this patch should make kexec and kdump work with kaslr
enabled (CONFIG_RANDOMIZE_BASE=y).

In case of kdump, we will need to pass nokaslr to make sure the kernel
does not move from the kexec-chosen address.

In case of kexec, I think it should be ok to not pass nokaslr. This
case is no different than any other bootloader.

Hence.
Acked-by: Vivek Goyal vgo...@redhat.com

Thomas D.,

You had reported kexec issues with CONFIG_RANDOMIZE_BASE=y. Does this
patch resolve the issue for you?

Thanks
Vivek

 ---
  arch/x86/boot/compressed/misc.c | 26 ++
  1 file changed, 22 insertions(+), 4 deletions(-)
 
 diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
 index 57ab74d..3bb2a17 100644
 --- a/arch/x86/boot/compressed/misc.c
 +++ b/arch/x86/boot/compressed/misc.c
 @@ -230,8 +230,9 @@ static void error(char *x)
   asm("hlt");
  }
  
 -#if CONFIG_X86_NEED_RELOCS
 -static void handle_relocations(void *output, unsigned long output_len)
 +#ifdef CONFIG_X86_NEED_RELOCS
 +static void handle_relocations(void *output_orig, void *output,
 +unsigned long output_len)
  {
   int *reloc;
   unsigned long delta, map, ptr;
 @@ -239,6 +240,20 @@ static void handle_relocations(void *output, unsigned long output_len)
   unsigned long max_addr = min_addr + output_len;
  
   /*
 + * 32bit always requires relocations to be performed. For x86_64,
 + * relocations need to be performed only if kaslr has chosen a
 + * different load address than the kernel was originally loaded at.
 + *
 + * If we are here, either kaslr is not configured in or kaslr is disabled
 + * or kaslr has chosen not to change the load location of kernel. Don't
 + * perform any relocations.
 + */
 +#if CONFIG_X86_64
 + if (output_orig == output)
 + return;
 +#endif
 +
 + /*
* Calculate the delta between where vmlinux was linked to load
* and where it was actually loaded.
*/
 @@ -299,7 +314,8 @@ static void handle_relocations(void *output, unsigned long output_len)
  #endif
  }
  #else
 -static inline void handle_relocations(void *output, unsigned long output_len)
 +static inline void handle_relocations(void *output_orig, void *output,
 +   unsigned long output_len)
  { }
  #endif
  
 @@ -360,6 +376,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 unsigned char *output,
 unsigned long output_len)
  {
 + unsigned char *output_orig = output;
 +
   real_mode = rmode;
  
   sanitize_boot_params(real_mode);
 @@ -402,7 +420,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
   debug_putstr("\nDecompressing Linux... ");
   decompress(input_data, input_len, NULL, NULL, output, NULL, error);
   parse_elf(output);
 - handle_relocations(output, output_len);
 + handle_relocations(output_orig, output, output_len);
   debug_putstr("done.\nBooting the kernel.\n");
   return output;
  }
 -- 
 1.8.5.3
 

Re: [Patch v3 1/2] kaslr: check if kernel location is changed

2014-09-12 Thread Vivek Goyal

On Fri, Sep 12, 2014 at 05:56:12PM +0200, Thomas D. wrote:
 Hi,
 
 Vivek Goyal wrote:
  You had reported kexec issues with CONFIG_RANDOMIZE_BASE=y. Does this
  patch resolve the issue for you?
 
 Yup! Tested against kernel-3.16.2.

Thanks. Given this patch is small and should not break anything else, I
think it might make sense to send it to stable too.

Thanks
Vivek

Re: [PATCH] Put each per-cpu kdump ELF notes into a single page

2014-09-11 Thread Vivek Goyal

On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote:
 On architectures that use percpu-vm, the percpu region is not guaranteed
 to be contiguous in physical space.

Petr,

Which are those arches?

 However, fs/proc/vmcore.c expects
 all ELF notes to be contiguous. If the ELF note happens to occupy
 two non-adjacent physical pages, part of the note may be read from an
 incorrect memory location by the kdump kernel, resulting in failure to
 initialize /proc/vmcore (if the content of the following physical page,
 incorrectly interpreted as an ELF note, specifies a large number), wrong
 register values or other apparent random memory corruption.
 
 There is currently no mechanism to pass the virtual-to-physical mapping
 of the percpu allocation to the kdump kernel. So, instead, I'm changing
 the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less
 than PAGE_SIZE, aligning the buffer to the nearest higher power of 2
 is enough to make sure that the buffer cannot cross a page boundary,
 effectively ensuring that the whole buffer is contiguous in physical
 space.
 
 Signed-off-by: Petr Tesarik ptesa...@suse.cz
 ---
  kernel/kexec.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/kernel/kexec.c b/kernel/kexec.c
 index 2bee072..cdab59d 100644
 --- a/kernel/kexec.c
 +++ b/kernel/kexec.c
 @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
  static int __init crash_notes_memory_init(void)
  {
   /* Allocate memory for saving cpu registers. */
 - crash_notes = alloc_percpu(note_buf_t);
 + crash_notes = __alloc_percpu(sizeof(note_buf_t),
 +  roundup_pow_of_two(sizeof(note_buf_t)));

I think some of the changelog should show up here as a short comment.
I don't think it is obvious why we are using __alloc_percpu()
and why aligning to the nearest higher power of 2 is needed here. Please also
mention here which arches run into issues.

Thanks
Vivek

   if (!crash_notes) {
   pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
   return -ENOMEM;
 -- 
 1.8.4.5
 
Re: [PATCH] Put each per-cpu kdump ELF notes into a single page

2014-09-11 Thread Vivek Goyal

On Thu, Sep 11, 2014 at 10:43:30PM +0200, Petr Tesarik wrote:
 On Thu, 11 Sep 2014 16:01:10 -0400
 Vivek Goyal vgo...@redhat.com wrote:
 
  On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote:
   On architectures that use percpu-vm, the percpu region is not guaranteed
   to be contiguous in physical space.
  
  Petr,
  
  Which are those arches?
 
 All except nommu. Actually, percpu-km will be used instead even on MMU
 if SMP is disabled, but since SMP is pretty standard now, I guess the
 vast majority of all kernels out there are affected. ;-)

Hi Petr,

To make sure I understand it correctly I will just summarize what you
said.

alloc_percpu() code does not guarantee that an object will be on physically
contiguous pages if the object crosses a page boundary. That's why we are
forcing the object to be allocated with an alignment equal to its size
rounded up to the nearest power of two; that way the object will always be
on the same page (as long as the object is not bigger than a page).

Is that a fair summary?
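The argument in this summary can be checked with a little arithmetic: if an object of size s is at most PAGE_SIZE, its alignment a = roundup_pow_of_two(s) is a power of two no larger than PAGE_SIZE, so a divides PAGE_SIZE and every a-aligned slot [k*a, (k+1)*a) lies inside a single page. The sketch below is a userspace check of that claim; roundup_pow2() is a stand-in for the kernel's roundup_pow_of_two(), and both the 4096-byte page size and the 300-byte object size used in the usage example are arbitrary assumptions, not the real sizeof(note_buf_t).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed page size */

/* Stand-in for roundup_pow_of_two(): smallest power of two >= n. */
static size_t roundup_pow2(size_t n)
{
    size_t p = 1;

    assert(n >= 1 && n <= SKETCH_PAGE_SIZE);   /* argument needs size <= page */
    while (p < n)
        p <<= 1;
    return p;
}

/* True if the byte range [addr, addr + size) touches two different pages. */
static bool crosses_page(uintptr_t addr, size_t size)
{
    return addr / SKETCH_PAGE_SIZE != (addr + size - 1) / SKETCH_PAGE_SIZE;
}

/* Check every aligned slot in the first npages pages for an object of size. */
static bool aligned_slots_stay_in_one_page(size_t size, size_t npages)
{
    size_t align = roundup_pow2(size);

    for (uintptr_t addr = 0; addr + size <= npages * SKETCH_PAGE_SIZE;
         addr += align)
        if (crosses_page(addr, size))
            return false;
    return true;
}
```

With s = 300 the alignment becomes 512, and every 512-aligned slot stays within one page, whereas the same object placed 100 bytes before a page boundary would straddle two pages.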

Thanks
Vivek
 
 
   However, fs/proc/vmcore.c expects
   all ELF notes to be contiguous. If the ELF note happens to occupy
   two non-adjacent physical pages, part of the note may be read from an
   incorrect memory location by the kdump kernel, resulting in failure to
   initialize /proc/vmcore (if the content of the following physical page,
   incorrectly interpreted as an ELF note, specifies a large number), wrong
   register values or other apparent random memory corruption.
   
   There is currently no mechanism to pass the virtual-to-physical mapping
   of the percpu allocation to the kdump kernel. So, instead, I'm changing
   the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less
   than PAGE_SIZE, aligning the buffer to the nearest higher power of 2
   is enough to make sure that the buffer cannot cross a page boundary,
   effectively ensuring that the whole buffer is contiguous in physical
   space.
   
   Signed-off-by: Petr Tesarik ptesa...@suse.cz
   ---
kernel/kexec.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
   
   diff --git a/kernel/kexec.c b/kernel/kexec.c
   index 2bee072..cdab59d 100644
   --- a/kernel/kexec.c
   +++ b/kernel/kexec.c
   @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
static int __init crash_notes_memory_init(void)
{
 /* Allocate memory for saving cpu registers. */
   - crash_notes = alloc_percpu(note_buf_t);
   + crash_notes = __alloc_percpu(sizeof(note_buf_t),
   +  roundup_pow_of_two(sizeof(note_buf_t)));
  
  I think some of the changelog should show up here as a short comment.
  I don't think it is obvious why we are using __alloc_percpu()
  and why aligning to the nearest higher power of 2 is needed here. Please also
  mention here which arches run into issues.
 
 OK, I'll add it as a comment in the code. I'll see if I can make it
 short but still understandable.
 
 Thanks,
 Petr Tesarik
 
  Thanks
  Vivek
  
 if (!crash_notes) {
 pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
 return -ENOMEM;
   -- 
   1.8.4.5
   