Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

boris . ostrovsky Tue, 29 Sep 2020 12:11:57 -0700

+Lennart


On 9/29/20 9:36 AM, Philipp Rudo wrote:
> Hi,
>
> On Fri, 25 Sep 2020 10:56:25 -0400
> Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
>
>> On Fri, Sep 25, 2020 at 11:05:58AM +0800, Dave Young wrote:
>>> Hi,
>>>
>>> On 09/24/20 at 01:16pm, boris.ostrov...@oracle.com wrote:  
>>>> On 9/24/20 12:43 PM, Michael Kelley wrote:  
>>>>> From: Eric W. Biederman <ebied...@xmission.com> Sent: Thursday, September 24, 2020 9:26 AM
>>>>>> Michael Kelley <mikel...@microsoft.com> writes:
>>>>>>  
>>>>>>>>> Added Hyper-V people and people who created the param, it is below
>>>>>>>>> commit, I also want to remove it if possible, let's see how people
>>>>>>>>> think, but the least way should be to disable the auto setting in
>>>>>>>>> both systemd and kernel:
>>>>>>> Hyper-V uses a notifier to inform the host system that a Linux VM has
>>>>>>> panic'ed.  Informing the host is particularly important in a public
>>>>>>> cloud such as Azure so that the cloud software can alert the customer,
>>>>>>> and can track cloud-wide reliability statistics.  Whether a kdump is
>>>>>>> taken is controlled
>>>>>>> entirely by the customer and how he configures the VM, and we want
>>>>>>> the host to be informed either way.  
>>>>>> Why?
>>>>>>
>>>>>> Why does the host care?
>>>>>> Especially if the VM continues executing into a kdump kernel?  
>>>>> The host itself doesn't care.  But the host is a convenient out-of-band
>>>>> channel for recording that a panic has occurred and to collect basic data
>>>>> about the panic.  This out-of-band channel is then used to notify the end
>>>>> customer that his VM has panic'ed.  Sure, the customer should be running
>>>>> his own monitoring software, but customers don't always do what they
>>>>> should.  Equally important, the out-of-band channel allows the cloud
>>>>> infrastructure software to notice trends, such as that the rate of Linux
>>>>> panics has increased, and that perhaps there is a cloud problem that
>>>>> should be investigated.  
>>>>
>>>> In many cases (especially in a cloud environment) your dump device is
>>>> remote (e.g. iscsi) and kdump sometimes (often?) gets stuck because of
>>>> connectivity issues (which could be the cause of the panic in the first
>>>> place). So it is quite desirable to inform the infrastructure that the VM
>>>> is on its way out without waiting for kdump to complete.
>>> That can probably be done in kdump kernel if it is really needed.  Say
>>> informing host that panic happened and a kdump kernel is running.
>> If kdump kernel gets to that point. Sometimes (sadly) it ends up being
>> misconfigured and it chokes up - and hence having multiple ways to emit
>> the crash information before running kdump kernel is a life-saver.
>>
>>> But I think to set crash_kexec_post_notifiers by default is still bad.   
>> Because of the way it is run today I presume? If there was some
>> safe/unsafe policy that should work right? I would think that the
>> safe ones that work properly all the time are:
>>
>>  - HyperV CRASH_MSRs,
>>  - KVM PVPANIC_[PANIC,CRASHLOAD] push button knob,
>>  - pstore EFI variables
>>  - Dumping in memory,
>>
>> And then some that depend on firmware version (aka BIOS, and vendor) are:
>>  - ACPI ERST,
>>
>> And then the unsafe:
>>  - s390, PowerPC (I don't actually know what they are but that
>>     was Dave's primary motivator).
> that won't work on s390. Let me emphasize that the problems on s390 are not
> the notifiers themselves but the fact that they are called before crash_kexec.
>
> On s390 we have multiple dump methods besides kdump. We use a panic notifier
> to trigger these dump methods from the panicking kernel. The problem is that
> these dump methods are less powerful than kdump so we only want to use them
> as fallback, i.e. only use them when either kdump wasn't configured or loading
> of the crash kernel failed for whatever reason. That's why (plus historic
> reasons) our notifier stops the machine when it is called and none of the
> methods is configured. Which means that the second crash_kexec is never
> reached.
>
> Long story short, the problem on s390 is caused by the two hunks in
> kernel/panic.c:panic from f06e5153f4ae ("kernel/panic.c: add
> "crash_kexec_post_notifiers" option for kdump after panic_notifers").
>
> Besides the problems on s390 I support Dave and think that setting
> crash_kexec_post_notifiers by default is wrong. We should keep in mind that
> we are in a panic situation. This means that the kernel is in a state where it
> doesn't trust itself anymore. So we should keep the code that is run to the
> bare minimum as we cannot rely on it to work properly.
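
For anyone following the ordering argument above, here is a minimal user-space sketch in C of the sequence being discussed. It is only a model, not the code in kernel/panic.c: model_panic(), struct panic_trace and the notifier_halts flag are hypothetical names made up for illustration, and the "notifier stops the machine" case is a simplification of the s390 behaviour Philipp describes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_STEPS 8

/* Records the order in which the modelled panic steps ran. */
struct panic_trace {
	const char *steps[MAX_STEPS];
	size_t count;
};

static void record(struct panic_trace *t, const char *step)
{
	if (t->count < MAX_STEPS)
		t->steps[t->count++] = step;
}

/*
 * Hypothetical model of the panic path (not kernel code).
 *
 * post_notifiers: models crash_kexec_post_notifiers being set
 * kdump_loaded:   a crash kernel is loaded, so crash_kexec would jump to it
 * notifier_halts: a panic notifier stops the machine (simplified s390 case)
 *
 * Returns true if the model hands control to the crash kernel.
 */
static bool model_panic(bool post_notifiers, bool kdump_loaded,
			bool notifier_halts, struct panic_trace *t)
{
	t->count = 0;

	/* Default ordering: try kdump before anything else runs. */
	if (!post_notifiers) {
		record(t, "crash_kexec");
		if (kdump_loaded)
			return true;	/* control passes to the crash kernel */
	}

	record(t, "panic_notifiers");
	if (notifier_halts)
		return false;		/* machine stopped inside a notifier */

	record(t, "kmsg_dump");

	/* Post-notifiers ordering: kdump is attempted only at this point. */
	if (post_notifiers) {
		record(t, "crash_kexec");
		if (kdump_loaded)
			return true;
	}

	return false;
}
```

Calling model_panic(true, true, true, &t) leaves a trace that ends at "panic_notifiers" and returns false, i.e. the later crash_kexec is never reached even though a crash kernel is loaded. With post_notifiers false the same inputs return true after a single "crash_kexec" step.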


There is a pending patch to revert the crash_kexec_post_notifiers default in systemd:
https://github.com/systemd/systemd/pull/16950


If this change goes through then Dave's patch will be unnecessary.


-boris



>
> Thanks
> Philipp
>
>>>   
>>>>   
>>>>>  
>>>>>> Further like I have mentioned every time something like this has come up
>>>>>> a call on the kexec on panic code path should be a direct call (That can
>>>>>> be audited) not something hidden in a notifier call chain (which can
>>>>>> not).
>>>>>>  
>>>> We btw already have a direct call from panic() to kmsg_dump() which is
>>>> indirectly controlled by crash_kexec_post_notifiers, and it would also be
>>>> preferable to be able to call it before kdump as well.
>>> Right, that is the same thing we are talking about.
>>>
>>> Thanks
>>> Dave
>>>   
>> _______________________________________________
>> kexec mailing list
>> ke...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec
