Hello Aneesh,

On 22/11/23 17:50, Aneesh Kumar K V wrote:
On 11/22/23 4:05 PM, Sourabh Jain wrote:
Hello Michael,


On 22/11/23 10:47, Michael Ellerman wrote:
Aneesh Kumar K V <aneesh.ku...@linux.ibm.com> writes:
...
I am not sure whether we need to add all the complexity to enable supporting 
different fadump kernel
version. Is that even a possible use case with fadump? Can't we always assume 
that with fadump the
crash kernel and fadump kernel will be same version?
How sure are we of that?

Don't we go through grub when we boot into the 2nd kernel. And so
couldn't it choose to boot a different kernel, for whatever reason.

I don't think we need to support different pt_reg / cpumask sizes, but
requiring the exact same kernel version is too strict I think.
Agree.
But maybe I'm wrong. Would be good to hear what distro folks think.
How about checking fadump crash info header compatibility in the following way?

static bool is_fadump_header_compatible(struct fadump_crash_info_header *fdh)
{
     if (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC_OLD) {
         pr_err("Old magic number, can't process the dump.");
         return false;
     }

     if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) {
         pr_err("Fadump header is corrupted.");
         return false;
     }

     /*
      * If the kernel version of the first/crashed kernel and the second/fadump
      * kernel is not same, then only collect the dump if the size of all
      * non-primitive type members of the fadump header is the same across 
kernels.
      */
     if (strcmp(fdh->kernel_version, init_uts_ns.name.release)) {
         if (fdh->pt_regs_sz != sizeof(struct pt_regs) || fdh->cpu_mask_sz != 
sizeof(struct cpumask)) {
             pr_err("Fadump header size mismatch.\n")
             return false;
         } else
             pr_warn("Kernel version mismatch; dump data is unreliable.\n");
     }

You also want a fdh->version check?

Even though we don't have any action against an fdh->version right now, I think I should check the fadump header version. Currently, if the version doesn't match, it means the
header is corrupted.

I am still not sure you need the kernel_version check. IMHO that is too strict
and can hit that check where you have kexec kernel which is not installed in 
/boot crashing?

If the kernel versions mismatch, we still collect the dump if the `pt_regs` and `cpu_mask` sizes are the same across the kernels. The kernel version check is just to warn users that
the collected dump may be unreliable.

Should I remove the kernel_version filed from fadump crash info header and remove the
the kernel version check while processing the kernel dump?

Thanks,
Sourabh Jain

Reply via email to