On 17/11/23 11:01 am, Aneesh Kumar K V wrote:
On 11/17/23 10:03 AM, Sourabh Jain wrote:
Hi Aneesh,

Thanks for reviewing the patch.

On 15/11/23 10:14, Aneesh Kumar K.V wrote:
Sourabh Jain<sourabhj...@linux.ibm.com>  writes:

....

diff --git a/arch/powerpc/include/asm/fadump-internal.h 
b/arch/powerpc/include/asm/fadump-internal.h
index 27f9e11eda28..7be3d8894520 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -42,7 +42,25 @@ static inline u64 fadump_str_to_u64(const char *str)
     #define FADUMP_CPU_UNKNOWN        (~((u32)0))
   -#define FADUMP_CRASH_INFO_MAGIC        fadump_str_to_u64("FADMPINF")
+/*
+ * The introduction of new fields in the fadump crash info header has
+ * led to a change in the magic key, from `FADMPINF` to `FADMPSIG`.
+ * This alteration ensures backward compatibility, enabling the kernel
+ * with the updated fadump crash info to handle kernel dumps from older
+ * kernels.
+ *
+ * To prevent the need for further changes to the magic number in the
+ * event of future modifications to the fadump header, a version field
+ * has been introduced to track the fadump crash info header version.
+ *
+ * Historically, there was no connection between the magic number and
+ * the fadump crash info header version. However, moving forward, the
+ * `FADMPINF` magic number in header will be treated as version 0, while
+ * the `FADMPSIG` magic number in header will include a version field to
+ * determine its version.
+ */
+#define FADUMP_CRASH_INFO_MAGIC        fadump_str_to_u64("FADMPSIG")
+#define FADUMP_VERSION            1

Can we keep the old magic details as

#define FADUMP_CRASH_INFO_MAGIC_OLD        fadump_str_to_u64("FADMPINF")
#define FADUMP_CRASH_INFO_MAGIC        fadump_str_to_u64("FADMPSIG")

Sure.

Also considering the struct need not be backward compatible, can we just
do

struct fadump_crash_info_header {
     u64        magic_number;
     u32        crashing_cpu;
     u64        elfcorehdr_addr;
     u64        elfcorehdr_size;
     u64        vmcoreinfo_raddr;
     u64        vmcoreinfo_size;
     struct pt_regs    regs;
     struct cpumask    cpu_mask;
};
static inline bool fadump_compatible(struct fadump_crash_info_header
*fdh)
{
     return (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC)
}

and fail fadump if we find it not compatible?

Agree that it is unsafe to collect a dump with an incompatible fadump crash 
info header.

Given that I am updating the fadump crash info header, we can make a few 
arrangements
like adding a size filed for the dynamic size attribute like pt_regs and 
cpumask to ensure
better compatibility in the future.

Additionally, let's introduce a version field to the fadump crash info header 
to avoid changing
the magic number in the future.


I am not sure whether we need to add all the complexity to enable supporting 
different fadump kernel
version. Is that even a possible use case with fadump? Can't we always assume 
that with fadump the
crash kernel and fadump kernel will be same version? if yes we can simply fail 
with a magic number
mismatch because that indicates an user config error?

If we decide not to support different kernel versions for production
kernel and capture kernel, We can make that implicit by adding kernel
version info of production kernel in the header and bailing out if
there is kernel version mismatch as magic could still match for two
different kernel versions.

I would personally prefer something like the below though:

        struct fadump_crash_info_header {
                u64             magic_number;
                u32             version
                u32             crashing_cpu;
                u64             elfcorehdr_addr;
                u64             elfcorehdr_size;
                u64             vmcoreinfo_raddr;
                u64             vmcoreinfo_size;
                u8              kernel_version[];
                u32             pt_regs_sz;
                struct pt_regs  regs;
                u32             cpu_mask_sz;
                struct cpumask  cpu_mask;
        };

        if (magic_number != new_magic)
                goto err;       /* Error out */

        if (kernel_version != capture_kernel_version)
        {
if (cpu_mask_sz == sizeof(struct pt_regs) && cpu_mask_sz == sizeof(struct cpumask))
                        /*
                         * Warn about the kernel version mismatch and how data 
can be different
                         * across kernel versions and proceed anyway!
                         */
                else
                        goto err;       /* Error out */
        }

This ensures we warn and proceed in cases where it is less likely to
have issues capturing kernel dump. This helps in dev environment where
we are trying to debug an early boot crash - in which case capture
kernel can't be the same kernel as it would likely hit the same problem
while booting..

Thanks
Hari

Reply via email to