On Wed, Sep 4, 2024 at 7:41 PM Aureau, Georges (Kernel Tools ERT) <
georges.aur...@hpe.com> wrote:

> Hello Lianbo,
>
>
>
> I wanted to keep the story short, but here is the long version.
>
>
>
> This problem has been hiding for a long time due to a compiler alignment
> padding on 64bit machine.
>
>
>

Thanks for the explanation in detail, Georges. I have to say that is a good
finding.


> For example, on RHEL 8.0, “include/linux/slub_def.h?v=4.18.0-80”:
>
>
>
> 0101         void (*ctor)(void *);
>
> 0102         unsigned int inuse;             /* Offset to metadata */
>
> 0103         unsigned int align;             /* Alignment */
>
> 0104         unsigned int red_left_pad;      /* Left redzone padding size
> */
>
> 0105         const char *name;       /* Name (only for display!) */
>
>
>
> We have 3 “unsigned int” followed by “char*name” (64bit aligned), hence
> the compiler will add a 32bit padding after red_left_pad.
>
> With such padding (and a on little-endian machine, ie. x86_64), a bogus
> load_ULONG() instead of a load_UINT() can’t really be noticed if the
> padding is zero-filled.
>
>
>

Ah, looks like the current issue can not be *ALWAYS* reproduced.


> Now, on releases where we don’t have such compiler padding after
> red_left_pad, a bogus load_ULONG() instead of a load_UINT()
>
> will produce noticeable garbage.
>
>
>
> For example, on RHEL 7.8, “include/linux/slub_def.h?v=3.10.0-1127”:
>
>
>
> 0084         void (*ctor)(void *);
>
> 0085         int inuse;              /* Offset to metadata */
>
> 0086         int align;              /* Alignment */
>
> 0087         int reserved;           /* Reserved bytes at the end of
> slabs */
>
> 0088         RH_KABI_FILL_HOLE(int red_left_pad)     /* Left redzone
> padding size */
>
> 0089         const char *name;       /* Name (only for display!) */
>
>
>
> Here there won’t be any 32bit padding after red_left_pad as we have 4
> “int” before “char*name”,
>
> so here a load_ULONG() instead of load_UINT() we will get bogus data.
>
>
>
> From a RHEL 7.8 crash dump, where the machine was booted with
> “slub_debug=FZP” (ie. Debugging all caches):
>
>
>
> crash> sys | grep REL
>
>      RELEASE: 3.10.0-1127.el7.x86_64
>
> crash> kmem -S names_cache
>
> CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
>
> ffff8838ffc2c040     4096          1       336     48    32k  names_cache
>
> […]
>
> KMEM_CACHE_NODE   NODE  SLABS  PARTIAL  PER-CPU
>
> ffff88017fdcfcc0     0      6        6        0
>
> NODE 0 PARTIAL:
>
>   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
>
>   ffffea0021799000  ffff88085e640000     0      7          0     7
>
>   ffffea002179a600  ffff88085e698000     0      7          0     7
>
>   ffffea00217fce00  ffff88085ff38000     0      7          0     7
>
>   ffffea00216df000  ffff88085b7c0000     0      7          0     7
>
>   ffffea00216dc400  ffff88085b710000     0      7          0     7
>
>   ffffea0021799c00  ffff88085e670000     0      7          0     7
>
> […]
>
> crash> struct kmem_cache ffff8838ffc2c040 | grep red
>
>   red_left_pad = 64,
>
> crash> kmem -s ffff88085e640040      <<< trying first object, should be
> free
>
> CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
>
> ffff8838ffc2c040     4096          1       336     48    32k  names_cache
>
>   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
>
>   ffffea0021799000  ffff88085e640000     0      7          0     7
>
>   FREE / [ALLOCATED]
>
>   [ffff88085e640000]     <<< getting expected red zone address, but not
> free !!
>
> crash> set redzone off
>
> redzone: off
>
> crash> kmem -s ffff88085e640040    <<< trying again with redzone off
>
> CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
>
> ffff8838ffc2c040     4096          1       336     48    32k  names_cache
>
>   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
>
>   ffffea0021799000  ffff88085e640000     0      7          0     7
>
>   FREE / [ALLOCATED]
>
>   [ffc128105e640040]    <<< getting a bogus object address, still not free
>
> crash> rd -32 0xffff8838ffc2c09c
>
> ffff8838ffc2c09c:  00000040                              @...
>
> crash> rd -64 0xffff8838ffc2c09c
>
> ffff8838ffc2c09c:  ffc1a00800000040                    @.......
>
> crash> p /x 0xffff88085e640000+0xffc1a00800000040
>
> $1 = 0xffc128105e640040
>
>
>
> Here above we can see that red_left_pad was read by crash as
> 0xffc1a00800000040 instead of 0x40 !!
>
>
>
> With the patch:
>
>
>
> crash> kmem -s ffff88085e640040
>
> CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
>
> ffff8838ffc2c040     4096          1       336     48    32k  names_cache
>
>   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
>
>   ffffea0021799000  ffff88085e640000     0      7          0     7
>
>   FREE / [ALLOCATED]
>
>    ffff88085e640000   <<< getting expected red zone address, and object
> free
>
> crash> set redzone off
>
> redzone: off
>
> crash> kmem -s ffff88085e640040
>
> CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
>
> ffff8838ffc2c040     4096          1       336     48    32k  names_cache
>
>   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
>
>   ffffea0021799000  ffff88085e640000     0      7          0     7
>
>   FREE / [ALLOCATED]
>
>    ffff88085e640040  <<< getting expected object address, and object free
>
>
Got it, wonderful.


>
>
> Now, with respect to:
>
> > And also please add the kernel commit to patch log:
>
> > kernel commit 2ca6d39b3102 ("slub: make ->red_left_pad unsigned int")
>
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca6d39b31022bb9e1dda77109e292517f701261
>
>
>
> I don’t agree with this suggestion, this problem has nothing to do with
> “int” versus “unsigned int”.
>
> It is only related to doing a load_ULONG() (64bit) instead of a load_INT()
> or load_UINT() (32bit).
>
> For some releases, where we have a 32bit padding after red_left_pad, the
> problem won’t be noticed.
>
> But without a 32bit padding after red_left_pad, crash will get a bogus
> red_left_pad causing all objects
>
> to be reported as [ALLOCATED], and it will print bogus object addresses
> when using “set redzone off”.
>

It's true that the current issue is irrelevant to the kernel commit
changes. I misunderstood your patch log:

"...The "red_left_pad" member within "struct kmem_cache" is currently an
"unsigned int",
it used to be an "int", but it never was a "long...""

I have no other issues, so for the patch: Ack.

Thanks
Lianbo


>
>
> Cheers,
>
> Georges
>
>
>
>
>
>
>
> *From:* lijiang <liji...@redhat.com>
> *Sent:* Wednesday, September 4, 2024 5:13 AM
> *To:* devel@lists.crash-utility.osci.io
> *Cc:* Aureau, Georges (Kernel Tools ERT) <georges.aur...@hpe.com>
> *Subject:* Re: [PATCH] “kmem address” not working properly when redzone
> is enabled
>
>
>
> Hi, Aureau
>
> Thank you for the fix.
>
>
>
> On Thu, Aug 29, 2024 at 5:56 PM <devel-requ...@lists.crash-utility.osci.io>
> wrote:
>
> Date: Thu, 29 Aug 2024 09:15:36 +0000
> From: "Aureau, Georges (Kernel Tools ERT)" <georges.aur...@hpe.com>
> Subject: [Crash-utility][PATCH] “kmem address” not working properly
>         when redzone is enabled
> To: "devel@lists.crash-utility.osci.io"
>         <devel@lists.crash-utility.osci.io>
> Message-ID: <sj0pr84mb1482e72f9e168b3b0ce885c89f...@sj0pr84mb1482.namp
>         RD84.PROD.OUTLOOK.COM>
> Content-Type: text/plain; charset="Windows-1252"
>
> Crash “kmem address” not working properly when redzone is enabled.
>
> When "slub_debug" is enabled with redzoning, "kmem address" does not work
> properly.
> The "red_left_pad" member within "struct kmem_cache" is currently an
> "unsigned int",
> it used to be an "int", but it never was a "long", hence "red_left_pad" in
> do_slab_slub()
> was not initialized properly. This "red_left_pad" issue resulted in
> reporting free objects
> as "[ALLOCATED]", and in reporting bogus object addresses when using "set
> redzone off".
>
>
>
> Can you help add the result of the 'kmem address' command here? We can
> clearly see what error it is.
>
>
>
> And also please add the kernel commit to patch log:
>
> kernel commit 2ca6d39b3102 ("slub: make ->red_left_pad unsigned int")
>
>
>
> Signed-off-by: Georges Aureau <georges.aur...@hpe.com>
> --
> diff --git a/memory.c b/memory.c
> index a74ebaf..967a9cf 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -19637,7 +19637,8 @@ do_slab_slub(struct meminfo *si, int verbose)
>         int i, free_objects, cpu_slab, is_free, node;
>         ulong p, q;
>  #define SLAB_RED_ZONE 0x00000400UL
> -       ulong flags, red_left_pad;
> +       ulong flags;
> +       uint red_left_pad;
>
>         if (!si->slab) {
>                 if (CRASHDEBUG(1))
> @@ -19727,7 +19728,7 @@ do_slab_slub(struct meminfo *si, int verbose)
>         if (VALID_MEMBER(kmem_cache_red_left_pad)) {
>                 flags = ULONG(si->cache_buf + OFFSET(kmem_cache_flags));
>                 if (flags & SLAB_RED_ZONE)
> -                       red_left_pad = ULONG(si->cache_buf +
> OFFSET(kmem_cache_red_left_pad));
> +                       red_left_pad = UINT(si->cache_buf +
> OFFSET(kmem_cache_red_left_pad));
>         }
>
>
>
> This change looks good to me, but I still have a question:
>
> I can not reproduce the current issue, how did you reproduce this one? Can
> you help list the steps to reproduce?
>
>
>
>
>
> Thanks
>
> Lianbo
>
>
>
>         for (p = vaddr; p < vaddr + objects * si->size; p += si->size) {
>
> ------------------------------
>
>
--
Crash-utility mailing list -- devel@lists.crash-utility.osci.io
To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki

Reply via email to