Hello Lianbo,

I wanted to keep the story short, but here is the long version.

This problem has been hiding for a long time due to a compiler alignment 
padding on 64bit machine.

For example, on RHEL 8.0, “include/linux/slub_def.h?v=4.18.0-80”:

0101         void (*ctor)(void *);
0102         unsigned int inuse;             /* Offset to metadata */
0103         unsigned int align;             /* Alignment */
0104         unsigned int red_left_pad;      /* Left redzone padding size */
0105         const char *name;       /* Name (only for display!) */

We have 3 “unsigned int” followed by “char*name” (64bit aligned), hence the 
compiler will add a 32bit padding after red_left_pad.
With such padding (and a on little-endian machine, ie. x86_64), a bogus 
load_ULONG() instead of a load_UINT() can’t really be noticed if the padding is 
zero-filled.

Now, on releases where we don’t have such compiler padding after red_left_pad, 
a bogus load_ULONG() instead of a load_UINT()
will produce noticeable garbage.

For example, on RHEL 7.8, “include/linux/slub_def.h?v=3.10.0-1127”:

0084         void (*ctor)(void *);
0085         int inuse;              /* Offset to metadata */
0086         int align;              /* Alignment */
0087         int reserved;           /* Reserved bytes at the end of slabs */
0088         RH_KABI_FILL_HOLE(int red_left_pad)     /* Left redzone padding 
size */
0089         const char *name;       /* Name (only for display!) */

Here there won’t be any 32bit padding after red_left_pad as we have 4 “int” 
before “char*name”,
so here a load_ULONG() instead of load_UINT() we will get bogus data.

From a RHEL 7.8 crash dump, where the machine was booted with “slub_debug=FZP” 
(ie. Debugging all caches):

crash> sys | grep REL
     RELEASE: 3.10.0-1127.el7.x86_64
crash> kmem -S names_cache
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff8838ffc2c040     4096          1       336     48    32k  names_cache
[…]
KMEM_CACHE_NODE   NODE  SLABS  PARTIAL  PER-CPU
ffff88017fdcfcc0     0      6        6        0
NODE 0 PARTIAL:
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea0021799000  ffff88085e640000     0      7          0     7
  ffffea002179a600  ffff88085e698000     0      7          0     7
  ffffea00217fce00  ffff88085ff38000     0      7          0     7
  ffffea00216df000  ffff88085b7c0000     0      7          0     7
  ffffea00216dc400  ffff88085b710000     0      7          0     7
  ffffea0021799c00  ffff88085e670000     0      7          0     7
[…]
crash> struct kmem_cache ffff8838ffc2c040 | grep red
  red_left_pad = 64,
crash> kmem -s ffff88085e640040      <<< trying first object, should be free
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff8838ffc2c040     4096          1       336     48    32k  names_cache
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea0021799000  ffff88085e640000     0      7          0     7
  FREE / [ALLOCATED]
  [ffff88085e640000]     <<< getting expected red zone address, but not free !!
crash> set redzone off
redzone: off
crash> kmem -s ffff88085e640040    <<< trying again with redzone off
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff8838ffc2c040     4096          1       336     48    32k  names_cache
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea0021799000  ffff88085e640000     0      7          0     7
  FREE / [ALLOCATED]
  [ffc128105e640040]    <<< getting a bogus object address, still not free
crash> rd -32 0xffff8838ffc2c09c
ffff8838ffc2c09c:  00000040                              @...
crash> rd -64 0xffff8838ffc2c09c
ffff8838ffc2c09c:  ffc1a00800000040                    @.......
crash> p /x 0xffff88085e640000+0xffc1a00800000040
$1 = 0xffc128105e640040

Here above we can see that red_left_pad was read by crash as 0xffc1a00800000040 
instead of 0x40 !!

With the patch:

crash> kmem -s ffff88085e640040
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff8838ffc2c040     4096          1       336     48    32k  names_cache
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea0021799000  ffff88085e640000     0      7          0     7
  FREE / [ALLOCATED]
   ffff88085e640000   <<< getting expected red zone address, and object free
crash> set redzone off
redzone: off
crash> kmem -s ffff88085e640040
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff8838ffc2c040     4096          1       336     48    32k  names_cache
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea0021799000  ffff88085e640000     0      7          0     7
  FREE / [ALLOCATED]
   ffff88085e640040  <<< getting expected object address, and object free

Now, with respect to:
> And also please add the kernel commit to patch log:
> kernel commit 2ca6d39b3102 ("slub: make ->red_left_pad unsigned int")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca6d39b31022bb9e1dda77109e292517f701261

I don’t agree with this suggestion, this problem has nothing to do with “int” 
versus “unsigned int”.
It is only related to doing a load_ULONG() (64bit) instead of a load_INT() or 
load_UINT() (32bit).
For some releases, where we have a 32bit padding after red_left_pad, the 
problem won’t be noticed.
But without a 32bit padding after red_left_pad, crash will get a bogus 
red_left_pad causing all objects
to be reported as [ALLOCATED], and it will print bogus object addresses when 
using “set redzone off”.

Cheers,
Georges



From: lijiang <liji...@redhat.com>
Sent: Wednesday, September 4, 2024 5:13 AM
To: devel@lists.crash-utility.osci.io
Cc: Aureau, Georges (Kernel Tools ERT) <georges.aur...@hpe.com>
Subject: Re: [PATCH] “kmem address” not working properly when redzone is enabled

Hi, Aureau
Thank you for the fix.

On Thu, Aug 29, 2024 at 5:56 PM 
<devel-requ...@lists.crash-utility.osci.io<mailto:devel-requ...@lists.crash-utility.osci.io>>
 wrote:
Date: Thu, 29 Aug 2024 09:15:36 +0000
From: "Aureau, Georges (Kernel Tools ERT)" 
<georges.aur...@hpe.com<mailto:georges.aur...@hpe.com>>
Subject: [Crash-utility][PATCH] “kmem address” not working properly
        when redzone is enabled
To: 
"devel@lists.crash-utility.osci.io<mailto:devel@lists.crash-utility.osci.io>"
        
<devel@lists.crash-utility.osci.io<mailto:devel@lists.crash-utility.osci.io>>
Message-ID: <sj0pr84mb1482e72f9e168b3b0ce885c89f...@sj0pr84mb1482.namp
        RD84.PROD.OUTLOOK.COM<http://RD84.PROD.OUTLOOK.COM>>
Content-Type: text/plain; charset="Windows-1252"

Crash “kmem address” not working properly when redzone is enabled.

When "slub_debug" is enabled with redzoning, "kmem address" does not work 
properly.
The "red_left_pad" member within "struct kmem_cache" is currently an "unsigned 
int",
it used to be an "int", but it never was a "long", hence "red_left_pad" in 
do_slab_slub()
was not initialized properly. This "red_left_pad" issue resulted in reporting 
free objects
as "[ALLOCATED]", and in reporting bogus object addresses when using "set 
redzone off".

Can you help add the result of the 'kmem address' command here? We can clearly 
see what error it is.

And also please add the kernel commit to patch log:
kernel commit 2ca6d39b3102 ("slub: make ->red_left_pad unsigned int")

Signed-off-by: Georges Aureau 
<georges.aur...@hpe.com<mailto:georges.aur...@hpe.com>>
--
diff --git a/memory.c b/memory.c
index a74ebaf..967a9cf 100644
--- a/memory.c
+++ b/memory.c
@@ -19637,7 +19637,8 @@ do_slab_slub(struct meminfo *si, int verbose)
        int i, free_objects, cpu_slab, is_free, node;
        ulong p, q;
 #define SLAB_RED_ZONE 0x00000400UL
-       ulong flags, red_left_pad;
+       ulong flags;
+       uint red_left_pad;

        if (!si->slab) {
                if (CRASHDEBUG(1))
@@ -19727,7 +19728,7 @@ do_slab_slub(struct meminfo *si, int verbose)
        if (VALID_MEMBER(kmem_cache_red_left_pad)) {
                flags = ULONG(si->cache_buf + OFFSET(kmem_cache_flags));
                if (flags & SLAB_RED_ZONE)
-                       red_left_pad = ULONG(si->cache_buf + 
OFFSET(kmem_cache_red_left_pad));
+                       red_left_pad = UINT(si->cache_buf + 
OFFSET(kmem_cache_red_left_pad));
        }

This change looks good to me, but I still have a question:
I can not reproduce the current issue, how did you reproduce this one? Can you 
help list the steps to reproduce?


Thanks
Lianbo

        for (p = vaddr; p < vaddr + objects * si->size; p += si->size) {

------------------------------
--
Crash-utility mailing list -- devel@lists.crash-utility.osci.io
To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki

Reply via email to