On Wed, Jul 17, 2024 at 12:01 PM Tao Liu <l...@redhat.com> wrote:

> Hi Lianbo,
>
> On Wed, Jul 17, 2024 at 2:24 PM lijiang <liji...@redhat.com> wrote:
> >
> > On Wed, Jul 17, 2024 at 6:01 AM Tao Liu <l...@redhat.com> wrote:
> >>
> >> Hi Lianbo,
> >>
> >> On Tue, Jul 16, 2024 at 11:19 AM Tao Liu <l...@redhat.com> wrote:
> >> >
> >> > Hi Lianbo,
> >> >
> >> > On Fri, Jul 12, 2024 at 6:32 PM Lianbo Jiang <liji...@redhat.com>
> wrote:
> >> > >
> >> > > Hi, Tao
> >> > >
> >> > > On 7/5/24 9:26 AM, devel-requ...@lists.crash-utility.osci.io wrote:
> >> > > > Date: Thu,  4 Jul 2024 17:00:56 +1200
> >> > > > From: Tao Liu<l...@redhat.com>
> >> > > > Subject: [Crash-utility] [PATCH] Fix "irq -a" exceeding the memory
> >> > > >       range issue
> >> > > > To:devel@lists.crash-utility.osci.io
> >> > > > Cc: Tao Liu<l...@redhat.com>
> >> > > > Message-ID:<20240704050056.17375-1-l...@redhat.com>
> >> > > > Content-Type: text/plain; charset="US-ASCII"; x-default=true
> >> > > >
> >> > > > Previously without the patch, there was an error observed as
> follows:
> >> > > >
> >> > > > crash> irq -a
> >> > > > IRQ NAME                 AFFINITY
> >> > > >    0 timer                0-191
> >> > > >    4 ttyS0                0-23,96-119
> >> > > > ...
> >> > > >   84 smartpqi             72-73,168
> >> > > > irq: page excluded: kernel virtual address: ffff97d03ffff000
> type: "irq_desc affinity"
> >> > > >
> >> > > > The reason is the reading of irq affinity exceeded the memory
> range, see
> >> > > > the following debug info:
> >> > > >
> >> > > > Thread 1 "crash" hit Breakpoint 1, generic_get_irq_affinity
> (irq=85) at kernel.c:7373
> >> > > > 7375          irq_desc_addr = get_irq_desc_addr(irq);
> >> > > > (gdb) p/x irq_desc_addr
> >> > > > $1 = 0xffff97d03f21e800
> >> > > >
> >> > > > crash> struct irq_desc 0xffff97d03f21e800
> >> > > > struct irq_desc {
> >> > > >    irq_common_data = {
> >> > > >      state_use_accessors = 425755136,
> >> > > >      node = 3,
> >> > > >      handler_data = 0x0,
> >> > > >      msi_desc = 0xffff97ca51b83480,
> >> > > >      affinity = 0xffff97d03fffee60,
> >> > > >      effective_affinity = 0xffff97d03fffe6c0
> >> > > >    },
> >> > > >
> >> > > > crash> whatis cpumask_t
> >> > > > typedef struct cpumask {
> >> > > >      unsigned long bits[128];
> >> > > > } cpumask_t;
> >> > > > SIZE: 1024
> >> > > >
> >> > > > In order to get the affinity, crash will read the memory range
> 0xffff97d03fffee60
> >> > > > ~ 0xffff97d03fffee60 + 1024(0x400) by line:
> >> > > >
> >> > > >       readmem(affinity_ptr, KVADDR, affinity, len,
> >> > > >               "irq_desc affinity", FAULT_ON_ERROR);
> >> > > >
> >> > > > However the reading will exceed the effective memory range:
> >> > > >
> >> > > > crash> kmem 0xffff97d03fffee60
> >> > > > CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
> >> > > > ffff97c900044400       32     123297    162944   1273     4k
> kmalloc-32
> >> > > >    SLAB              MEMORY            NODE  TOTAL  ALLOCATED
> FREE
> >> > > >    fffffca460ffff80  ffff97d03fffe000     3    128         81
> 47
> >> > > >    FREE / [ALLOCATED]
> >> > > >    [ffff97d03fffee60]
> >> > > >
> >> > > >        PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
> >> > > > fffffca460ffff80 83fffe000 dead000000000001 ffff97d03fffe340  1
> d7ffffe0000800 slab
> >> > > >
> >> > > > crash> kmem ffff97d03ffff000
> >> > > >        PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
> >> > > > fffffca460ffffc0 83ffff000                0        0  1
> d7ffffe0004000 reserved
> >> > > >
> >> > > > crash> dmesg
> >> > > > ...
> >> > > > [    0.000000] BIOS-e820: [mem
> 0x00000000fe000000-0x00000000fe00ffff] reserved
> >> > > > [    0.000000] BIOS-e820: [mem
> 0x0000000100000000-0x000000083fffefff] usable
> >> > > > [    0.000000] BIOS-e820: [mem
> 0x000000083ffff000-0x000000083fffffff] reserved
> >> > > > ...
> >> > > >
> >> > > > The beginning physical address, aka 0x83fffe000, is located in
> the usable
> >> > > > area and is readable, however the later physical address,
> starting from
> >> > > > 0x83ffff000, is located in reserved region and not readable. In
> fact,
> >> > > > the affinity member is allocated by alloc_cpumask_var_node(), for
> the 192 CPUs
> >> > > > system, the allocated size is only 24, and we can see it is within
> >> > > > the kmalloc-32 slab. So it is incorrect to read 1024 length(given
> by
> >> > > > STRUCT_SIZE("cpumask_t")), only 24 is enough.
> >> > > >
> >> > > > Since there are plenty of places in crash which takes the value of
> >> > > > STRUCT_SIZE("cpumask_t"), and works fine for the past, this patch
> will
> >> > > > not modify them all, but only this place which encountered the
> issue.
> >> > > >
> >> > > > Signed-off-by: Tao Liu<l...@redhat.com>
> >> > > > ---
> >> > > >   kernel.c | 9 ++++++---
> >> > > >   1 file changed, 6 insertions(+), 3 deletions(-)
> >> > > >
> >> > > > diff --git a/kernel.c b/kernel.c
> >> > > > index 8a9d498..464e877 100644
> >> > > > --- a/kernel.c
> >> > > > +++ b/kernel.c
> >> > > > @@ -7362,7 +7362,7 @@ void
> >> > > >   generic_get_irq_affinity(int irq)
> >> > > >   {
> >> > > >       ulong irq_desc_addr;
> >> > > > -     long len;
> >> > > > +     long len, len_cpumask;
> >> > > >       ulong affinity_ptr;
> >> > > >       ulong *affinity;
> >> > > >       ulong tmp_addr;
> >> > > > @@ -7382,8 +7382,11 @@ generic_get_irq_affinity(int irq)
> >> > > >       if (!action)
> >> > > >               return;
> >> > > >
> >> > > > -     if ((len = STRUCT_SIZE("cpumask_t")) < 0)
> >> > > > -             len = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) *
> sizeof(ulong);
> >> > > > +     len = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * sizeof(ulong);
> >> > > > +     len_cpumask = STRUCT_SIZE("cpumask_t");
> >> > > > +     if (len_cpumask > 0) {
> >> > > > +             len = len_cpumask > len ? len : len_cpumask;
> >> > > > +     }
> >> > > >
> >> > >
> >> > > This change looks good, but I still have two comments below:
> >> > >
> >> > > [1] Can we drop the evaluation of "STRUCT_SIZE("cpumask_t")" and
> just
> >> > > use the size of "DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) *
> sizeof(ulong)"
> >> > > ? Are there any regression issues?
> >>
> >> I made a regression test, if all STRUCT_SIZE("cpumask_t") are replaced
> >> by DIV_ROUND_UP(...), there are regression issues found, but I didn't
> >> dive into the root cause of the failing reason.
> >>
> >
> > If so, let's keep it.
> >
> > Anyway, I'm still curious about the regression issue.
> >
> I re-read your comments, and I think I have misunderstood your request.
>
> The regressions are found when all places of STRUCT_SIZE("cpumask_t")
> in crash were replaced by DIV_ROUND_UP(...). You see there are plenty
> of places in crash which used STRUCT_SIZE("cpumask_t"), including the
>

Not needed.

Maybe they can be replaced with the SIZE(cpumask_t) in your unwind stack
patchset. Anyway, that is another issue.


> 2(the one which this patch is dealing with, and the one in tools.c)
>
> However if only the 2 places (the one which this patch is dealing
> with, and the one in tools.c) are replaced, there is no regression
> being noticed.
>
>
Totally correct. That is my concern.


> So I'm going to deal with the only 2 places in v2.
>
>
Sounds good. Thank you, Tao.

Lianbo



> Thanks,
> Tao Liu
>
>
> >> >
> >> > I'm not sure about the change, I will run a regression against it.
> >> > >
> >> > > [2] There are the similar case in the get_cpumask_buf(), see
> tools.c,
> >> > > can you make the same change?
> >>
> >> Yes, unlike [1], with only the similar case modified, no regressions
> >> found. I will post v2 to include it.
> >
> >
> > Ok, sounds good. Thank you for the regression test.
> >
> > Lianbo
> >
> >>
> >>
> >> Thanks,
> >> Tao Liu
> >>
> >> >
> >> > Yes, I will give it a try to see if regressions are found.
> >> >
> >> > Thanks,
> >> > Tao Liu
> >> > >
> >> > > ulong *
> >> > > get_cpumask_buf(void)
> >> > > {
> >> > >          int cpulen;
> >> > >          if ((cpulen = STRUCT_SIZE("cpumask_t")) < 0)
> >> > >                  cpulen = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) *
> >> > > sizeof(ulong);
> >> > >          return (ulong *)GETBUF(cpulen);
> >> > > }
> >> > >
> >> > > Any thoughts?
> >> > >
> >> > >
> >> > > Thanks
> >> > >
> >> > > Lianbo
> >> > >
> >> > >
> >> > > >       affinity = (ulong *)GETBUF(len);
> >> > > >       if (VALID_MEMBER(irq_common_data_affinity))
> >> > > > -- 2.40.1
> >> > >
> >>
>
>
--
Crash-utility mailing list -- devel@lists.crash-utility.osci.io
To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki

Reply via email to