On Wed, Jul 17, 2024 at 12:01 PM Tao Liu <l...@redhat.com> wrote: > Hi Lianbo, > > On Wed, Jul 17, 2024 at 2:24 PM lijiang <liji...@redhat.com> wrote: > > > > On Wed, Jul 17, 2024 at 6:01 AM Tao Liu <l...@redhat.com> wrote: > >> > >> Hi Lianbo, > >> > >> On Tue, Jul 16, 2024 at 11:19 AM Tao Liu <l...@redhat.com> wrote: > >> > > >> > Hi Lianbo, > >> > > >> > On Fri, Jul 12, 2024 at 6:32 PM Lianbo Jiang <liji...@redhat.com> > wrote: > >> > > > >> > > Hi, Tao > >> > > > >> > > On 7/5/24 9:26 AM, devel-requ...@lists.crash-utility.osci.io wrote: > >> > > > Date: Thu, 4 Jul 2024 17:00:56 +1200 > >> > > > From: Tao Liu<l...@redhat.com> > >> > > > Subject: [Crash-utility] [PATCH] Fix "irq -a" exceeding the memory > >> > > > range issue > >> > > > To:devel@lists.crash-utility.osci.io > >> > > > Cc: Tao Liu<l...@redhat.com> > >> > > > Message-ID:<20240704050056.17375-1-l...@redhat.com> > >> > > > Content-Type: text/plain; charset="US-ASCII"; x-default=true > >> > > > > >> > > > Previously without the patch, there was an error observed as > follows: > >> > > > > >> > > > crash> irq -a > >> > > > IRQ NAME AFFINITY > >> > > > 0 timer 0-191 > >> > > > 4 ttyS0 0-23,96-119 > >> > > > ... > >> > > > 84 smartpqi 72-73,168 > >> > > > irq: page excluded: kernel virtual address: ffff97d03ffff000 > type: "irq_desc affinity" > >> > > > > >> > > > The reason is the reading of irq affinity exceeded the memory > range, see > >> > > > the following debug info: > >> > > > > >> > > > Thread 1 "crash" hit Breakpoint 1, generic_get_irq_affinity > (irq=85) at kernel.c:7373 > >> > > > 7375 irq_desc_addr = get_irq_desc_addr(irq); > >> > > > (gdb) p/x irq_desc_addr > >> > > > $1 = 0xffff97d03f21e800 > >> > > > > >> > > > crash> struct irq_desc 0xffff97d03f21e800 > >> > > > struct irq_desc { > >> > > > irq_common_data = { > >> > > > state_use_accessors = 425755136, > >> > > > node = 3, > >> > > > handler_data = 0x0, > >> > > > msi_desc = 0xffff97ca51b83480, > >> > > > affinity = 0xffff97d03fffee60, > >> > > > effective_affinity = 0xffff97d03fffe6c0 > >> > > > }, > >> > > > > >> > > > crash> whatis cpumask_t > >> > > > typedef struct cpumask { > >> > > > unsigned long bits[128]; > >> > > > } cpumask_t; > >> > > > SIZE: 1024 > >> > > > > >> > > > In order to get the affinity, crash will read the memory range > 0xffff97d03fffee60 > >> > > > ~ 0xffff97d03fffee60 + 1024(0x400) by line: > >> > > > > >> > > > readmem(affinity_ptr, KVADDR, affinity, len, > >> > > > "irq_desc affinity", FAULT_ON_ERROR); > >> > > > > >> > > > However the reading will exceed the effective memory range: > >> > > > > >> > > > crash> kmem 0xffff97d03fffee60 > >> > > > CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME > >> > > > ffff97c900044400 32 123297 162944 1273 4k > kmalloc-32 > >> > > > SLAB MEMORY NODE TOTAL ALLOCATED > FREE > >> > > > fffffca460ffff80 ffff97d03fffe000 3 128 81 > 47 > >> > > > FREE / [ALLOCATED] > >> > > > [ffff97d03fffee60] > >> > > > > >> > > > PAGE PHYSICAL MAPPING INDEX CNT FLAGS > >> > > > fffffca460ffff80 83fffe000 dead000000000001 ffff97d03fffe340 1 > d7ffffe0000800 slab > >> > > > > >> > > > crash> kmem ffff97d03ffff000 > >> > > > PAGE PHYSICAL MAPPING INDEX CNT FLAGS > >> > > > fffffca460ffffc0 83ffff000 0 0 1 > d7ffffe0004000 reserved > >> > > > > >> > > > crash> dmesg > >> > > > ... > >> > > > [ 0.000000] BIOS-e820: [mem > 0x00000000fe000000-0x00000000fe00ffff] reserved > >> > > > [ 0.000000] BIOS-e820: [mem > 0x0000000100000000-0x000000083fffefff] usable > >> > > > [ 0.000000] BIOS-e820: [mem > 0x000000083ffff000-0x000000083fffffff] reserved > >> > > > ... > >> > > > > >> > > > The beginning physical address, aka 0x83fffe000, is located in > the usable > >> > > > area and is readable, however the later physical address, > starting from > >> > > > 0x83ffff000, is located in reserved region and not readable. In > fact, > >> > > > the affinity member is allocated by alloc_cpumask_var_node(), for > the 192 CPUs > >> > > > system, the allocated size is only 24, and we can see it is within > >> > > > the kmalloc-32 slab. So it is incorrect to read 1024 length(given > by > >> > > > STRUCT_SIZE("cpumask_t")), only 24 is enough. > >> > > > > >> > > > Since there are plenty of places in crash which takes the value of > >> > > > STRUCT_SIZE("cpumask_t"), and works fine for the past, this patch > will > >> > > > not modify them all, but only this place which encountered the > issue. > >> > > > > >> > > > Signed-off-by: Tao Liu<l...@redhat.com> > >> > > > --- > >> > > > kernel.c | 9 ++++++--- > >> > > > 1 file changed, 6 insertions(+), 3 deletions(-) > >> > > > > >> > > > diff --git a/kernel.c b/kernel.c > >> > > > index 8a9d498..464e877 100644 > >> > > > --- a/kernel.c > >> > > > +++ b/kernel.c > >> > > > @@ -7362,7 +7362,7 @@ void > >> > > > generic_get_irq_affinity(int irq) > >> > > > { > >> > > > ulong irq_desc_addr; > >> > > > - long len; > >> > > > + long len, len_cpumask; > >> > > > ulong affinity_ptr; > >> > > > ulong *affinity; > >> > > > ulong tmp_addr; > >> > > > @@ -7382,8 +7382,11 @@ generic_get_irq_affinity(int irq) > >> > > > if (!action) > >> > > > return; > >> > > > > >> > > > - if ((len = STRUCT_SIZE("cpumask_t")) < 0) > >> > > > - len = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * > sizeof(ulong); > >> > > > + len = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * sizeof(ulong); > >> > > > + len_cpumask = STRUCT_SIZE("cpumask_t"); > >> > > > + if (len_cpumask > 0) { > >> > > > + len = len_cpumask > len ? len : len_cpumask; > >> > > > + } > >> > > > > >> > > > >> > > This change looks good, but I still have two comments below: > >> > > > >> > > [1] Can we drop the evaluation of "STRUCT_SIZE("cpumask_t")" and > just > >> > > use the size of "DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * > sizeof(ulong)" > >> > > ? Are there any regression issues? > >> > >> I made a regression test, if all STRUCT_SIZE("cpumask_t") are replaced > >> by DIV_ROUND_UP(...), there are regression issues found, but I didn't > >> dive into the root cause of the failing reason. > >> > > > > If so, let's keep it. > > > > Anyway, I'm still curious about the regression issue. > > > I re-read your comments, and I think I have misunderstood your request. > > The regressions are found when all places of STRUCT_SIZE("cpumask_t") > in crash were replaced by DIV_ROUND_UP(...). You see there are plenty > of places in crash which used STRUCT_SIZE("cpumask_t"), including the >
Not needed. Maybe they can be replaced with the SIZE(cpumask_t) in your unwind stack patchset. Anyway, that is another issue. > 2(the one which this patch is dealing with, and the one in tools.c) > > However if only the 2 places (the one which this patch is dealing > with, and the one in tools.c) are replaced, there is no regression > being noticed. > > Totally correct. That is my concern. > So I'm going to deal with the only 2 places in v2. > > Sounds good. Thank you, Tao. Lianbo > Thanks, > Tao Liu > > > >> > > >> > I'm not sure about the change, I will run a regression against it. > >> > > > >> > > [2] There are the similar case in the get_cpumask_buf(), see > tools.c, > >> > > can you make the same change? > >> > >> Yes, unlike [1], with only the similar case modified, no regressions > >> found. I will post v2 to include it. > > > > > > Ok, sounds good. Thank you for the regression test. > > > > Lianbo > > > >> > >> > >> Thanks, > >> Tao Liu > >> > >> > > >> > Yes, I will give it a try to see if regressions are found. > >> > > >> > Thanks, > >> > Tao Liu > >> > > > >> > > ulong * > >> > > get_cpumask_buf(void) > >> > > { > >> > > int cpulen; > >> > > if ((cpulen = STRUCT_SIZE("cpumask_t")) < 0) > >> > > cpulen = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * > >> > > sizeof(ulong); > >> > > return (ulong *)GETBUF(cpulen); > >> > > } > >> > > > >> > > Any thoughts? > >> > > > >> > > > >> > > Thanks > >> > > > >> > > Lianbo > >> > > > >> > > > >> > > > affinity = (ulong *)GETBUF(len); > >> > > > if (VALID_MEMBER(irq_common_data_affinity)) > >> > > > -- 2.40.1 > >> > > > >> > >
-- Crash-utility mailing list -- devel@lists.crash-utility.osci.io To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/ Contribution Guidelines: https://github.com/crash-utility/crash/wiki