On Mon, Sep 08, 2025 at 04:33:24PM +0530, Naveen N Rao wrote:
> On Wed, Sep 03, 2025 at 10:37:39PM -0400, Joe Lawrence wrote:
> > On Wed, Sep 03, 2025 at 10:29:50PM -0400, Joe Lawrence wrote:
> > > The powerpc64 module .stubs section holds ppc64_stub_entry[] code
> > > trampolines that are generated at module load time. These stubs are
> > > necessary for function calls to external symbols that are too far away
> > > for a simple relative branch.
> > >
> > > Logic for finding an available ppc64_stub_entry has relied on a sentinel
> > > value in the funcdata member to indicate a used slot. Code iterates
> > > through the array, inspecting .funcdata to find the first unused (zeroed)
> > > entry:
> > >
> > >   for (i = 0; stub_func_addr(stubs[i].funcdata); i++)
> > >
> > > To support CONFIG_PPC_FTRACE_OUT_OF_LINE, a new setup_ftrace_ool_stubs()
> > > function extended the .stubs section by adding an array of
> > > ftrace_ool_stub structures for each patchable function. A side effect
> > > of writing these smaller structures is that the funcdata sentinel
> > > convention is not maintained.
>
> There is clearly a bug in how we are reserving the stubs as you point
> out further below, but once that is properly initialized, I don't think
> the smaller structure size for ftrace_ool_stub matters (in so far as
> stub->funcdata being non-NULL). We end up writing four valid powerpc
> instructions, along with a ftrace_ops pointer before those instructions
> which should also be non-zero (there is a bug here too, more on that
> below).  The whole function descriptor dance does complicate matters a
> bit though.
>

Hi Naveen,

Ah hah, I see now, given the other bug that you mention, we should have
had seen non-NULL entries in either ftrace_ool_stub.insn[] or .ftrace_op
fields such that when read as ppc64_stub_entry, .funcdata would indicate
that it's in use:

        ppc64_stub_entry[]  ftrace_ool_stub[]
  0x00  [0].jump[0]         [0].ftrace_op
  0x04  [0].jump[1]         [0].ftrace_op
  0x08  [0].jump[2]         [0].insn[0]
  0x0C  [0].jump[3]         [0].insn[1]
  0x10  [0].jump[4]         [0].insn[2]
  0x14  [0].jump[5]         [0].insn[3]
  0x18  [0].jump[6]         [1].ftrace_op
  0x1C  [0].magic           [1].ftrace_op
  0x20  [0].funcdata        [1].insn[0]    <<
  0x24  [0].funcdata        [1].insn[1]    <<
  0x28  [1].jump[0]         [1].insn[2]
  0x2C  [1].jump[1]         [1].insn[3]
  0x30  [1].jump[2]         [2].ftrace_op
  0x34  [1].jump[3]         [2].ftrace_op
  0x38  [1].jump[4]         [2].insn[0]
  0x3C  [1].jump[5]         [2].insn[1]
  0x40  [1].jump[6]         [2].insn[2]
  0x44  [1].magic           [2].insn[3]
  0x48  [1].funcdata        [3].ftrace_op  <<
  0x4C  [1].funcdata        [3].ftrace_op  <<

If the commit msg for this patch would be clearer by rewording anything,
I'm happy to update.  (My understanding at the time of writing was that
the NULL funcdata vs. insn[]/ftrace_op was a valid sequence.)

> > > This is not a problem for an ordinary
> > > kernel module, as this occurs in module_finalize(), after which no
> > > further .stubs updates are needed.
> > >
> > > However, when loading a livepatch module that contains klp-relocations,
> > > apply_relocate_add() is executed a second time, after the out-of-line
> > > ftrace stubs have been set up.
> > >
> > > When apply_relocate_add() then calls stub_for_addr() to handle a
> > > klp-relocation, its search for the next available ppc64_stub_entry[]
> > > slot may stop prematurely in the middle of the ftrace_ool_stub array. A
> > > full ppc64_stub_entry is then written, corrupting the ftrace stubs.
> > >
> > > Fix this by explicitly tracking the count of used ppc64_stub_entrys.
> > > Rather than relying on an inline funcdata sentinel value, a new
> > > stub_count is used as the explicit boundary for searching and allocating
> > > stubs. This simplifies the code, eliminates the "one extra reloc" that
> > > was required for the sentinel check, and resolves the memory corruption.
> > >
> >
> > Apologies if this is too wordy, I wrote it as a bit of a braindump to
> > summarize the longer analysis at the bottom of the reply ...
>
> This was a good read, thanks for all the details. It did help spot
> another issue.
>
> >
> > > Fixes: eec37961a56a ("powerpc64/ftrace: Move ftrace sequence out of line")
> > > Signed-off-by: Joe Lawrence <joe.lawre...@redhat.com>
> > > ---
> > >  arch/powerpc/include/asm/module.h |  1 +
> > >  arch/powerpc/kernel/module_64.c   | 26 ++++++++------------------
> > >  2 files changed, 9 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/arch/powerpc/include/asm/module.h 
> > > b/arch/powerpc/include/asm/module.h
> > > index e1ee5026ac4a..864e22deaa2c 100644
> > > --- a/arch/powerpc/include/asm/module.h
> > > +++ b/arch/powerpc/include/asm/module.h
> > > @@ -27,6 +27,7 @@ struct ppc_plt_entry {
> > >  struct mod_arch_specific {
> > >  #ifdef __powerpc64__
> > >   unsigned int stubs_section;     /* Index of stubs section in module */
> > > + unsigned int stub_count;        /* Number of stubs used */
> > >  #ifdef CONFIG_PPC_KERNEL_PCREL
> > >   unsigned int got_section;       /* What section is the GOT? */
> > >   unsigned int pcpu_section;      /* .data..percpu section */
> > > diff --git a/arch/powerpc/kernel/module_64.c 
> > > b/arch/powerpc/kernel/module_64.c
> > > index 126bf3b06ab7..2a44bc8e2439 100644
> > > --- a/arch/powerpc/kernel/module_64.c
> > > +++ b/arch/powerpc/kernel/module_64.c
> > > @@ -209,8 +209,7 @@ static unsigned long get_stubs_size(const Elf64_Ehdr 
> > > *hdr,
> > >                               char *secstrings,
> > >                               struct module *me)
> > >  {
> > > - /* One extra reloc so it's always 0-addr terminated */
> > > - unsigned long relocs = 1;
> > > + unsigned long relocs = 0;
> > >   unsigned i;
> > >
> > >   /* Every relocated section... */
> > > @@ -705,7 +704,7 @@ static unsigned long stub_for_addr(const Elf64_Shdr 
> > > *sechdrs,
> > >
> > >   /* Find this stub, or if that fails, the next avail. entry */
> > >   stubs = (void *)sechdrs[me->arch.stubs_section].sh_addr;
> > > - for (i = 0; stub_func_addr(stubs[i].funcdata); i++) {
> > > + for (i = 0; i < me->arch.stub_count; i++) {
> > >           if (WARN_ON(i >= num_stubs))
> > >                   return 0;
> > >
> > > @@ -716,6 +715,7 @@ static unsigned long stub_for_addr(const Elf64_Shdr 
> > > *sechdrs,
> > >   if (!create_stub(sechdrs, &stubs[i], addr, me, name))
> > >           return 0;
> > >
> > > + me->arch.stub_count++;
> > >   return (unsigned long)&stubs[i];
> > >  }
> > >
> > > @@ -1118,29 +1118,19 @@ int module_trampoline_target(struct module *mod, 
> > > unsigned long addr,
> > >  static int setup_ftrace_ool_stubs(const Elf64_Shdr *sechdrs, unsigned 
> > > long addr, struct module *me)
> > >  {
> > >  #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
> > > - unsigned int i, total_stubs, num_stubs;
> > > + unsigned int total_stubs, num_stubs;
> > >   struct ppc64_stub_entry *stub;
> > >
> > >   total_stubs = sechdrs[me->arch.stubs_section].sh_size / sizeof(*stub);
> > >   num_stubs = roundup(me->arch.ool_stub_count * sizeof(struct 
> > > ftrace_ool_stub),
> > >                       sizeof(struct ppc64_stub_entry)) / sizeof(struct 
> > > ppc64_stub_entry);
> > >
> > > - /* Find the next available entry */
> > > - stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
> > > - for (i = 0; stub_func_addr(stub[i].funcdata); i++)
> > > -         if (WARN_ON(i >= total_stubs))
> > > -                 return -1;
> > > -
> > > - if (WARN_ON(i + num_stubs > total_stubs))
> > > + if (WARN_ON(me->arch.stub_count + num_stubs > total_stubs))
> > >           return -1;
> > >
> > > - stub += i;
> > > - me->arch.ool_stubs = (struct ftrace_ool_stub *)stub;
> > > -
> > > - /* reserve stubs */
> > > - for (i = 0; i < num_stubs; i++)
> > > -         if (patch_u32((void *)&stub->funcdata, PPC_RAW_NOP()))
> > > -                 return -1;
> >
> > At first I thought the bug was that this loop was re-writting the same
> > PPC_RAW_NOP() to the same funcdata (i.e, should have been something
> > like: patch_u32((void *)stub[i]->funcdata, PPC_RAW_NOP())), but that
> > didn't work and seemed fragile anyway.
>
> D'uh - this path was clearly never tested. I suppose this should have
> been something like this:
>       patch_ulong((void *)&stub[i]->funcdata, func_desc(local_paca))
>
> Still convoluted, but I think that should hopefully fix the problem you
> are seeing.
>

I can still try something like this if you prefer that solution (though
I think there may be some type massaging required in the second argument
to patch_ulong().)  LMK ...

> >
> > > + stub = (void *)sechdrs[me->arch.stubs_section].sh_addr;
> > > + me->arch.ool_stubs = (struct ftrace_ool_stub *)(stub + 
> > > me->arch.stub_count);
> > > + me->arch.stub_count += num_stubs;
> > >  #endif
>
> Regardless of the above, your proposed change looks good to me and
> simplifies the logic. So:
> Acked-by: Naveen N Rao (AMD) <nav...@kernel.org>
>



> >   crash> dis 0xc008000007d70dd0 42
> >   ppc64[ ]   ftrace[0]    <xfs_stats_format+0x558>:    .long 0x0
> >                           <xfs_stats_format+0x55c>:    .long 0x0
> >                           <xfs_stats_format+0x560>:    mflr    r0
> >                           <xfs_stats_format+0x564>:    bl      
> > 0xc008000007d70d80 <xfs_stats_format+0x508>
> >                           <xfs_stats_format+0x568>:    mtlr    r0
> >                           <xfs_stats_format+0x56c>:    b       
> > 0xc008000007d70014 <patch_free_livepatch+0xc>
> >              ftrace[1]    <xfs_stats_format+0x570>:    .long 0x0
> >                           <xfs_stats_format+0x574>:    .long 0x0
> >                           <xfs_stats_format+0x578>:    mflr    r0
> >                           <xfs_stats_format+0x57c>:    bl      
> > 0xc008000007d70d80 <xfs_stats_format+0x508>
> >   ppc64[ ]                <xfs_stats_format+0x580>:    addis   r11,r2,4     
> >                                     << This looks like a full
> >                           <xfs_stats_format+0x584>:    addi    
> > r11,r11,-29448                                   << ppc64_stub_entry
> >              ftrace[2]    <xfs_stats_format+0x588>:    std     r2,24(r1)    
> >                                     << dropped in the middle
> >                           <xfs_stats_format+0x58c>:    ld      r12,32(r11)  
> >                                     << of the ool_stubs array
> >                           <xfs_stats_format+0x590>:    mtctr   r12          
> >                                     << of ftrace_ool_stub[]
> >                           <xfs_stats_format+0x594>:    bctr                 
> >                                     <<
> >                           <xfs_stats_format+0x598>:    mtlr    r0           
> >                                     <<
> >                           <xfs_stats_format+0x59c>:    andi.   
> > r20,r27,30050                                    <<
> >              ftrace[3]    <xfs_stats_format+0x5a0>:    .long 0x54e92b8      
> >                                     <<
> >                           <xfs_stats_format+0x5a4>:    lfs     f0,0(r8)     
> >                                     <<
> >   ppc64[ ]                <xfs_stats_format+0x5a8>:    mflr    r0
> >                           <xfs_stats_format+0x5ac>:    bl      
> > 0xc008000007d70d80 <xfs_stats_format+0x508>
> >                           <xfs_stats_format+0x5b0>:    mtlr    r0
> >                           <xfs_stats_format+0x5b4>:    b       
> > 0xc008000007d7037c <add_callbacks_to_patch_objects+0xc>
> >              ftrace[4]    <xfs_stats_format+0x5b8>:    .long 0x0
> >                           <xfs_stats_format+0x5bc>:    .long 0x0
>
> These NULL values are also problematic. I think those are NULL since we
> are not "reserving" the stubs properly, but those should point to some
> ftrace_op. I think we are missing a call to ftrace_rec_set_nop_ops() in
> ftrace_init_nop(), which would be good to do separately.
>

Very lightly tested, but were you thinking of something like:

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 6dca92d5a..687371c64 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -488,8 +488,12 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace 
*rec)
                return ret;

        /* Set up out-of-line stub */
-       if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE))
-               return ftrace_init_ool_stub(mod, rec);
+       if (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE)) {
+               ret = ftrace_init_ool_stub(mod, rec);
+               if (ret)
+                       return ret;
+               return ftrace_rec_set_nop_ops(rec);
+       }

        /* Nop-out the ftrace location */
        new = ppc_inst(PPC_RAW_NOP());
@@ -520,7 +524,7 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace 
*rec)
                return -EINVAL;
        }

-       return ret;
+       return ftrace_rec_set_nop_ops(rec);
 }

 int ftrace_update_ftrace_func(ftrace_func_t func)


In which case the ftrace-ool area looks like:

  crash> mod | grep livepatch_module
  c008000006350500  livepatch_module                   c008000009b90000   
262144  (not loaded)  [CONFIG_KALLSYMS]
  crash> struct module.arch.ool_stubs c008000006350500
    arch.ool_stubs = 0xc008000009b90dd0 <xfs_stats_format+1368>,
  crash> struct module.arch.ool_stub_count c008000006350500
    arch.ool_stub_count = 7,

  crash> struct ftrace_ool_stub 0xc008000009b90dd0 7
  struct ftrace_ool_stub {
    ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
    insn = {0x7c0802a6, 0x4bffffa5, 0x7c0803a6, 0x4bfff230}
  }

  struct ftrace_ool_stub {
    ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
    insn = {0x7c0802a6, 0x4bffff8d, 0x7c0803a6, 0x4bfff304}
  }

  struct ftrace_ool_stub {
    ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
    insn = {0x7c0802a6, 0x4bffff75, 0x7c0803a6, 0x4bfff430}
  }

  struct ftrace_ool_stub {
    ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
    insn = {0x7c0802a6, 0x4bffff5d, 0x7c0803a6, 0x4bfff550}
  }

  struct ftrace_ool_stub {
    ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
    insn = {0x7c0802a6, 0x4bffff45, 0x7c0803a6, 0x4bfff768}
  }

  struct ftrace_ool_stub {
    ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
    insn = {0x7c0802a6, 0x4bffff2d, 0x7c0803a6, 0x4bfffa08}
  }

  struct ftrace_ool_stub {
    ftrace_op = 0xc00000000131d140 <ftrace_nop_ops>,
    insn = {0x7c0802a6, 0x4bffff15, 0x7c0803a6, 0x4bfffa10}
  }


--
Joe


Reply via email to