On Thu, Oct 3, 2024 at 10:41 AM Alexandre Ghiti <[email protected]> wrote: > > Hi Jason, > > On 02/10/2024 17:14, Jason Montleon wrote: > > On Tue, Oct 1, 2024 at 10:53 AM Alexandre Ghiti <[email protected]> wrote: > >> Hi Jason, > >> > >> On 25/09/2024 16:32, Jason Montleon wrote: > >>> On Tue, Sep 24, 2024 at 1:36 PM Kees Cook <[email protected]> wrote: > >>>> > >>>> On September 24, 2024 8:58:57 AM PDT, Jason Montleon > >>>> <[email protected]> wrote: > >>>>> On Sun, Sep 22, 2024 at 6:38 PM Kees Cook <[email protected]> wrote: > >>>>>> Can you try this patch? It should avoid using the "WARN" infrastructure > >>>>>> (if that is the source of blocking boot), but should still provide some > >>>>>> detail about what tripped it up (via the "regular" pr_*() logging). And > >>>>>> if it boots, can you look at the log to find what it reports for the > >>>>>> "Wanted to write ..." line(s)? > >>>>>> > >>>>>> Thanks! > >>>>>> > >>>>>> -Kees > >>>>>> > >>>>>> > >>>>>> diff --git a/include/linux/fortify-string.h > >>>>>> b/include/linux/fortify-string.h > >>>>>> index 0d99bf11d260..469a3439959d 100644 > >>>>>> --- a/include/linux/fortify-string.h > >>>>>> +++ b/include/linux/fortify-string.h > >>>>>> @@ -549,7 +549,8 @@ __FORTIFY_INLINE bool > >>>>>> fortify_memcpy_chk(__kernel_size_t size, > >>>>>> const size_t q_size, > >>>>>> const size_t p_size_field, > >>>>>> const size_t q_size_field, > >>>>>> - const u8 func) > >>>>>> + const u8 func, > >>>>>> + const char *field) > >>>>>> { > >>>>>> if (__builtin_constant_p(size)) { > >>>>>> /* > >>>>>> @@ -610,8 +611,13 @@ __FORTIFY_INLINE bool > >>>>>> fortify_memcpy_chk(__kernel_size_t size, > >>>>>> * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832 > >>>>>> */ > >>>>>> if (p_size_field != SIZE_MAX && > >>>>>> - p_size != p_size_field && p_size_field < size) > >>>>>> - return true; > >>>>>> + p_size != p_size_field && p_size_field < size) { > >>>>>> + if (p_size_field == 0) > >>>>>> + pr_warn("Wanted to write %zu to a 0-sized > >>>>>> destination: %s\n", > >>>>>> + size, field); > >>>>>> + else > >>>>>> + return true; > >>>>>> + } > >>>>>> > >>>>>> return false; > >>>>>> } > >>>>>> @@ -625,7 +631,8 @@ __FORTIFY_INLINE bool > >>>>>> fortify_memcpy_chk(__kernel_size_t size, > >>>>>> const size_t __q_size_field = (q_size_field); > >>>>>> \ > >>>>>> fortify_warn_once(fortify_memcpy_chk(__fortify_size, > >>>>>> __p_size, \ > >>>>>> __q_size, __p_size_field, > >>>>>> \ > >>>>>> - __q_size_field, FORTIFY_FUNC_ > >>>>>> ##op), \ > >>>>>> + __q_size_field, FORTIFY_FUNC_ > >>>>>> ##op,\ > >>>>>> + #p " at " FILE_LINE), \ > >>>>>> #op ": detected field-spanning write (size %zu) of > >>>>>> single %s (size %zu)\n", \ > >>>>>> __fortify_size, > >>>>>> \ > >>>>>> "field \"" #p "\" at " FILE_LINE, > >>>>>> \ > >>>>>> > >>>>>> > >>>>>> -- > >>>>>> Kees Cook > >>>>>> > >>>>> Hi Kees, > >>>>> Sorry for the delay in responding. Your patch also caused the system > >>>>> to hang booting, but I took it as inspiration to try some things. > >>>>> TL;DR I see these two print several times: > >>>>> > >>>>> Wanted to write 8 to a 0-sized destination: oldptr at > >>>>> arch/riscv/errata/thead/errata.c:185 > >>>>> Wanted to write 28 to a 0-sized destination: oldptr at > >>>>> arch/riscv/errata/thead/errata.c:185 > >>>>> > >>>>> Since I suspected CONFIG_ERRATA_THEAD_MAE which relies on > >>>>> CONFIG_RISCV_ALTERNATIVE_EARLY I suspected it is just too early for > >>>>> pr_*, etc. I tried trace_printk but that didn't produce any traces, > >>>>> and then I found a comment which makes me think we're too early for > >>>>> that too. > >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/kernel/alternative.c?h=v6.11#n214 > >>>>> > >>>>> I didn't want to give up so I wrote a simple module to figure out how > >>>>> to use sbi_ecall and then incorporated that idea into your patch, > >>>>> although even that was not straightforward: > >>>>> https://gist.github.com/jmontleon/f3536ad70dc92bed992139ad5b266d5c > >>>> Oh nice work! Probably something riscv-specific needs to be wired up to > >>>> the printk routines to do what you have there until printk is fully > >>>> available But that's a separate issue, I guess. > >>>> > >>>>> Full output: > >>>>> https://gist.github.com/jmontleon/823facff38e28b717f153e38adcfd7fe > >>>> Does the system boot with your patch? > >>> Yes. It boots fully. It also booted with trace_printk, it just didn't > >>> produce a trace, which again I think is expected at that stage. It > >>> seems as long as there are no 'normal' print attempts, pr_()*, etc. it > >>> works. > >> > >> Hmmm I don't see what in your patch would fix your issue. > >> > >> I sent a patch > >> (https://lore.kernel.org/linux-riscv/[email protected]/) > >> that was merged in v6.11-rc7 which fixes something similar, could that > >> be what fixed your problem? > >> > > Hi Alex, Thank you for having a look. > > > > Although I narrowed it down to starting with 6.11-rc1, the first time > > I encountered this was actually with an rc7 Fedora build. > > http://riscv.rocks/koji/buildinfo?buildID=338433 > > > > I also did a build from upstream 6.11.1 last night and still see the > > problem trying to boot with it this morning. > > > >> The problem with this early code is that the MMU is not enabled yet and > >> if your kernel is relocatable, the relocations are set "virtually" and > >> then without the MMU, a call to an external function results in a > >> trap...Which happens way too often, I have a few ideas to fix that, I > >> need to find time to tackle this though. > >> > >> I don't know FORTIFY_SOURCE, but I guess this could cause a call to an > >> external function and then cause the issue I describe above right? We > >> already remove all instrumentations from this early code so maybe we > >> should remove FORTIFY_SOURCE too? > > > So I was able to reproduce the issue on qemu by adding a few tweaks, and > indeed we trap in __warn_printk() on a virtual address but MMU is not > enabled yet. > > The following diff though allows me to pass this failure but I can't get > much further in the boot since the tweaks I added won't allow it, can > you give the following a try? > > diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile > index 8a27394851233..4913f3b3f198c 100644 > --- a/arch/riscv/errata/Makefile > +++ b/arch/riscv/errata/Makefile > @@ -2,6 +2,10 @@ ifdef CONFIG_RELOCATABLE > KBUILD_CFLAGS += -fno-pie > endif > > +ifdef CONFIG_RISCV_ALTERNATIVE_EARLY > +KBUILD_CFLAGS += -D__NO_FORTIFY > +endif > + > obj-$(CONFIG_ERRATA_ANDES) += andes/ > obj-$(CONFIG_ERRATA_SIFIVE) += sifive/ > obj-$(CONFIG_ERRATA_THEAD) += thead/
Yes, this worked. Thank you, Jason > Thanks, > > Alex > > > >> > > What you are describing here does not seem unreasonable to me, but I > > simply do not know enough to say one way or the other. > > Is there something I can do to collect more information to determine > > if this is the cause? > > > > Thank you again for having a look, > > Jason > > > >> Thanks for digging into this, > >> > >> Alex > >> > >> > >>>> I would guess that this line at arch/riscv/errata/thead/errata.c:168: > >>>> > >>>> void *oldptr, *altptr; > >>>> > >>>> Should be: > >>>> > >>>> u8 *oldptr, *altptr; > >>> I tried this, but it didn't appear to make a difference, I will try > >>> some more today, though I admit to being out of my element. > >>> > >>> > >>> > >>>> But maybe __ALT_PTR needs adjustment too? Hmm. I would have expected > >>>> void * to have a size of SIZE_MAX here, though, so something else might > >>>> be confusing the check. > >>>> > >>>> > >>>> -- > >>>> Kees Cook > >>>> > >>> _______________________________________________ > >>> linux-riscv mailing list > >>> [email protected] > >>> http://lists.infradead.org/mailman/listinfo/linux-riscv > > > > _______________________________________________ > > linux-riscv mailing list > > [email protected] > > http://lists.infradead.org/mailman/listinfo/linux-riscv >
