On Fri, Jun 27, 2025 at 11:41 AM Jocelyn Falempe <jfale...@redhat.com> wrote:
>
> On 32bits ARM, u64 divided by a constant is not optimized to a
> multiply by inverse by the compiler [1].
> So do the multiply by inverse explicitly for this architecture.
>
> Link: https://github.com/llvm/llvm-project/issues/37280 [1]
> Reported-by: Andrei Lalaev <andrey.lal...@gmail.com>
> Closes: https://lore.kernel.org/dri-devel/c0a2771c-f3f5-4d4c-aa82-d673b3c5c...@gmail.com/
> Fixes: 675008f196ca ("drm/panic: Use a decimal fifo to avoid u64 by u64 divide")
> Signed-off-by: Jocelyn Falempe <jfale...@redhat.com>

Not to block this change, but I think this really ought to be fixed in
the compiler. We should not have to do this kind of thing to divide by
10.

>  drivers/gpu/drm/drm_panic_qr.rs | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/drm_panic_qr.rs
> index dd55b1cb764d..82acecd505d3 100644
> --- a/drivers/gpu/drm/drm_panic_qr.rs
> +++ b/drivers/gpu/drm/drm_panic_qr.rs
> @@ -381,6 +381,24 @@ struct DecFifo {
>      len: usize,
>  }
>
> +/// On arm32 architecture, dividing an u64 by a constant will generate a call
> +/// to __aeabi_uldivmod which is not present in the kernel.
> +/// So use the multiply by inverse method for this architecture.
> +#[cfg(target_arch = "arm")]
> +fn div10(val: u64) -> u64
> +{

Please run rustfmt on your patch.

> +    let val_h = val >> 32;
> +    let val_l = val & 0xFFFFFFFF;
> +    let b_h: u64 = 0x66666666;
> +    let b_l: u64 = 0x66666667;
> +
> +    let tmp1 = val_h * b_l + ((val_l * b_l) >> 32);
> +    let tmp2 = val_l * b_h + (tmp1 & 0xffffffff);
> +    let tmp3 = val_h * b_h + (tmp1 >> 32) + (tmp2 >> 32);
> +
> +    tmp3 >> 2
> +}
> +
>  impl DecFifo {
>      fn push(&mut self, data: u64, len: usize) {
>          let mut chunk = data;
> @@ -389,7 +407,11 @@ fn push(&mut self, data: u64, len: usize) {
>          }
>          for i in 0..len {
>              self.decimals[i] = (chunk % 10) as u8;
> -            chunk /= 10;
> +            if cfg!(target_arch = "arm") {
> +                chunk = div10(chunk);
> +            } else {
> +                chunk /= 10;
> +            }

I would get rid of this conditional and declare another div10 function
that just does input/10 on other arches.

Alice
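
For illustration, a minimal sketch of the two-function layout suggested
above, so that the call site can just be `chunk = div10(chunk);` with no
conditional. The helper name `div10_mul_inverse` is hypothetical and
only exists so the arithmetic can be exercised on any architecture; the
arithmetic itself is copied from the quoted patch. This is an untested
sketch, not the code that was merged.

```rust
/// Divide by 10 with a multiply-by-inverse, using only 64-bit
/// multiplies and shifts. Hypothetical helper name; the arithmetic is
/// taken from the quoted patch.
///
/// The constant 0x6666_6666_6666_6667 is ceil(2^66 / 10), so the result
/// equals `val / 10` for inputs below 2^65 / 3; larger inputs can be
/// off by one.
#[allow(dead_code)]
fn div10_mul_inverse(val: u64) -> u64 {
    let val_h = val >> 32;
    let val_l = val & 0xFFFF_FFFF;
    let b_h: u64 = 0x6666_6666;
    let b_l: u64 = 0x6666_6667;

    // Schoolbook 32x32 multiplies to get the high 64 bits of the
    // 128-bit product val * 0x6666_6666_6666_6667.
    let tmp1 = val_h * b_l + ((val_l * b_l) >> 32);
    let tmp2 = val_l * b_h + (tmp1 & 0xFFFF_FFFF);
    let tmp3 = val_h * b_h + (tmp1 >> 32) + (tmp2 >> 32);

    // The high word is the product divided by 2^64; two more bits of
    // shift give the division by 2^66.
    tmp3 >> 2
}

/// arm32: avoid the call to __aeabi_uldivmod.
#[cfg(target_arch = "arm")]
fn div10(val: u64) -> u64 {
    div10_mul_inverse(val)
}

/// Every other architecture: plain division.
#[cfg(not(target_arch = "arm"))]
fn div10(val: u64) -> u64 {
    val / 10
}
```

With both definitions in place, exactly one `div10` is compiled for any
given target, and the loop in `push` does not need to mention the
architecture at all.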