On 26.06.25 16:19, Jocelyn Falempe wrote: > On 25/06/2025 00:18, Jocelyn Falempe wrote: >> On 24/06/2025 20:55, Andrei Lalaev wrote: >>> On 18.04.25 18:48, Jocelyn Falempe wrote: >>>> On 32bits ARM, u64/u64 is not supported [1], so change the algorithm >>>> to use a simple fifo with decimal digits as u8 instead. >>>> This is slower but should compile on all architecture. >>>> >>>> Link: https://lore.kernel.org/dri-devel/ >>>> caniq72ke45eowckmhwhvmwxc03dxr4rnxxkvx+hvwdblopz...@mail.gmail.com/ [1] >>>> Signed-off-by: Jocelyn Falempe <jfale...@redhat.com> >>>> --- >>>> drivers/gpu/drm/drm_panic_qr.rs | 71 ++++++++++++++++++++++----------- >>>> 1 file changed, 48 insertions(+), 23 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/ >>>> drm_panic_qr.rs >>>> index 6025a705530e..dd55b1cb764d 100644 >>>> --- a/drivers/gpu/drm/drm_panic_qr.rs >>>> +++ b/drivers/gpu/drm/drm_panic_qr.rs >>>> @@ -366,8 +366,48 @@ fn iter(&self) -> SegmentIterator<'_> { >>>> SegmentIterator { >>>> segment: self, >>>> offset: 0, >>>> - carry: 0, >>>> - carry_len: 0, >>>> + decfifo: Default::default(), >>>> + } >>>> + } >>>> +} >>>> + >>>> +/// Max fifo size is 17 (max push) + 2 (max remaining) >>>> +const MAX_FIFO_SIZE: usize = 19; >>>> + >>>> +/// A simple Decimal digit FIFO >>>> +#[derive(Default)] >>>> +struct DecFifo { >>>> + decimals: [u8; MAX_FIFO_SIZE], >>>> + len: usize, >>>> +} >>>> + >>>> +impl DecFifo { >>>> + fn push(&mut self, data: u64, len: usize) { >>>> + let mut chunk = data; >>>> + for i in (0..self.len).rev() { >>>> + self.decimals[i + len] = self.decimals[i]; >>>> + } >>>> + for i in 0..len { >>>> + self.decimals[i] = (chunk % 10) as u8; >>>> + chunk /= 10; >>>> + } >>>> + self.len += len; >>>> + } >>>> + >>>> + /// Pop 3 decimal digits from the FIFO >>>> + fn pop3(&mut self) -> Option<(u16, usize)> { >>>> + if self.len == 0 { >>>> + None >>>> + } else { >>>> + let poplen = 3.min(self.len); >>>> + self.len -= poplen; >>>> + let mut out = 0; >>>> + let mut exp = 1; >>>> + for i in 0..poplen { >>>> + out += self.decimals[self.len + i] as u16 * exp; >>>> + exp *= 10; >>>> + } >>>> + Some((out, NUM_CHARS_BITS[poplen])) >>>> } >>>> } >>>> } >>>> @@ -375,8 +415,7 @@ fn iter(&self) -> SegmentIterator<'_> { >>>> struct SegmentIterator<'a> { >>>> segment: &'a Segment<'a>, >>>> offset: usize, >>>> - carry: u64, >>>> - carry_len: usize, >>>> + decfifo: DecFifo, >>>> } >>>> impl Iterator for SegmentIterator<'_> { >>>> @@ -394,31 +433,17 @@ fn next(&mut self) -> Option<Self::Item> { >>>> } >>>> } >>>> Segment::Numeric(data) => { >>>> - if self.carry_len < 3 && self.offset < data.len() { >>>> - // If there are less than 3 decimal digits in the >>>> carry, >>>> - // take the next 7 bytes of input, and add them to >>>> the carry. >>>> + if self.decfifo.len < 3 && self.offset < data.len() { >>>> + // If there are less than 3 decimal digits in the >>>> fifo, >>>> + // take the next 7 bytes of input, and push them to >>>> the fifo. >>>> let mut buf = [0u8; 8]; >>>> let len = 7.min(data.len() - self.offset); >>>> >>>> buf[..len].copy_from_slice(&data[self.offset..self.offset + len]); >>>> let chunk = u64::from_le_bytes(buf); >>>> - let pow = u64::pow(10, BYTES_TO_DIGITS[len] as u32); >>>> - self.carry = chunk + self.carry * pow; >>>> + self.decfifo.push(chunk, BYTES_TO_DIGITS[len]); >>>> self.offset += len; >>>> - self.carry_len += BYTES_TO_DIGITS[len]; >>>> - } >>>> - match self.carry_len { >>>> - 0 => None, >>>> - len => { >>>> - // take the next 3 decimal digits of the carry >>>> - // and return 10bits of numeric data. >>>> - let out_len = 3.min(len); >>>> - self.carry_len -= out_len; >>>> - let pow = u64::pow(10, self.carry_len as u32); >>>> - let out = (self.carry / pow) as u16; >>>> - self.carry %= pow; >>>> - Some((out, NUM_CHARS_BITS[out_len])) >>>> - } >>>> } >>>> + self.decfifo.pop3() >>>> } >>>> } >>>> } >>>> >>>> base-commit: 74757ad1c105c8fc00b4cac0b7918fe3262cdb18 >>> >>> Hi Jocelyn, >>> >>> Apologies for reviving this old thread, but I'm still encountering >>> the same issue with the latest master (78f4e737a53e). >>> >>> When compiling this module for ARM32 (multi_v7_defconfig), >>> I get the following error: >>> >>> ld.lld: error: undefined symbol: __aeabi_uldivmod >>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/ >>> drm_panic_qr.rs:392) >>> >>> drivers/gpu/drm/drm_panic_qr.o: >>> (<drm_panic_qr::SegmentIterator as >>> core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a >>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/ >>> drm_panic_qr.rs:392) >>> >>> drivers/gpu/drm/drm_panic_qr.o: >>> (<drm_panic_qr::SegmentIterator as >>> core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a >>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/ >>> drm_panic_qr.rs:392) >>> >>> drivers/gpu/drm/drm_panic_qr.o: >>> (<drm_panic_qr::SegmentIterator as >>> core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a >>> >>> referenced 14 more times >>> >>> did you mean: __aeabi_uidivmod >>> >>> defined in: vmlinux.a(arch/arm/lib/lib1funcs.o) >>> >>> Since no one else has reported this in two months, I’m wondering >>> if this might be a configuration issue on my end. >> >> Ok, that's surprising, the lines 391 and 392 are: >> >> self.decimals[i] = (chunk % 10) as u8; >> chunk /= 10; >> >> So the compiler should be smart enough to do that without using a division. >> I will try to reproduce, and see if I can fix that. > > I reproduced the issues, it looks like clang doesn't do the optimization on > ARM32: > > https://github.com/llvm/llvm-project/issues/37280 > > So I've made a quick test with the following changes, and it builds: > > > diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/drm_panic_qr.rs > index dd55b1cb764d..57bd3c6465bb 100644 > --- a/drivers/gpu/drm/drm_panic_qr.rs > +++ b/drivers/gpu/drm/drm_panic_qr.rs > @@ -381,6 +381,20 @@ struct DecFifo { > len: usize, > } > > +fn div10(val: u64) -> u64 > +{ > + let val_h = val >> 32; > + let val_l = val & 0xFFFFFFFF; > + let b_h: u64 = 0x66666666; > + let b_l: u64 = 0x66666667; > + > + let tmp1 = val_h * b_l + ((val_l * b_l) >> 32); > + let tmp2 = val_l * b_h + (tmp1 & 0xffffffff); > + let tmp3 = val_h * b_h + (tmp1 >> 32) + (tmp2 >> 32); > + > + tmp3 >> 2 > +} > + > impl DecFifo { > fn push(&mut self, data: u64, len: usize) { > let mut chunk = data; > @@ -389,7 +403,7 @@ fn push(&mut self, data: u64, len: usize) { > } > for i in 0..len { > self.decimals[i] = (chunk % 10) as u8; > - chunk /= 10; > + chunk = div10(chunk); > } > self.len += len; > } > > > Best regards, >
Compiles for me too on clang 20.1.6. Thanks a lot! -- Best regards, Andrei Lalaev