On Fri, 2 Jan 2026 00:25:54 +0100 Marek Vasut <[email protected]> wrote:
> On 1/2/26 12:13 AM, David Laight wrote: > > On Thu, 1 Jan 2026 22:26:34 +0100 > > Marek Vasut <[email protected]> wrote: > > > >> On 1/1/26 12:42 PM, David Laight wrote: > >> > >> Hello David, > >> > >>>> static void rcar_mipi_dsi_pll_calc(struct rcar_mipi_dsi *dsi, > >>>> unsigned long fin_rate, > >>>> unsigned long fout_target, > >>>> - struct dsi_setup_info *setup_info) > >>>> + struct dsi_setup_info *setup_info, > >>>> + u16 vclk_divider) > >>> > >>> There is no need for this to be u16, unsigned int will generate better > >>> code. > >> > >> Can you please elaborate on the "better code" part ? > > > > Basically the compiler has to generate extra code to ensure the value > > doesn't exceed 65535. > > So there will be a mask of the function parameter (passes in a 32bit > > register). > > Any arithmetic needs similar masking of the result. > > You won't see the latter (as much) on x86 (or m68k) because the compiler > > can use the C 'as if' rule and use the cpu's 8/16 bit registers and alu. > > But pretty much no other cpu can do that, so extra instructions are needed > > to stop the value (in a 32bit register) exceeding the limit for the > > variable. > > > > Remember that while C variables can be char/short the values they contain > > are promoted to 'signed int' before (pretty much) anything is done with > > the value, any calculated value is then truncated before being assigned > > back. > > For on-stack values this is fine - but modern compilers was to keep values > > in registers as much as possible. > > > >> > >>>> { > >>>> unsigned int best_err = -1; > >>>> const struct rcar_mipi_dsi_device_info *info = dsi->info; > >>>> @@ -360,7 +373,7 @@ static void rcar_mipi_dsi_pll_calc(struct > >>>> rcar_mipi_dsi *dsi, > >>>> if (fout < info->fout_min || fout > > >>>> info->fout_max) > >>>> continue; > >>>> > >>>> - fout = div64_u64(fout, > >>>> setup_info->vclk_divider); > >>>> + fout = div64_u64(fout, vclk_divider); > >>> > >>> Since vclk_divider is BIT_U32(div [+ 1]) this is just a shift right. > >>> So pass the bit number instead. > >> > >> While I agree this is a shift in the end, it also makes the code harder > >> to understand. I can add the following change into V2, but I would like > >> input from Tomi/Laurent on whether we should really push the > >> optimization in that direction: > > > > The shift really is a lot faster. > > Even on 64bit x86 cpus it can be slow - 35-88 clocks prior to 'Cannon Lake'. > > AMD cpu get better with Zen3, and using a 64bit divide with 32bits values > > doesn't slow things down (it does on the Intel cpu). > > Don't even think about what happens on 32bit cpus. > > I don't know about other architectures. > > Just comment as 'x >> n' rather than 'x / (1 << n)' > > Please note that this code is built primarily for arm64 , so discussing > x86 or m68k optimizations doesn't make much sense here. ARM definitely only has 32 and 64bit maths - so you will see masking instructions for arithmetic on char/short values (unless the compiler can tell that the values cannot get too large.) Divide performance is cpu dependant - the only arm cpu I've used only had software divide (but a fast barrel shifter). If you think that Intel haven't thrown much silicon at integer divide it isn't that likely that arm will have a divide that is much faster than 1 clock for each bit in the quotient. (Of course I might be wrong...) David
