Hi,

On 02/01/2026 01:44, David Laight wrote:
> On Fri, 2 Jan 2026 00:25:54 +0100
> Marek Vasut <[email protected]> wrote:
>
>> On 1/2/26 12:13 AM, David Laight wrote:
>>> On Thu, 1 Jan 2026 22:26:34 +0100
>>> Marek Vasut <[email protected]> wrote:
>>>
>>>> On 1/1/26 12:42 PM, David Laight wrote:
>>>>
>>>> Hello David,
>>>>
>>>>>>   static void rcar_mipi_dsi_pll_calc(struct rcar_mipi_dsi *dsi,
>>>>>>   				     unsigned long fin_rate,
>>>>>>   				     unsigned long fout_target,
>>>>>> -				     struct dsi_setup_info *setup_info)
>>>>>> +				     struct dsi_setup_info *setup_info,
>>>>>> +				     u16 vclk_divider)
>>>>>
>>>>> There is no need for this to be u16, unsigned int will generate better
>>>>> code.
>>>>
>>>> Can you please elaborate on the "better code" part?
>>>
>>> Basically the compiler has to generate extra code to ensure the value
>>> doesn't exceed 65535.
>>> So there will be a mask of the function parameter (passed in a 32bit
>>> register).
>>> Any arithmetic needs similar masking of the result.
>>> You won't see the latter (as much) on x86 (or m68k) because the compiler
>>> can use the C 'as if' rule and use the cpu's 8/16 bit registers and alu.
>>> But pretty much no other cpu can do that, so extra instructions are needed
>>> to stop the value (in a 32bit register) exceeding the limit for the
>>> variable.
>>>
>>> Remember that while C variables can be char/short, the values they contain
>>> are promoted to 'signed int' before (pretty much) anything is done with
>>> the value; any calculated value is then truncated before being assigned
>>> back.
>>> For on-stack values this is fine - but modern compilers want to keep values
>>> in registers as much as possible.
>>>
>>>>
>>>>>>   {
>>>>>>   	unsigned int best_err = -1;
>>>>>>   	const struct rcar_mipi_dsi_device_info *info = dsi->info;
>>>>>> @@ -360,7 +373,7 @@ static void rcar_mipi_dsi_pll_calc(struct rcar_mipi_dsi *dsi,
>>>>>>   				if (fout < info->fout_min || fout > info->fout_max)
>>>>>>   					continue;
>>>>>>
>>>>>> -				fout = div64_u64(fout, setup_info->vclk_divider);
>>>>>> +				fout = div64_u64(fout, vclk_divider);
>>>>>
>>>>> Since vclk_divider is BIT_U32(div [+ 1]) this is just a shift right.
>>>>> So pass the bit number instead.
>>>>
>>>> While I agree this is a shift in the end, it also makes the code harder
>>>> to understand. I can add the following change into V2, but I would like
>>>> input from Tomi/Laurent on whether we should really push the
>>>> optimization in that direction:
>>>
>>> The shift really is a lot faster.
>>> Even on 64bit x86 cpus it can be slow - 35-88 clocks prior to 'Cannon Lake'.
>>> AMD cpus get better with Zen3, and using a 64bit divide with 32bit values
>>> doesn't slow things down (it does on the Intel cpus).
>>> Don't even think about what happens on 32bit cpus.
>>> I don't know about other architectures.
>>> Just comment it as 'x >> n' rather than 'x / (1 << n)'.
>>
>> Please note that this code is built primarily for arm64, so discussing
>> x86 or m68k optimizations doesn't make much sense here.
>
> ARM definitely only has 32 and 64bit maths - so you will see masking
> instructions for arithmetic on char/short values (unless the compiler
> can tell that the values cannot get too large).
>
> Divide performance is cpu dependent - the only arm cpu I've used only
> had software divide (but a fast barrel shifter).
> If you think that Intel haven't thrown much silicon at integer divide,
> it isn't that likely that arm will have a divide that is much faster
> than 1 clock for each bit in the quotient.
> (Of course I might be wrong...)
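
On the u16 vs unsigned int point, a tiny sketch (made-up code, not from the
driver) of the masking David describes; on arm64 the u16 variant typically
picks up an extra and/uxth instruction to truncate the result:

#include <linux/types.h>

/*
 * Assigning a computed value back into a u16 forces truncation to
 * 16 bits, which arm64 has to do with an explicit instruction.
 */
static u16 next_div_u16(u16 div)
{
	return div * 2;		/* result masked back to 16 bits */
}

static unsigned int next_div_uint(unsigned int div)
{
	return div * 2;		/* plain 32-bit multiply, no masking */
}
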
The division is done once, when enabling the display? If so, I'd prioritize readability. That said, doing a division with a shift is quite common, so I'm not sure it would be that bad with a few comments.
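
Something like this, perhaps (untested sketch; "vclk_shift" is a made-up name
for the bit number the callers would pass instead of the divider value):

static u64 apply_vclk_divider(u64 fout, unsigned int vclk_shift)
{
	/* vclk_divider == 1 << vclk_shift, so this is fout / vclk_divider */
	return fout >> vclk_shift;
}

With a one-line comment like that next to the shift, I don't think the
readability suffers much.
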
Tomi
