On Thu, 1 Jan 2026 22:26:34 +0100
Marek Vasut <[email protected]> wrote:

> On 1/1/26 12:42 PM, David Laight wrote:
> 
> Hello David,
> 
> >>   static void rcar_mipi_dsi_pll_calc(struct rcar_mipi_dsi *dsi,
> >>                               unsigned long fin_rate,
> >>                               unsigned long fout_target,
> >> -                             struct dsi_setup_info *setup_info)
> >> +                             struct dsi_setup_info *setup_info,
> >> +                             u16 vclk_divider)  
> > 
> > There is no need for this to be u16, unsigned int will generate better 
> > code.  
> 
> Can you please elaborate on the "better code" part ?

Basically the compiler has to generate extra code to ensure the value
doesn't exceed 65535.
So there will be a mask of the function parameter (passes in a 32bit register).
Any arithmetic needs similar masking of the result.
You won't see the latter (as much) on x86 (or m68k) because the compiler
can use the C 'as if' rule and use the cpu's 8/16 bit registers and alu.
But pretty much no other cpu can do that, so extra instructions are needed
to stop the value (in a 32bit register) exceeding the limit for the variable.

Remember that while C variables can be char/short the values they contain
are promoted to 'signed int' before (pretty much) anything is done with
the value, any calculated value is then truncated before being assigned back.
For on-stack values this is fine - but modern compilers was to keep values
in registers as much as possible.

> 
> >>   {
> >>    unsigned int best_err = -1;
> >>    const struct rcar_mipi_dsi_device_info *info = dsi->info;
> >> @@ -360,7 +373,7 @@ static void rcar_mipi_dsi_pll_calc(struct 
> >> rcar_mipi_dsi *dsi,
> >>                    if (fout < info->fout_min || fout > info->fout_max)
> >>                            continue;
> >>   
> >> -                  fout = div64_u64(fout, setup_info->vclk_divider);
> >> +                  fout = div64_u64(fout, vclk_divider);  
> > 
> > Since vclk_divider is BIT_U32(div [+ 1]) this is just a shift right.
> > So pass the bit number instead.  
> 
> While I agree this is a shift in the end, it also makes the code harder 
> to understand. I can add the following change into V2, but I would like 
> input from Tomi/Laurent on whether we should really push the 
> optimization in that direction:

The shift really is a lot faster.
Even on 64bit x86 cpus it can be slow - 35-88 clocks prior to 'Cannon Lake'.
AMD cpu get better with Zen3, and using a 64bit divide with 32bits values
doesn't slow things down (it does on the Intel cpu).
Don't even think about what happens on 32bit cpus.
I don't know about other architectures.
Just comment as 'x >> n' rather than 'x / (1 << n)'

        David

Reply via email to