On Fri, 2 Jan 2026 00:25:54 +0100
Marek Vasut <[email protected]> wrote:

> On 1/2/26 12:13 AM, David Laight wrote:
> > On Thu, 1 Jan 2026 22:26:34 +0100
> > Marek Vasut <[email protected]> wrote:
> >   
> >> On 1/1/26 12:42 PM, David Laight wrote:
> >>
> >> Hello David,
> >>  
> >>>>    static void rcar_mipi_dsi_pll_calc(struct rcar_mipi_dsi *dsi,
> >>>>                                     unsigned long fin_rate,
> >>>>                                     unsigned long fout_target,
> >>>> -                                   struct dsi_setup_info *setup_info)
> >>>> +                                   struct dsi_setup_info *setup_info,
> >>>> +                                   u16 vclk_divider)  
> >>>
> >>> There is no need for this to be u16, unsigned int will generate better 
> >>> code.  
> >>
> >> Can you please elaborate on the "better code" part ?  
> > 
> > Basically the compiler has to generate extra code to ensure the value
> > doesn't exceed 65535.
> > So there will be a mask of the function parameter (passes in a 32bit 
> > register).
> > Any arithmetic needs similar masking of the result.
> > You won't see the latter (as much) on x86 (or m68k) because the compiler
> > can use the C 'as if' rule and use the cpu's 8/16 bit registers and alu.
> > But pretty much no other cpu can do that, so extra instructions are needed
> > to stop the value (in a 32bit register) exceeding the limit for the 
> > variable.
> > 
> > Remember that while C variables can be char/short the values they contain
> > are promoted to 'signed int' before (pretty much) anything is done with
> > the value, any calculated value is then truncated before being assigned 
> > back.
> > For on-stack values this is fine - but modern compilers was to keep values
> > in registers as much as possible.
> >   
> >>  
> >>>>    {
> >>>>          unsigned int best_err = -1;
> >>>>          const struct rcar_mipi_dsi_device_info *info = dsi->info;
> >>>> @@ -360,7 +373,7 @@ static void rcar_mipi_dsi_pll_calc(struct 
> >>>> rcar_mipi_dsi *dsi,
> >>>>                          if (fout < info->fout_min || fout > 
> >>>> info->fout_max)
> >>>>                                  continue;
> >>>>    
> >>>> -                        fout = div64_u64(fout, 
> >>>> setup_info->vclk_divider);
> >>>> +                        fout = div64_u64(fout, vclk_divider);  
> >>>
> >>> Since vclk_divider is BIT_U32(div [+ 1]) this is just a shift right.
> >>> So pass the bit number instead.  
> >>
> >> While I agree this is a shift in the end, it also makes the code harder
> >> to understand. I can add the following change into V2, but I would like
> >> input from Tomi/Laurent on whether we should really push the
> >> optimization in that direction:  
> > 
> > The shift really is a lot faster.
> > Even on 64bit x86 cpus it can be slow - 35-88 clocks prior to 'Cannon Lake'.
> > AMD cpu get better with Zen3, and using a 64bit divide with 32bits values
> > doesn't slow things down (it does on the Intel cpu).
> > Don't even think about what happens on 32bit cpus.
> > I don't know about other architectures.
> > Just comment as 'x >> n' rather than 'x / (1 << n)'
>
> Please note that this code is built primarily for arm64 , so discussing 
> x86 or m68k optimizations doesn't make much sense here.

ARM definitely only has 32 and 64bit maths - so you will see masking
instructions for arithmetic on char/short values (unless the compiler
can tell that the values cannot get too large.)

Divide performance is cpu dependant - the only arm cpu I've used only
had software divide (but a fast barrel shifter).
If you think that Intel haven't thrown much silicon at integer divide
it isn't that likely that arm will have a divide that is much faster
than 1 clock for each bit in the quotient.
(Of course I might be wrong...)

        David

Reply via email to