On Thu, Jan 29, 2026 at 11:37:33AM +0200, Andy Shevchenko wrote:
> On Thu, Jan 29, 2026 at 02:58:52AM +0200, Cristian Ciocaltea wrote:
...
> > +#define __DRM_ARGB64_PREP_BPC(c, shift, bpc)({ \
> > + __u16 mask = __GENMASK((bpc) - 1, 0); \
> > + __u16 conv = __KERNEL_DIV_ROUND_CLOSEST((mask & (c)) * \
> > + __GENMASK(15, 0), mask);\
>
> The whole point of the first patch is to use it in the divisions by 2^n - 1.
> Can we transform this to make it "divisions" by power-of-two?
>
>    ...: def dbm2(c, bpc):
>    ...:     m = (1 << bpc) - 1
>    ...:     c1 = m & c
>    ...:     r = c1 << (16 - bpc)
>    ...:     for i in range(1, 16 // bpc):
>    ...:         r = r + (c1 << (16 - (i + 1) * bpc))
>    ...:     return r

I noticed that on some inputs it gives an off-by-small-number error.
But you get the idea.

Taking this into account, perhaps we can share __KERNEL_DIV_ROUND_CLOSEST()
anyway, leave it there for now, and improve the situation later on. That is
up to the DRM maintainers.

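
To make the error mentioned above concrete, here is a minimal sketch that
compares the PoC against the rounded division. It assumes that
__KERNEL_DIV_ROUND_CLOSEST(x, d) behaves like (x + d / 2) / d for unsigned
operands; div_round_closest() and max_abs_error() are hypothetical helper
names used only for this illustration.

```python
def dbm2(c, bpc):
    """Shift-and-add PoC quoted in this thread (behaviour unchanged)."""
    m = (1 << bpc) - 1
    c1 = m & c
    r = c1 << (16 - bpc)
    for i in range(1, 16 // bpc):
        r = r + (c1 << (16 - (i + 1) * bpc))
    return r


def div_round_closest(c, bpc):
    """Reference: (c & mask) * 0xffff / mask, rounded to closest.

    Assumption: the kernel macro rounds as (x + d / 2) / d for unsigned values.
    """
    m = (1 << bpc) - 1
    return ((m & c) * 0xFFFF + m // 2) // m


def max_abs_error(bpc):
    """Largest |dbm2 - reference| over every bpc-bit input."""
    return max(abs(dbm2(c, bpc) - div_round_closest(c, bpc))
               for c in range(1 << bpc))
```

When bpc divides 16 (1, 2, 4, 8, 16) the two agree on every input, because
0xffff / mask is then an integer and the shifts add up to exactly that
factor. For other widths the loop stops before the last partial copy, so for
example dbm2(1023, 10) gives 65472 where the rounded division gives 65535.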
> The above is a Python PoC of this approximation. Basically, we
> transform the fraction X / (2^n - 1) into the chained form
> X / 2^n + X / 2^(2n) + ... + X / 2^(kn), derived by applying, at each
> iteration, the identity X / (2^n - 1) = X / 2^n + X / (2^n * (2^n - 1))
> to the remaining term.
>
> So, maybe that one should be used instead? (It may be thought through
> on how to collapse the for-loop to maybe some bitops, but even with
> a for-loop it might be faster than real division.)
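
One possible bitops-only variant is plain bit replication, a common trick
for widening colour channels. It is not something proposed in the patch, and
replicate_bits() is a hypothetical name. The sketch below repeats the
bpc-bit pattern from the MSB downwards and fills the remaining low bits with
the top bits of the value, which is the partial copy the PoC loop skips. I
have not checked its error against round-to-closest for every bpc, so treat
it as an illustration only.

```python
def replicate_bits(c, bpc):
    """Fill 16 bits by repeating the bpc-bit pattern from the MSB downwards."""
    m = (1 << bpc) - 1
    c1 = m & c
    r = 0
    shift = 16 - bpc
    # Full copies of the pattern while they still fit above bit 0.
    while shift > 0:
        r |= c1 << shift
        shift -= bpc
    # Last copy: shift is now <= 0, so keep only the top bits that fit.
    r |= c1 >> -shift
    return r
```

With this variant zero stays zero and full scale maps to 0xffff for every
bpc from 1 to 16, and for bpc values that divide 16 it yields the same
result as the PoC.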
>
> Note, we have several examples (certainly more than one; I remember the
> same question coming up for me a few years ago) that could avoid division
> altogether. I would like this macro to be kernel-wide (and UAPI seems okay
> as well).
> > + __DRM_ARGB64_PREP(conv, shift); \
> > +})
--
With Best Regards,
Andy Shevchenko