Hi Jeff,

Thanks for the review!

On Sun, Dec 15, 2024 at 2:59 AM Jeff Law <jeffreya...@gmail.com> wrote:
> If your integer divider has early exit paths you may want to reduce the
> int_div costs a bit.  I found that ~75% of the actual latency as the
> cost worked pretty well for our uarch.  Obviously this is a heuristic
> and there's no perfect value.

The divider does have early-out conditions and I was struggling to pick a
value. I've reduced the cost by a few cycles.
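
For reference, the ~75% heuristic above amounts to something like the
following. This is only a sketch: scaled_div_cost is a hypothetical helper,
not anything in the GCC tree, and it is not necessarily how the final value
in the patch was chosen.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only: scale a measured divide
   latency (in cycles) to roughly 75% of its value, rounding to the
   nearest whole cycle.  */
static int scaled_div_cost(int latency_cycles)
{
    assert(latency_cycles >= 0);
    return (latency_cycles * 3 + 2) / 4;
}
```

So a divider with a worst-case latency of 20 cycles would be costed at
about 15.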

> So you've marked it as not having any fusion capability.  That would
> suggest to me quite strongly that you should be using divmod expansion.
>
> Essentially divmod expansion exposes a pattern which produces the
> quotient & remainder outputs using a single div + mult + sub which is
> almost always going to be faster than a div and a mod instruction.
>
> In the case where you don't need the remainder the mult/sub will get
> trivially removed as dead code.  In the case where you don't need the
> quotient the sequence will be transformed back into a single rem
> instruction later in the RTL passes (probably combine).

Nice catch, it sounds like we do want the divmod expansion. I also realised
I haven't updated gcc/doc/invoke.texi so will do that.
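
To make the divmod point concrete, here is a rough C-level sketch of what
the expanded sequence computes when both results are needed. The names
divmod_result and divmod_sketch are made up for illustration; the real
expansion happens on RTL inside the compiler, not in source like this.

```c
#include <assert.h>

/* Hypothetical type and helper, for illustration only: not GCC code.  */
struct divmod_result {
    long quot;
    long rem;
};

/* One division, then the remainder is recovered with a multiply and a
   subtract instead of issuing a second (rem) instruction.  */
static struct divmod_result divmod_sketch(long n, long d)
{
    struct divmod_result r;

    assert(d != 0);             /* division by zero is undefined in C */
    r.quot = n / d;             /* div */
    r.rem = n - r.quot * d;     /* mult + sub; same value as n % d */
    return r;
}
```

If only quot is used, the multiply and subtract are dead and get removed;
if only rem is used, the sequence can be turned back into a single rem, as
you describe.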

Thanks,
Anton


>
> If your processor has fusion capabilities, you might want to check whether
> they map to the ones currently supported and if so set the right bits
> for fusible ops.  If there are cases missing that your processor supports,
> then we should probably work together as I've got an engineer that's
> expanded the set of fusible cases in our internal gcc tree that I can
> make available (just haven't had the time to work through the internal
> review process yet).
>
> jeff
