Hi Jeff,

Thanks for the review!

On Sun, Dec 15, 2024 at 2:59 AM Jeff Law <jeffreya...@gmail.com> wrote:
> If your integer divider has early exit paths you may want to reduce the
> int_div costs a bit.  I found that ~75% of the actual latency as the
> cost worked pretty well for our uarch.  Obviously this is a heuristic
> and there's no perfect value.

The divider does have early out conditions and I was struggling to pick
a value. I've reduced the cost by a few cycles.

> So you've marked as not having any fusion capability.  That would
> suggest to me quite strongly that you should be using divmod expansion.
>
> Essentially divmod expansion exposes a pattern which produces the
> quotient & remainder outputs using a single div + mult + sub which is
> almost always going to be faster than a div and a mod instruction.
>
> In the case where you don't need the remainder the mult/sub will get
> trivially removed as dead code.  In the case where you don't need the
> quotient the sequence will be transformed back into a single rem
> instruction later in the RTL passes (probably combine).

Nice catch, it sounds like we do want the divmod expansion.

I also realised I haven't updated gcc/doc/invoke.texi so will do that.

Thanks,
Anton

>
> If your processor has fusion capabilities, you might want to look at if
> they map to the ones currently supported and if so set the right bits
> for fusible ops.  If there's cases missing that your processor supports,
> then we should probably work together as I've got an engineer that's
> expanded the set of fusible cases in our internal gcc tree that I can
> make available (just haven't had the time to work through the internal
> review process yet).
>
> jeff
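
For reference, here is a rough C-level sketch of what the div + mult + sub sequence described above computes, as I understand it. The struct and function names are made up for illustration and are not GCC internals; the real transformation happens on RTL, not on C source.

```c
#include <assert.h>

/* Hypothetical helper type, for illustration only. */
struct divmod_result {
    long quot;
    long rem;
};

/* Compute quotient and remainder with one divide, then recover the
   remainder with a multiply and a subtract instead of a second
   (rem) instruction.  Assumes d != 0 and that n / d does not
   overflow (i.e. not LONG_MIN / -1). */
static struct divmod_result divmod_expanded(long n, long d)
{
    struct divmod_result r;

    assert(d != 0);
    r.quot = n / d;            /* the single div */
    r.rem = n - r.quot * d;    /* mult + sub replaces the rem */
    return r;
}
```

If a caller only reads `quot`, the multiply and subtract are dead and can be dropped; if it only reads `rem`, the three operations are equivalent to `n % d`, which is the case that should fold back into a single rem instruction.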