Re: RISC-V: Add divmod instruction support

Jeff Law via Gcc-patches Sat, 18 Feb 2023 13:57:34 -0800



On 2/18/23 14:30, Palmer Dabbelt wrote:

On Sat, 18 Feb 2023 13:06:02 PST (-0800), jeffreya...@gmail.com wrote:



On 2/18/23 11:26, Palmer Dabbelt wrote:

On Fri, 17 Feb 2023 06:02:40 PST (-0800), gcc-patches@gcc.gnu.org wrote:

Hi all,
If we have division and remainder calculations with the same operands:

  a = b / c;
  d = b % c;

We can replace the calculation of remainder with multiplication +
subtraction, using the result from the previous division:

  a = b / c;
  d = a * c;
  d = b - d;

Which will be faster.
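
As a minimal sketch of the rewrite described above (the function names are hypothetical and not taken from the patch), the two ways of producing the remainder can be compared in plain C. It relies on C99 and later defining integer division as truncating toward zero, so that (b / c) * c + b % c == b whenever b / c is representable:

```c
#include <assert.h>

/* Remainder computed directly: a second divide-class operation. */
static int rem_direct(int b, int c)
{
    assert(c != 0);          /* division by zero is undefined in C */
    return b % c;
}

/* Remainder rebuilt from the quotient that is already needed:
 * d = b - (b / c) * c, i.e. one mul and one sub after the div.
 * Hypothetical helper for illustration only. */
static int rem_from_quotient(int b, int c)
{
    assert(c != 0);
    int a = b / c;           /* quotient, truncated toward zero */
    int d = a * c;           /* mul */
    return b - d;            /* sub */
}
```

Both helpers agree for any inputs where b / c does not overflow; INT_MIN / -1 is undefined behaviour in C and is deliberately left out of this sketch.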


Do you have any benchmarks that show that performance increase?  The ISA
manual specifically says the suggested sequence is div+mod, and while
those suggestions don't always pan out for real hardware it's likely
that at least some implementations will end up with the ISA-suggested
fusions.

It'll almost certainly be visible in mcf.  Been there, done that.  In
fact, that's why I asked the team Matevos works on to poke at this case
as I went through this issue on another processor.

It can also be run through LLVM's MCA to estimate counts if you've got a
pipeline description.  The div+rem will come out at around 40c while a
div+mul+sub should weigh in around 25c for Veyron v1.

Do you have a link to the patches somewhere? I couldn't find them online, just the custom instruction support. Or even just some docs describing what the pipeline does, as just basing one performance model on another is kind of a double-edged sword.

It is. But div/rem is pretty simple. 20c each, not pipelined, using a shared unit. There's some early out paths, but the compiler isn't going to be able to model those as they depend on the number of bits on in the inputs. Basically as long as we can do a mult+sub in < 20c, Matevos's sequence is faster.
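
The break-even argument can be written out as a small cost model. This is only a sketch: the 20c div and rem figures come from the message above, while the helper names and any mul/sub latencies passed in are assumptions, and the early-out paths are ignored.

```c
/* Cycle counts quoted in the thread for div and rem. */
enum { DIV_CYCLES = 20, REM_CYCLES = 20 };

/* div followed by rem on a shared, non-pipelined unit:
 * the second operation cannot overlap the first, so latencies add. */
static int cost_div_rem(void)
{
    return DIV_CYCLES + REM_CYCLES;
}

/* div followed by a dependent mul and sub.
 * mul_cycles and sub_cycles are caller-supplied assumptions. */
static int cost_div_mul_sub(int mul_cycles, int sub_cycles)
{
    return DIV_CYCLES + mul_cycles + sub_cycles;
}

/* Nonzero when the div+mul+sub sequence beats div+rem, which
 * reduces to mul_cycles + sub_cycles < REM_CYCLES. */
static int mul_sub_is_faster(int mul_cycles, int sub_cycles)
{
    return cost_div_mul_sub(mul_cycles, sub_cycles) < cost_div_rem();
}
```

For instance, an assumed 4c multiply and 1c subtract give 25c against 40c, in line with the estimates quoted earlier; the real latencies would have to come from the pipeline description.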

If we have implementations that support fusion at some point, then we can twiddle the expander appropriately. Similarly we could easily consider selecting on -Os as well since div+rem is smaller than div+mul+sub. I'm sure Matevos is open to adjustments to that patch.

We haven't done a full eval on the pipeline modeling yet and with gcc in stage4, it didn't seem advisable to try and push it through. Similarly I don't think Matevos's patch should really be a gcc-13 thing, it really should be gcc-14.

That said, I think just knowing the processor doesn't do the div+mod fusion is sufficient to turn something like this on for the mtune for that processor. That's different than turning it on globally, though -- unless it turns out nobody is actually doing the fusion suggested in the ISA manual, which wouldn't be super surprising.

I'm not aware of anyone doing fusion of divmod in the risc-v space.

For prior ports I've worked on, the hardware folks made it painfully clear that the cost of adding another output port on the unit was a non-starter. That port had a pretty fast divider with at least some overlap and the div + mul + sub sequence was still better in general, though the early out cases made it much harder to evaluate.

Maybe some of the SiFive and T-Head folks can chime in on whether or not their processors perform the fusion in question -- and if so, do the instructions need to stay back-to-back? It doesn't look like we're really targeting the code sequences the ISA suggests as it stands, so maybe it's OK to just switch the default over too?

Happy to take in their input. I suspect they'll ultimately prefer the sequence Matevos is generating.

It also brings up the question of mulh+mul fusions, which I don't think we've really looked at (though maybe they're a lot less important for rv64).

Not on our radar for V1 or V2.
jeff
