On 2023/9/6 8:31, Jeff Law wrote:


On 9/3/23 22:49, Lehua Ding wrote:
This patch adds a combined pattern for combining vfsqrt.v and vcond_mask.

gcc/ChangeLog:

    * config/riscv/autovec-opt.md (*cond_<optab><mode>):
    Add sqrt + vcond_mask combine pattern.
    * config/riscv/autovec.md (<optab><mode>2):
    Change define_expand to define_insn_and_split.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
    * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
    * gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
    * gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.
OK.  Thanks.

FWIW, I thought we only had the reciprocal sqrt estimator, but in fact rvv does define a real vector sqrt.   So the concerns we kicked around in the meeting this morning turned out not be warranted.

This raises one of the very interesting questions in this space, specifically whether or not we should be using the rsqrt estimator with correction steps.   Unless the vfsqrt latency is really bad, it's going to be hard to make a vfrsqrt7 based sequence faster -- but the vfrsqrt7 sequences will be pipelinable while vfsqrt almost certainly isn't.

Sadly we don't have a scalar FP rsqrt estimator.  Though I certainly ponder using the vector one -- there's a neat trick you can do with the nab benchmark from spec and produce sqrt and rsqrt at the same time with a Goldschmidt sequence.  It requires a bit of hackery to make new tree nodes, but it was definitely worth it on other targets I've worked on.

Committed, thank Jeff.

--
Best,
Lehua

Reply via email to