On 02/15/16 04:50, James Greenhalgh wrote:
On Mon, Feb 08, 2016 at 10:57:10AM +0000, James Greenhalgh wrote:
On Mon, Feb 01, 2016 at 02:00:01PM +0000, James Greenhalgh wrote:
On Mon, Jan 25, 2016 at 11:20:46AM +0000, James Greenhalgh wrote:
On Mon, Jan 11, 2016 at 12:04:43PM +0000, James Greenhalgh wrote:
Hi,

I've seen a couple of large performance issues caused by expanding
the high-precision reciprocal square root for Cortex-A57, so I'd like
to turn it off by default.

This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from
Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) for
some private microbenchmark kernels which stress the divide/sqrt/multiply
units. It therefore seems to me to be the correct choice to make across
a number of workloads.

Bootstrapped and tested on aarch64-none-linux-gnu with no issues.

OK?
*Ping*
*pingx2*
*ping^3*
*ping^4*

Thanks,
James

---
2015-12-11  James Greenhalgh  <james.greenha...@arm.com>

        * config/aarch64/aarch64.c (cortexa57_tunings): Remove
        AARCH64_EXTRA_TUNE_RECIP_SQRT.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1d5d898..999c9fc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings =
    0,  /* max_case_values.  */
    0,  /* cache_line_size.  */
    tune_params::AUTOPREFETCHER_WEAK,   /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS
-   | AARCH64_EXTRA_TUNE_RECIP_SQRT)    /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags.  */
  };
static const struct tune_params cortexa72_tunings =


James,

There seem to be SPEC CPU2000fp validation issues on A57 when this flag is present too. Though I evaluated the algorithm with a huge random set of values, always delivering accuracy around 1ulp, which should be enough for CPU2000fp (wit x86-64), I expected the benchmarks to pass.

My suspicion is that the Newton series on AArch64 is probably good only for SP. Then, DP might require an extra round, probably exacerbating the performance penalty.

I'd like to try to split this tuning option into one for SP and another for DP. Thoughts?

Thank you,

--
Evandro Menezes

Reply via email to