Re: [AARCH64] Neon vld1__x3, vst1__x2 and vst1_*_x3 intrinsics

James Greenhalgh Fri, 13 Apr 2018 07:52:37 -0700

On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, 
> <james.greenha...@arm.com<mailto:james.greenha...@arm.com>> wrote:
> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
> > Hi,
> >
> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande 
> > <sameera.deshpa...@linaro.org<mailto:sameera.deshpa...@linaro.org>>:
> > > Hi Christophe,
> > >
> > > Please find attached the updated patch with testcases.
> > >
> > > Ok for trunk?
> >
> > Thanks for the update.
> >
> > Since the new intrinsics are only available on aarch64, you want to
> > prevent the tests from running on arm.
> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two 
> > targets.
> > There are several examples on how to do that in that directory.
> >
> > I have also noticed that the tests fail at execution on aarch64_be.
> 
> I think this is important to fix. We don't want the big-endian target to have
> failing implementations of the Neon intrinsics. What is the nature of the
> failure?
> 
> From what I can see, nothing in the patch prevents using these intrinsics
> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
> code bug), or the testcase expected behaviour is wrong.
> 
> I don't think disabling the test for big-endian is the right fix. We should
> either fix the intrinsics, or fix the testcase.
> 
> Thanks,
> James
> 
> Hi James,
> 
> As the tests assume the little endian order of elements while checking the
> results, the tests are failing for big endian targets. So, the failures are
> not because of intrinsic implementations, but because of the testcase.


The testcase is a little hard to follow through the macros, but why would
this be the case?

ld1 is deterministic on big and little endian for which elements will be
loaded from memory, as is st1.

My expectation would be that:

  int __attribute__ ((noinline))
  test_vld_u16_x3 ()
  {
    uint16_t data[3 * 3];
    uint16_t temp[3 * 3];
    uint16x4x3_t vectors;
    int i,j;
    for (i = 0; i < 3 * 3; i++)
      data [i] = (uint16_t) 3*i;
    asm volatile ("" : : : "memory");
    vectors = vld1_u16_x3 (data);
    vst1_u16 (temp, vectors.val[0]);
    vst1_u16 (&temp[3], vectors.val[1]);
    vst1_u16 (&temp[3 * 2], vectors.val[2]);
    asm volatile ("" : : : "memory");
    for (j = 0; j < 3 * 3; j++)
      if (temp[j] != data[j])
        return 1;
    return 0;
  }

would work equally well for big- or little-endian.

I think this is more likely to be an intrinsics implementation bug.

Thanks,
James

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

Reply via email to

Re: [AARCH64] Neon vld1__x3, vst1__x2 and vst1_*_x3 intrinsics