https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78020
ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |wrong-code
   Last reconfirmed|                            |2016-10-18
                 CC|                            |ktkachov at gcc dot gnu.org
     Ever confirmed|0                           |1
            Summary|[Aarch64, ARM64]            |[AArch64] vuzp{1,2}q_f64
                   |vuzp{1,2}q_f64              |implementation identical to
                   |implementation identical to |vzip{1,2}q_f64 in
                   |vzip{1,2}q_f64 in           |arm_neon.h and probably
                   |arm_neon.h and probably     |incorrect
                   |incorrect                   |
   Target Milestone|---                         |7.0
      Known to fail|                            |4.8.5, 4.9.4, 5.3.1, 6.0,
                   |                            |7.0

--- Comment #1 from ktkachov at gcc dot gnu.org ---
I think you're right, after reading the ARM ARM (the same version as you).
I suppose the minimal testcase is something like:

#include "arm_neon.h"

double amem[] = {1.0, 2.0};
double bmem[] = {3.0, 4.0};

int
main (void)
{
  float64x2_t a = vld1q_f64 (amem);
  float64x2_t b = vld1q_f64 (bmem);

  float64x2_t res = vuzp1q_f64 (a, b);

  if (vgetq_lane_f64 (res, 0) != 3.0 || vgetq_lane_f64 (res, 1) != 1.0)
    __builtin_abort ();

  return 0;
}

I guess it can be a bit fiddly to get right, because in the instruction:

  UZP1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

the 'zipped' vector that we unzip is Vm:Vn, i.e. Vm goes first, and so the
first element of the unzipped vector is from Vm.
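
To make the ordering question concrete, here is a rough scalar sketch of the unzip operation. It is not taken from arm_neon.h or from the ARM ARM: the function name model_uzp, the MAX_LANES limit and the vm_low flag are all hypothetical, introduced only for illustration. The flag selects which operand is assumed to sit in the low half of the concatenated vector, so both readings of "Vm:Vn" can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LANES 16  /* Arbitrary illustrative limit, not architectural.  */

/* Hypothetical scalar model of UZP1/UZP2 on n-lane vectors of double.
   PART is 0 for UZP1 (even elements) and 1 for UZP2 (odd elements).
   VM_LOW is an assumption knob: true places Vm in the low half of the
   concatenated vector (the reading described in the comment above),
   false places Vn there.  */
static void
model_uzp (double *res, const double *vn, const double *vm,
           size_t n, int part, bool vm_low)
{
  double cat[2 * MAX_LANES];
  const double *lo = vm_low ? vm : vn;
  const double *hi = vm_low ? vn : vm;

  assert (n > 0 && n <= MAX_LANES);
  assert (part == 0 || part == 1);

  /* Build the concatenated ("zipped") vector, low half first.  */
  for (size_t i = 0; i < n; i++)
    {
      cat[i] = lo[i];
      cat[n + i] = hi[i];
    }

  /* Unzip: take every second element, starting at PART.  */
  for (size_t e = 0; e < n; e++)
    res[e] = cat[2 * e + part];
}
```

With the testcase's inputs (a = {1.0, 2.0} as Vn, b = {3.0, 4.0} as Vm) and part = 0, vm_low = true gives {3.0, 1.0}, which is what the testcase checks for, while vm_low = false gives {1.0, 3.0}. Which ordering the architecture actually specifies should be confirmed against the UZP1 pseudocode in the ARM ARM before relying on either result.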