Hi Ana,

> I was just pointing out that if I define the ARMv8 intrinsic using the legacy
> ARMv7 intrinsic produces code like this:
>
> (int64_t) vadd_s64((int64x1_t(a), int64x1_t(b))
>
> which results in "add x0, x1, x0".

Yep, though not in all cases.

> Now we need to confirm what is the expected implementation for a Neon
> intrinsic - to produce only Neon code or produce best code possible?

That's definitely the question (or something very close). I think it should
be the latter. Otherwise people might just as well be using inline assembly.

Think about some more extreme cases: if someone writes
vadd_f32(a, vmul_f32(b, c)), should we be forced to emit two instructions
rather than a (non-fused) vmla.f32? Or what if someone writes a loop that we
can remove completely? Should we blindly emit it because they asked for a
bunch of NEON instructions?

And if you allow LLVM to optimise those examples, the question becomes where
to draw the line. The only sensible answer (I think) is "when LLVM thinks
it'll make the code better".

> The spreadsheet I have with AArch64 intrinsics definitions shows
> Neon instruction is expected:

I view the spreadsheet as providing semantics. Yes, these are NEON
intrinsics, so they're going to provide at least one way of producing the
actions of every instruction. And of course they're going to tell you what
the effect should be in terms of those NEON instructions. It's the easiest
way.

Anyway, those are just my views on the topic.

Tim.
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
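
The two cases discussed above (a scalar 64-bit add routed through vadd_s64, and a multiply followed by an add versus a single multiply-accumulate) can be sketched roughly as below. This is only an illustration, assuming a compiler that defines __ARM_NEON and ships arm_neon.h. The function names add_via_neon, add_plain, mul_then_add and mla are hypothetical, and the non-NEON branches are stand-ins added purely so the sketch also builds on other targets.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar 64-bit add expressed through the NEON intrinsic vadd_s64. */
int64_t add_via_neon(int64_t a, int64_t b) {
#if defined(__ARM_NEON)
    int64x1_t va = vcreate_s64((uint64_t)a);
    int64x1_t vb = vcreate_s64((uint64_t)b);
    return vget_lane_s64(vadd_s64(va, vb), 0);
#else
    /* Assumption: portable stand-in with the same wrapping behaviour. */
    return (int64_t)((uint64_t)a + (uint64_t)b);
#endif
}

/* The same operation in plain C, for comparison of the generated code. */
int64_t add_plain(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

/* out = a + b * c, written as two separate intrinsics. */
void mul_then_add(const float a[2], const float b[2], const float c[2],
                  float out[2]) {
#if defined(__ARM_NEON)
    float32x2_t prod = vmul_f32(vld1_f32(b), vld1_f32(c));
    vst1_f32(out, vadd_f32(vld1_f32(a), prod));
#else
    /* Assumption: portable stand-in for non-NEON targets. */
    for (int i = 0; i < 2; ++i) out[i] = a[i] + b[i] * c[i];
#endif
}

/* out = a + b * c, written as one multiply-accumulate intrinsic. */
void mla(const float a[2], const float b[2], const float c[2],
         float out[2]) {
#if defined(__ARM_NEON)
    vst1_f32(out, vmla_f32(vld1_f32(a), vld1_f32(b), vld1_f32(c)));
#else
    /* Assumption: portable stand-in for non-NEON targets. */
    for (int i = 0; i < 2; ++i) out[i] = a[i] + b[i] * c[i];
#endif
}
```

Compiling this with optimisation and assembly output (for example -O2 -S) and comparing each pair of functions shows what a given compiler chooses to do. Whether the pairs end up with identical instructions depends on the compiler and its version, which is exactly the policy question in this thread.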
