Hi Ana, I tested your patch on trunk, but got one failure in the regression tests. Could you take another look at your patch?
FAIL: Clang :: CodeGen/aarch64-neon-intrinsics.c (2056 of 15433)
******************** TEST 'Clang :: CodeGen/aarch64-neon-intrinsics.c' FAILED ********************
Script:
--
/home/kevin/llvm_trunk/build/bin/./clang -cc1 -internal-isystem /home/kevin/llvm_trunk/build/bin/../lib/clang/3.4/include -triple aarch64-none-linux-gnu -target-feature +neon -ffp-contract=fast -S -O2 -o - /home/kevin/llvm_trunk/llvm/tools/clang/test/CodeGen/aarch64-neon-intrinsics.c | FileCheck /home/kevin/llvm_trunk/llvm/tools/clang/test/CodeGen/aarch64-neon-intrinsics.c
--
Exit Code: 1
Command Output (stderr):
--
clang: /home/kevin/llvm_trunk/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:922: llvm::SDValue llvm::DAGTypeLegalizer::PromoteIntOp_BUILD_VECTOR(llvm::SDNode*): Assertion `!(NumElts & 1) && "Legal vector of one illegal element?"' failed.
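I have not reduced the failing test, but my guess at what the assertion is complaining about: a BUILD_VECTOR whose vector type is legal (one of the new one-element types such as v1i16) while its single element type still needs integer promotion, so PromoteIntOp_BUILD_VECTOR sees NumElts == 1 and trips the "!(NumElts & 1)" check. A hypothetical C sketch of that shape (the reproducer and its names are my guess, not the actual failing intrinsic):

    /* Hypothetical reproducer sketch (my guess, not the failing test case):
     * a one-element vector whose element type (i16) is not a legal scalar
     * type on AArch64 and must be promoted, so the integer legalizer sees
     * a BUILD_VECTOR with NumElts == 1. Assumes the patch makes v1i16 legal. */
    typedef short v1i16_t __attribute__((vector_size(2)));   /* <1 x i16> */

    v1i16_t build_one(short x) {
      v1i16_t v = { x };   /* initializer lowers to a single-element BUILD_VECTOR */
      return v;
    }

Compiling something along those lines with the same -cc1 command as above might show whether the new one-element types are what reaches that promotion path.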
2013/9/18 Ana Pazos <[email protected]>

> Hi folks,
>
> I have rebased my patches now that the dependent pending patches are merged.
>
> I have also made these additional changes:
>
> 1) Adopted the v1ix and v1if solution.
> I will revisit it when the "global instruction selection" is in place.
>
> Tim, can you talk more about this upcoming LLVM change?
> a) Will it still be SelectionDAG based?
> b) How will having whole-function knowledge help me distinguish when to
> create integer and scalar Neon operations without adding the v1ix and v1if
> types?
>
>
> 2) Introduced a new operator OP_SCALAR_ALIAS to allow creating AArch64
> scalar intrinsics that are aliases of legacy ARM intrinsics.
>
> Example:
> __ai int64_t vaddd_s64(int64_t __a, int64_t __b) {
>   return (int64_t)vadd_s64((int64x1_t)__a, (int64x1_t)__b); }
>
> Note that even with this change, the AArch64 intrinsic vaddd_s64 will NOT
> generate "add d0, d1, d0" but the optimized code "add x0, x1, x0", because
> of the casts to int64_t.
>
> I experimented with compiling aarch64-neon-intrinsics.c with -O0 instead
> of -O3, but the instruction combining pass still makes this optimization.
>
> So we are really dependent on the compiler optimizations here.
>
> But note that directly calling the legacy ARM intrinsic vadd_s64 produces
> "add d0, d1, d0", since the inputs are of v1i64 type and I have the proper
> instruction selection pattern defined.
>
>
> 3) Got rid of the int_aarch64_sisd_add(u,s)64 and int_aarch64_sisd_sub(u,s)64
> intrinsics, as a side effect of implementing (2).
>
> With these intrinsics removed, we cannot guarantee that vaddd_(s,u)64 and
> vsubd_(s,u)64 will produce "add/sub d0, d1, d0".
> I am allowing these intrinsics to generate integer code, which is the best
> implementation of these intrinsics, as Tim pointed out.
> I updated the tests accordingly.
>
>
> 4) Used FMOV instead of UMOV to move registers between the Neon and integer
> units when possible.
>
> For types of size 32 and 64 I tried to make use of the FMOV instructions.
> For types of size 8 and 16, I make use of the UMOV instructions.
>
>
> Let me know if you have any more comments on these patches.
>
> Thanks,
> Ana.
>
> -----Original Message-----
> From: Tim Northover [mailto:[email protected]]
> Sent: Friday, September 13, 2013 2:02 AM
> To: Kevin Qin
> Cc: Ana Pazos; [email protected]; llvm-commits; [email protected]
> Subject: Re: [PATCH][AArch64]RE: patches with initial implementation of
> Neon scalar instructions
>
> Hi Kevin,
>
> > From my perspective, the DAG should only hold operations with a value
> > type, not a certain register class. Which register class is used should
> > be decided by the compiler after some cost calculation. If we bind v1i32
> > and v1i64 to FPR, then it's hard for the compiler to make this
> > optimization.
>
> In an ideal world, I completely agree. Unfortunately the SelectionDAG
> infrastructure just doesn't make these choices intelligently. It looks at
> each node in isolation and chooses an instruction based on the types
> involved. If there were two "(add i64:$Rn, i64:$Rm)" patterns then only one
> of them would ever match.
>
> I view this v1iN nonsense as an unfortunate but necessary temporary
> measure until we get our global instruction selection.
>
> I think the only way you could get LLVM to produce both an "add x, x, x"
> and an "add d, d, d" from sensible IR without it would be a separate
> (MachineInstr) pass which goes through afterwards and patches things up.
>
> The number of actually duplicated instructions is small enough that this
> might be practical, but it would have its own ugliness even if it worked
> flawlessly (why v1i8, v1i16 but i32 and i64? There's a good reason, but
> it's not pretty).
>
> I'm not implacably opposed to the approach, but I think you'd find
> implementing it quite a bit of work. Basically, the main thing I want to
> avoid is an int_aarch64_sisd_add intrinsic. That seems like it's the worst
> of all possible worlds.
>
> Cheers.
>
> Tim.
>
> _______________________________________________
> cfe-commits mailing list
> [email protected]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>

--
Best Regards,
Kevin Qin
