Hi Tim and Jiangning,
The patches bring up a couple of discussion points:

1) Type of code generated by ACLE Neon intrinsics

From what I have experimented with, to guarantee that only Neon code is generated for the ACLE Neon intrinsics, you will need to use builtins and translate those builtins into LLVM intrinsics. Otherwise you are vulnerable to the compiler's capabilities (e.g., current/future optimizations, data layout changes) and might not get the expected Neon instructions.

If this is not a requirement, then the way we generate tests for ACLE Neon intrinsics in NeonCodeEmitter needs to be fixed. We cannot auto-generate "//CHECK" strings with the Neon instructions.

2) Using v1ix and v1fx to represent Neon scalar data types in the backend

This is the important decision we need to make soon.

ARMv8 supports 64-, 32-, 16- and 8-bit scalar operations in Neon. I think the compiler should be able to distinguish when to generate Neon scalar operations from non-Neon scalar operations. How do we achieve that without defining different data types? Only through using Neon intrinsics?

Regarding the impact on the effectiveness of middle-end optimizations: this is my understanding. Tim and others, correct me if I got it wrong.

The data layout string defined for AArch64 only contains 32 and 64 as native types. See AArch64TargetInfo::DescriptionString in tools\clang\lib\Basic\Targets.cpp:

    n32:64

The middle end uses this data layout information to perform its optimizations. Right now it promotes sub-word data types to 32-bit; you can see the "sext" IR operations when you emit LLVM code. I do not see it doing sub-word optimizations.

If this data layout is changed in the future to n8:16:32:64 and we use ixx and fxx for Neon scalar types, we will have more mixing of Neon and non-Neon code and more copy operations between Neon and non-Neon registers, which can have a bad impact on performance.

Hope Tim and the community can give me some more guidance in this area.

Thanks,
Ana.

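To make the v1ix versus ixx question in point 2 concrete, here is a minimal sketch (not code from the patches). It assumes the GCC/Clang vector_size extension; the names v1i64, add_i64 and add_v1i64 are made up for illustration. With Clang, the one-element vector type is expected to show up as <1 x i64> in the unoptimized IR, while int64_t shows up as plain i64, so the two functions compute the same value but carry different types into the middle end.

```c
#include <stdint.h>

/* Hypothetical one-element vector type (assumption: GCC/Clang vector
   extension). Stands in for the "v1i64" idea discussed in the thread. */
typedef int64_t v1i64 __attribute__((vector_size(8)));

/* Plain scalar add: the value is an ordinary 64-bit integer. */
int64_t add_i64(int64_t a, int64_t b) {
    return a + b;
}

/* Same arithmetic, but the operands travel in a one-element vector. */
int64_t add_v1i64(int64_t a, int64_t b) {
    v1i64 va = {a};
    v1i64 vb = {b};
    v1i64 vr = va + vb;   /* element-wise add on a single lane */
    return vr[0];         /* extract lane 0 */
}
```

Comparing the output of "clang -S -emit-llvm" at -O0 for the two functions should show the type difference; which instructions the backend finally selects for each form is exactly the open question above.
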
From: Jiangning Liu [mailto:[email protected]]
Sent: Wednesday, September 11, 2013 11:32 PM
To: Ana Pazos
Cc: Tim Northover; [email protected]; llvm-commits; [email protected]
Subject: Re: [PATCH][AArch64]RE: patches with initial implementation of Neon scalar instructions

Ana,

I personally think ACLE functions for Neon should be expected to generate Neon instructions, because that lets the user ask the compiler to generate special instructions that support complex functionality.

The test case given by Tim should still be able to generate "add d0, d1, d0" if you define vaddd_s64 using vadd_s64 rather than using an IR intrinsic.

Since most middle-end optimizations are based on scalar data types, if we use v1ixx instead of ixx, is there any scenario in which we lose optimization opportunities in the middle end? Or do we not care about that at all, because this is being introduced by ACLE intrinsics? I'm also fine with this conclusion.

Thanks,
-Jiangning
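
For illustration only, the suggestion above (defining the scalar intrinsic in terms of the vector one instead of an IR intrinsic) might look roughly like the sketch below. It is not the definition from the patch or from arm_neon.h: sketch_vaddd_s64 is a made-up name, and the non-Neon branch is just a fallback so the snippet also builds on other targets. The Neon branch uses the existing ACLE functions vdup_n_s64, vadd_s64 and vget_lane_s64.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Hypothetical wrapper: scalar 64-bit add expressed through the
   one-lane vector intrinsic vadd_s64. */
static inline int64_t sketch_vaddd_s64(int64_t a, int64_t b) {
    int64x1_t va = vdup_n_s64(a);              /* scalar -> 1-lane vector */
    int64x1_t vb = vdup_n_s64(b);
    return vget_lane_s64(vadd_s64(va, vb), 0); /* add, then read lane 0 */
}
#else
/* Fallback for targets without Neon (assumption, for portability only). */
static inline int64_t sketch_vaddd_s64(int64_t a, int64_t b) {
    return a + b;
}
#endif
```

Whether this form actually ends up as "add d0, d1, d0" still depends on how the backend treats the one-lane vector type, which is the point under discussion.
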
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
