On Thu, Sep 1, 2016 at 4:11 AM, Niall Douglas <[email protected]> wrote:
> Hi all,
>
> We've been using ISPC to generate optimised implementations of various
> math routines to superb effect, typically beating our hand written
> intrinsic editions by 5-10%. So firstly many thanks!
>

Woo!

> We've seen an odd code generation pattern in the ARM NEON generated by
> ISPC however:
> [...]
> It looks like reduce_add() causes the NEON LLVM to generate a
> non-inlineable add_f32 function. Is there some good reason that this LLVM
> IR isn't marked alwaysinline?
>

Not that I can recall, and not that I can see from reviewing the code now.

More generally, I think(?) that just about all of the functions in
builtins/target-* should be marked as alwaysinline; stuff like
__half_to_float_uniform also deserves that treatment.

As I look through the code for other backends, the 'alwaysinline' stuff is
similarly somewhat inconsistent. I assume that most of the time LLVM just
inlines the simple stuff anyway, but it'd be nice to make sure there aren't
other performance bugs like that one.

Any chance you could make the changes (for NEON at least), make sure things
still work, and submit a pull request?

Thanks,
Matt
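
For readers unfamiliar with the attribute under discussion, here is a rough C analogue, offered only as a sketch. It is not ISPC's actual builtins code, which is written in LLVM IR. The function names `add_f32` and `reduce_add_lanes` are hypothetical stand-ins, and `__attribute__((always_inline))` is the GCC/Clang spelling, which Clang lowers to LLVM's `alwaysinline` function attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper standing in for the add_f32 routine named in the
   thread; not copied from ISPC. The attribute asks the compiler to expand
   the body at every call site instead of emitting a call. */
static inline __attribute__((always_inline)) float add_f32(float a, float b) {
    return a + b;
}

/* Hypothetical scalar stand-in for a horizontal reduce_add over `count`
   lanes; a real NEON version would operate on vector registers. */
static float reduce_add_lanes(const float *lanes, size_t count) {
    assert(lanes != NULL || count == 0);
    float sum = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        sum = add_f32(sum, lanes[i]); /* expanded in place by always_inline */
    }
    return sum;
}
```

Without the attribute, inlining a small helper like this is left to the optimizer's heuristics, which is the inconsistency described above.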
