On Thu, Sep 1, 2016 at 4:11 AM, Niall Douglas <[email protected]> wrote:

> Hi all,
>
> We've been using ISPC to generate optimised implementations of various
> math routines to superb effect, typically beating our hand written
> intrinsic editions by 5-10%. So firstly many thanks!
>

Woo!


> We've seen an odd code generation pattern in the ARM NEON generated by
> ISPC however:
>
[...]

> It looks like reduce_add() causes the NEON LLVM to generate a
> non-inlineable add_f32 function. Is there some good reason that this LLVM
> IR isn't marked alwaysinline?
>

Not that I can recall, and not that I can see from reviewing the code now.
More generally, I think(?) that just about all of the functions in
builtins/target-* should be marked as alwaysinline; stuff like
__half_to_float_uniform also deserves that treatment. As I look through the
code for other backends, the 'alwaysinline' stuff is similarly somewhat
inconsistent. I assume that most of the time LLVM just inlines the simple
stuff anyway, but it'd be nice to make sure there aren't other performance
bugs like that one.

Any chance you could make the changes (for NEON at least), make sure things
still work, and submit a pull request?

Thanks,
Matt

-- 
You received this message because you are subscribed to the Google Groups 
"Intel SPMD Program Compiler Users" group.