Hi,
I've been working on a few different implementations of a simple ray tracer 
where one is implemented using intrinsics and one using ISPC. Long story 
short, the intrinsics version outperforms the ISPC one and it appears to be 
the ray/aabb intersection test that is slower. I did a comparison with 
Compiler Explorer and one difference I noticed is that ISPC emits vcmpleps 
+ vblendvps as opposed to vminps/vmaxps (see https://godbolt.org/z/ZD7Vpr 
for details). I have yet to determine with absolute certainty that this 
specific difference between the intrinsics version and the one written in 
ISPC is causing the performance difference, but it seems a reasonable cause 
given that the former ends up with fewer instructions than the latter.
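
To make the comparison concrete, here is a minimal scalar C++ sketch of the 
two formulations of the slab test. It is not the code behind the Compiler 
Explorer link; the Ray and Aabb structs and the function names hit_minmax 
and hit_select are hypothetical and only illustrate the shape of the code. 
The first form is the one that corresponds to min/max instructions when 
vectorized, the second is the compare + select form that corresponds to 
vcmpleps + vblendvps.

```cpp
#include <algorithm>

// Hypothetical types for illustration only.
struct Ray  { float origin[3]; float inv_dir[3]; };  // inv_dir = 1 / direction
struct Aabb { float lo[3];     float hi[3];     };

// Form A: slab test written with min/max.
inline bool hit_minmax(const Ray& r, const Aabb& b, float t_min, float t_max) {
    for (int a = 0; a < 3; ++a) {
        const float t0 = (b.lo[a] - r.origin[a]) * r.inv_dir[a];
        const float t1 = (b.hi[a] - r.origin[a]) * r.inv_dir[a];
        t_min = std::max(t_min, std::min(t0, t1));  // latest entry
        t_max = std::min(t_max, std::max(t0, t1));  // earliest exit
    }
    return t_min <= t_max;
}

// Form B: the same test written as explicit compare + select.
inline bool hit_select(const Ray& r, const Aabb& b, float t_min, float t_max) {
    for (int a = 0; a < 3; ++a) {
        const float t0 = (b.lo[a] - r.origin[a]) * r.inv_dir[a];
        const float t1 = (b.hi[a] - r.origin[a]) * r.inv_dir[a];
        const float t_near = (t0 <= t1) ? t0 : t1;
        const float t_far  = (t0 <= t1) ? t1 : t0;
        t_min = (t_near <= t_min) ? t_min : t_near;
        t_max = (t_max <= t_far) ? t_max : t_far;
    }
    return t_min <= t_max;
}
```

For finite, non-NaN inputs both forms should return the same result, so the 
difference I am asking about is only in the instructions that get emitted.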

1. Is there a specific reason why ISPC emits vcmpleps + vblendvps instead 
of vminps/vmaxps?
2. Is there something I can do, be it provide a specific compiler flag or 
write the intersection routine differently, to make ISPC emit vminps/vmaxps 
instructions?

Thanks
Michael
