Hi, I've been working on a few different implementations of a simple ray tracer, one implemented with intrinsics and one with ISPC. Long story short, the intrinsics version outperforms the ISPC one, and the ray/AABB intersection test appears to be the slower part. I compared the two in Compiler Explorer, and one difference I noticed is that ISPC emits vcmpleps + vblendvps where the intrinsics version uses vminps/vmaxps (see https://godbolt.org/z/ZD7Vpr for details). I have yet to determine with certainty that this specific difference is what causes the performance gap, but it seems a reasonable cause given that the intrinsics version ends up with fewer instructions than the ISPC one.
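To make the comparison concrete, here is a minimal scalar sketch of the two ways a slab-style ray/AABB test can be written: one with explicit min/max calls and one with a comparison followed by a select. This is not the code behind the link above. The `Ray` and `AABB` types, the function names, and the precomputed `inv_dir` field are all hypothetical and only there for illustration, and which instructions a given compiler emits for either form is not something the sketch guarantees.

```cpp
#include <algorithm>

// Hypothetical types for illustration only; not the layout used in the linked code.
struct Ray  { float origin[3]; float inv_dir[3]; };  // inv_dir = 1 / direction, per axis
struct AABB { float lo[3]; float hi[3]; };

// Variant A: slab test written with explicit min/max.
inline bool hit_minmax(const Ray& r, const AABB& b, float t_min, float t_max) {
    for (int i = 0; i < 3; ++i) {
        float t0 = (b.lo[i] - r.origin[i]) * r.inv_dir[i];
        float t1 = (b.hi[i] - r.origin[i]) * r.inv_dir[i];
        t_min = std::max(t_min, std::min(t0, t1));  // latest entry so far
        t_max = std::min(t_max, std::max(t0, t1));  // earliest exit so far
    }
    return t_min <= t_max;
}

// Variant B: the same test written as compare + select.
inline bool hit_select(const Ray& r, const AABB& b, float t_min, float t_max) {
    for (int i = 0; i < 3; ++i) {
        float t0 = (b.lo[i] - r.origin[i]) * r.inv_dir[i];
        float t1 = (b.hi[i] - r.origin[i]) * r.inv_dir[i];
        float t_near = (t0 <= t1) ? t0 : t1;
        float t_far  = (t0 <= t1) ? t1 : t0;
        t_min = (t_near <= t_min) ? t_min : t_near;
        t_max = (t_max <= t_far) ? t_max : t_far;
    }
    return t_min <= t_max;
}
```

For finite inputs both variants return the same result, so any difference would be in the generated code rather than in behavior. They can differ when a NaN shows up, for example from 0 * inf when a ray component is exactly zero and the origin lies on a slab plane.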
1. Is there a specific reason why ISPC emits vcmpleps + vblendvps instead of vminps/vmaxps?
2. Is there something I can do, be it provide a specific compiler flag or write the intersection routine differently, to make ISPC emit vminps/vmaxps instructions?

Thanks
Michael
