Issue 97580
Summary [X86] Vector-Vector dot product not reduced to corresponding single instruction
Labels new issue
Assignees
Reporter Hendiadyoin1
Given the following C++ code snippets:
```c++
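// Assumed definition: the report does not show f32x4; it is taken here to be
// a GCC/Clang vector extension type holding four floats.
typedef float f32x4 __attribute__((vector_size(16)));

// Full dot product, returned as a scalar.
float simple_dot_product(f32x4 a, f32x4 b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Full dot product, broadcast to all four lanes.
f32x4 dot_product_broadcast(f32x4 a, f32x4 b) {
    float d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
    f32x4 r = {d, d, d, d};
    return r;
}

// Dot product over lanes 0, 2 and 3 only (lane 1 is skipped).
float selective_dot_product(f32x4 a, f32x4 b) {
    return a[0] * b[0] + a[2] * b[2] + a[3] * b[3];
}

// Same partial dot product, written to lanes 0, 1 and 3; lane 2 is zero.
f32x4 selective_dot_product_selective_broadcast(f32x4 a, f32x4 b) {
    float d = a[0] * b[0] + a[2] * b[2] + a[3] * b[3];
    f32x4 r = {d, d, 0, d};
    return r;
}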
```

clang/llvm fails to reduce these to single `dpps` (`DotProductPackedSingles`) instructions when SSE4.2 is enabled (`dpps` itself was introduced with SSE4.1, which SSE4.2 implies). The same may be true for the `double` case (`dppd`).

Godbolt link (hopefully with the correct targets):
https://godbolt.org/z/od5ezWM19

Note that this might be affected by flags that change floating-point accuracy, such as `-fassociative-math` or `-ffp-contract=*`, since using the dot product instruction might yield higher accuracy. (Looking at https://www.felixcloutier.com/x86/dpps, it is a bit unclear whether intermediate rounding is performed or whether the instruction acts as a sort of multiply-add.)
Also note that pre-multiplying `a` and `b` (computing the element-wise product `a * b` before summing the lanes) yields better codegen without `-ffast-math` or similar flags, as seen in the linked collection.
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
