Hi Robin,
I'd like to explain what's happening with the VLS test expectations 
(e.g., wred-3.c reduc_plus__Float16_float8, f16→f32, trip count=8) after the 
new reduction cost:
Background: The per-type ordered reduction cost raises the vec_to_scalar cost 
for the f16→f32 widening reduction from the default of 3 to 12.
What happens during VLS mode traversal (V4096QI down to V2QI):
The cost model rejects modes for two different reasons:

 * V4096QI ~ V8QI: threshold failure (return 0, hard failure)
    * Taking V8QI as an example: saving_per_viter = 3×8 - 14 = 10 > 0
    * But threshold = 11 > trip_count = 8
    * vect_analyze_loop_costing returns 0 → Analysis failed

 * V4QI ~ V2QI: never profitable (return -1), exempted by -mmax-vectorization
    * Taking V4QI as an example: saving_per_viter = 3×4 - 14 = -2 ≤ 0 →
      vect_analyze_loop_costing returns -1
    * -mmax-vectorization sets
      param_vect_allow_possibly_not_worthwhile_vectorizations=1
    * At tree-vect-loop.cc:2553: if (res < 0 && !param) evaluates to false →
      skips goto again → Analysis succeeded
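
To make the two rejection paths concrete, here is a toy model in C of the logic as I understand it. It is only a sketch and not the actual GCC code: the names toy_loop_costing and toy_mode_accepted are made up for illustration, and the threshold is passed in as a plain number instead of being computed. The costs 3 and 14 and the threshold 11 are the values from the V8QI and V4QI examples above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three-way costing result described above:
   1 = profitable, 0 = trip count below threshold, -1 = never profitable.  */
static int
toy_loop_costing (int scalar_cost, int vf, int vec_cost,
                  int threshold, int trip_count)
{
  /* Saving of one vector iteration over vf scalar iterations.  */
  int saving_per_viter = scalar_cost * vf - vec_cost;

  if (saving_per_viter <= 0)
    return -1;                  /* never profitable */
  if (trip_count < threshold)
    return 0;                   /* profitable only above the threshold */
  return 1;
}

/* Hypothetical model of the caller: a -1 result is only a rejection when
   the "allow possibly not worthwhile" parameter is off, while a 0 result
   always rejects the mode.  */
static bool
toy_mode_accepted (int res, bool allow_not_worthwhile)
{
  if (res < 0 && !allow_not_worthwhile)
    return false;               /* corresponds to the "goto again" path */
  if (res == 0)
    return false;               /* hard failure */
  return true;
}
```

In this model, toy_loop_costing (3, 8, 14, 11, 8) gives 0 for V8QI, which is rejected regardless of the parameter, while toy_loop_costing (3, 4, 14, 0, 8) gives -1 for V4QI (the threshold argument is never reached), which is accepted once the parameter is set to 1.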
Result: The compiler picks V4QI + UF=4 (fully unrolled), producing 2 vfwredosum 
instructions instead of the previous 1 with V8QI.
The other affected cases should be similar. I'd like to confirm: is this the 
expected behavior? Do you have any suggestions on how to update the test 
expectations?
Best regards,
Yaduo
