Hi Robin,

I'd like to explain what's happening with the VLS test expectations (e.g., wred-3.c: reduc_plus__Float16_float8, f16→f32, trip count = 8) after the new reduction cost change.

Background: the per-type ordered reduction cost raises the vec_to_scalar cost for the f16→f32 widening reduction from the default of 3 to 12.

What happens during the VLS mode traversal (V4096QI down to V2QI): the cost model handles the modes in two different ways:
* V4096QI through V8QI: threshold failure (return 0, hard failure)
  * Taking V8QI as an example: saving_per_viter = 3×8 - 14 = 10 > 0
  * But threshold = 11 > trip_count = 8
  * vect_analyze_loop_costing returns 0 → analysis failed
* V4QI through V2QI: never profitable (return -1), but exempted by -mmax-vectorization
  * Taking V4QI as an example: saving_per_viter = 3×4 - 14 = -2 ≤ 0 → vect_analyze_loop_costing returns -1
  * -mmax-vectorization sets param_vect_allow_possibly_not_worthwhile_vectorizations=1
  * At tree-vect-loop.cc:2553, the check if (res < 0 && !param) evaluates to false → the goto again is skipped → analysis succeeded

Result: the compiler picks V4QI with UF=4 (fully unrolled), producing 2 vfwredosum instructions instead of the single one previously generated with V8QI. The other affected cases should behave similarly.

Could you confirm whether this is the expected behavior? And do you have any suggestions on how to update the test expectations?

Best regards,
Yaduo
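For reference, the two outcomes described above can be modeled in a few lines of C. This is a hypothetical, simplified sketch, not GCC's vect_analyze_loop_costing: the function names are invented for illustration, it assumes 3 is the scalar per-iteration cost and 14 the vector per-iteration cost from the numbers above, and it takes the threshold (11 for V8QI) as an input rather than computing it.

```c
#include <assert.h>

/* Hypothetical, simplified model of the costing outcomes described above.
   This is NOT GCC's vect_analyze_loop_costing; all names are illustrative. */

/* Saving per vector iteration: scalar cost of VF scalar iterations minus
   the cost of one vector iteration (e.g. 3 * 8 - 14 = 10 for V8QI). */
static int saving_per_viter(int scalar_iter_cost, int vf, int vec_iter_cost)
{
  return scalar_iter_cost * vf - vec_iter_cost;
}

/* Uses the return convention from the message:
   -1 = never profitable, 0 = hard failure, 1 = profitable. */
static int model_costing(int scalar_iter_cost, int vf, int vec_iter_cost,
                         int threshold, int trip_count)
{
  assert(vf > 0 && trip_count >= 0);
  if (saving_per_viter(scalar_iter_cost, vf, vec_iter_cost) <= 0)
    return -1;  /* e.g. V4QI: 3 * 4 - 14 = -2 */
  if (threshold > trip_count)
    return 0;   /* e.g. V8QI: threshold 11 > trip count 8 */
  return 1;
}

/* Models the check after costing: a hard failure (0) always rejects the
   mode; a "never profitable" result (-1) is rejected only when the param
   (set to 1 by -mmax-vectorization) is 0. */
static int model_analysis_succeeds(int res, int allow_not_worthwhile)
{
  if (res == 0)
    return 0;
  if (res < 0 && !allow_not_worthwhile)
    return 0;  /* corresponds to the "goto again" path in the message */
  return 1;
}
```

With the V8QI numbers, model_costing(3, 8, 14, 11, 8) returns 0 and the mode is rejected regardless of the param; with the V4QI numbers, model_costing(3, 4, 14, 0, 8) returns -1, and model_analysis_succeeds(-1, 1) returns 1, matching the -mmax-vectorization case.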
