> The major tension we have is that the vectorizer doesn't emit a runtime
> iteration check for partial vectorization like RVV's.  The assumption is that
> length masking is always cheap.  That assumption does not hold when the
> latency is just dependent on the vector size (=LMUL).  Therefore, another
> approach could be to either disable partial vectorization altogether (you can
> see how --param=vect-partial-vector-usage=0 works for you)
> if vl_dependent_lmul_scaling == true (very likely too big a hammer) or define
> a new channel to let the vectorizer know we _do_ want a runtime check despite
> partial vectorization under specific circumstances.
>
> Generally, a heuristic that might be reasonable could be "If the loop is
> length controlled (with compile-time unknown niters) and latency depends on
> LMUL rather than VL, try to be less aggressive on LMUL.".
>
> The rationale would be something like: "Assuming a standard distribution of
> VL around a 'normal' value, small VLs with high LMUL cause disproportionally
> high latency that is not amortized by the speedup we get from large VL with
> high LMUL."
>
> That's still pretty shaky and we'd need to see if people consider this
> "benchmark hacking" or not.
>
> Anyway, IMHO you want something like:
>   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (...) && vl_dependent_lmul_scaling
>       && niter...)
>     lmul_factor = scale_lmul (...);
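
To make the quoted condition concrete, here is a minimal standalone sketch.
It is not GCC code: struct loop_model, struct tune_model and their fields are
hypothetical stand-ins for the vectorizer and tuning state, the elided niter
test is assumed to mean "iteration count unknown at compile time" (as in the
quoted heuristic), and halving LMUL is an arbitrary placeholder for whatever
scaling we would actually settle on.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the loop_vec_info state the condition reads.  */
struct loop_model
{
  bool fully_with_length_p; /* Length-controlled loop, cf.
			       LOOP_VINFO_FULLY_WITH_LENGTH_P.  */
  bool niters_known_p;      /* Iteration count known at compile time.  */
};

/* Hypothetical stand-in for the target tuning information.  */
struct tune_model
{
  bool vl_dependent_lmul_scaling; /* Flag named as in the quoted text.  */
};

/* Return the LMUL to use given the LMUL the cost model would pick.
   Assumption: "less aggressive" is modelled as one halving step.  */
static unsigned
scale_lmul (const struct loop_model *loop, const struct tune_model *tune,
	    unsigned lmul)
{
  if (loop->fully_with_length_p
      && tune->vl_dependent_lmul_scaling
      && !loop->niters_known_p
      && lmul > 1)
    return lmul / 2;

  return lmul;
}
```

With these assumptions a length-controlled loop with unknown niters drops
from LMUL8 to LMUL4, while a loop with known niters keeps the LMUL the cost
model chose.
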
As a follow up from today's meeting, questions still to be resolved:

 - Experiment with the above condition and check if it works.  If so, let's
   discuss again next week.
 - Does LTO help with figuring out the runtime-unknown length here?  If not,
   we might need a bug report.
 - How does --param=vect-partial-vector-usage=0 perform for the loops you're
   interested in?  I don't think Richi would like it too much but if we had
   the ability to determine "partial vectorization yes/no" per mode, rather
   than per target, we could go for e.g. a SIMD-style main loop and a
   partially vectorized epilogue (that would be limited to LMUL1).  That
   would be closest to vector loop versioning and likely better than cost
   scaling.

-- 
Regards
 Robin
