dmgreen added a comment.

> The total vector bandwidth includes unrolling so currently having
> `VScaleForTuning=1` and `MaxInterleaveFactor=4` implies 512 tvb. If the
> target has >128bit vectors then vector loops will likely have more work than
> they can handle in parallel but as long as that does not negatively affect
> register pressure it shouldn't be a problem.

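For reference, the arithmetic the quoted paragraph appears to rely on can be sketched as below. This is only an illustration under stated assumptions: the helper names are hypothetical and are not LLVM APIs, the 128-bit figure is the minimum scalable-vector granule assumed by the quote, and the product formula is inferred from the "512" in the quote rather than taken from the cost model itself.

```cpp
// Hypothetical helpers for illustration only; none of these are LLVM APIs.

// Assumption: the scalable vector granule is 128 bits, as the quote implies.
constexpr unsigned kGranuleBits = 128;

// Assumed reading of "total vector bandwidth": granule * vscale * interleave.
constexpr unsigned totalVectorBandwidthBits(unsigned VScale,
                                            unsigned InterleaveFactor) {
  return kGranuleBits * VScale * InterleaveFactor;
}

// Number of elements of a given width consumed by one iteration of the
// interleaved vector loop body.
constexpr unsigned elementsPerVectorIteration(unsigned VScale,
                                              unsigned InterleaveFactor,
                                              unsigned ElementBits) {
  return totalVectorBandwidthBits(VScale, InterleaveFactor) / ElementBits;
}

// VScaleForTuning=1 with MaxInterleaveFactor=4 gives the 512 bits in the quote.
static_assert(totalVectorBandwidthBits(1, 4) == 512, "matches the quoted figure");
// With 32-bit elements that is 16 elements per vector iteration...
static_assert(elementsPerVectorIteration(1, 4, 32) == 16, "i32, vscale = 1");
// ...and 32 elements if the hardware actually has 256-bit vectors (vscale = 2).
static_assert(elementsPerVectorIteration(2, 4, 32) == 32, "i32, vscale = 2");
```

Under these assumptions, the work done per vector iteration doubles each time the real vscale doubles, even though the tuning value stays at 1.
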
That doesn't fit with my understanding of how VScaleForTuning is currently used, and vectorizing/unrolling too far can easily cause the vector part to be skipped for many loop counts, falling back to the scalar part. But that all sounds fine to me for what this is. Cheers.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112406/new/

https://reviews.llvm.org/D112406