dmgreen added a comment.

> The total vector bandwidth includes unrolling, so currently having
> `VScaleForTuning=1` and `MaxInterleaveFactor=4` implies a total vector
> bandwidth of 512 bits. If the target has vectors wider than 128 bits, vector
> loops will likely have more work than they can handle in parallel, but as long
> as that does not negatively affect register pressure it shouldn't be a problem.

That doesn't fit with my understanding of how VScaleForTuning is currently
used, and vectorizing/unrolling too far can easily cause the vector loop to be
skipped for many trip counts, falling back to the scalar loop. But that all
sounds fine to me for what this is. Cheers.
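Here is a small worked model of the concern that wide vectorization plus unrolling can skip the vector loop entirely. It assumes no tail folding (predication) and no epilogue vectorization, so any trip count below one full vector step runs completely in the scalar loop. The 32-bit element size and the helper names are hypothetical choices for illustration. This is not how LLVM computes its runtime checks.

```python
# Simplified model, assuming SVE-style 128-bit granules, no tail folding and
# no epilogue vectorization. Helper names are hypothetical, not LLVM APIs.
SVE_GRANULE_BITS = 128


def elements_per_vector_iteration(vscale: int, interleave_factor: int, element_bits: int) -> int:
    """Scalar elements consumed by one iteration of the unrolled vector loop."""
    lanes_per_register = vscale * SVE_GRANULE_BITS // element_bits
    return lanes_per_register * interleave_factor


def split_iterations(trip_count: int, step: int) -> tuple[int, int]:
    """Return (vector-loop iterations, leftover scalar iterations)."""
    return trip_count // step, trip_count % step


if __name__ == "__main__":
    for vscale in (1, 2):
        step = elements_per_vector_iteration(vscale, interleave_factor=4, element_bits=32)
        for n in (10, 20, 40):
            vec, rem = split_iterations(n, step)
            print(f"vscale={vscale} step={step} n={n}: vector iters={vec}, scalar iters={rem}")
```

In this model, with 32-bit elements and an interleave factor of 4, a 20-iteration loop still enters the vector body once at vscale=1 (step 16). At vscale=2 (step 32) it runs entirely in the scalar loop.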


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112406/new/

https://reviews.llvm.org/D112406

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
