http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-10 10:50:54 UTC ---
The case I was worried about is one where we have a single VECTOR_CST before the loop and then create 16 different vectors out of it using different permutations. In that case, permuting the same VECTOR_CST might be cheaper than having to load 10 constants out of memory because the register pressure was too high. But perhaps that is unlikely and we should just fold; if it works in fold-const.c, sure.

Doing something about interleaved constant stores in the vectorizer is desirable anyway, even if we leave the folding to later passes: not needing any interleaves means we might handle more cases, and the cost model wouldn't reject the vectorization so often.
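
For illustration only, here is a minimal sketch of the two things discussed above: a loop with an interleaved constant store, and what folding a permutation of two constant vectors amounts to. This is not GCC code. The names fold_const_perm and store_interleaved are hypothetical, and the selector convention (indices into the concatenation of both inputs, reduced modulo twice the vector length) is an assumption made for the sketch.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GCC: evaluate at "compile time" a
   permutation of two constant vectors v0 and v1 of n elements each.
   Assumption: sel[i] indexes into the concatenation {v0, v1} and is
   reduced modulo 2*n. */
static void fold_const_perm(const int *v0, const int *v1,
                            const unsigned *sel, size_t n, int *out)
{
    for (size_t i = 0; i < n; i++) {
        size_t idx = (size_t)sel[i] % (2 * n);
        out[i] = idx < n ? v0[idx] : v1[idx - n];
    }
}

/* An interleaved constant store: even elements get 1, odd elements get 2.
   Vectorized naively, this interleaves the vectors {1,1,1,1} and
   {2,2,2,2}; folded, it is just a store of the constant {1,2,1,2}. */
void store_interleaved(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[2 * i] = 1;
        a[2 * i + 1] = 2;
    }
}
```

With v0 = {1,1,1,1}, v1 = {2,2,2,2} and the selector {0,4,1,5}, the folded result is the single constant {1,2,1,2}, so no interleave is needed at run time. The trade-off raised above still applies: if many different permutations of one constant are needed, each folded result becomes a separate constant to load.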