http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47477
--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Kai Tietz from comment #17)
> What optimization you expect here?  I see by the new type-demotion pass some
> changes in optimized tree-output:

This one is for vectorization; try it with -O3 -mavx2 and look at what vectorized loop we get.  With type demotion and promotion for the vectorized loops (perhaps only for those and not for the scalar loops), you could get vectorization similar to, say:

short a[1024], b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned short c = ((short)(a[i] << 8) >> 8) + 5U;
      unsigned short d = b[i] + 12U;
      a[i] = c + d;
    }
}

though even in this case I still couldn't get the sign extension to actually be performed as a 16-bit left shift followed by a 16-bit arithmetic (signed) right shift, which I guess would lead to even better code.

Or look at how we vectorize:

short a[1024], b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned char e = a[i];
      short c = e + 5;
      long long d = (long long) b[i] + 12;
      a[i] = c + d;
    }
}

(Note that here the forwprop pass already performs type promotion: instead of converting a[i] to unsigned char and back to short, it computes a[i] & 255 in short mode.)  Compare that with how we could vectorize it instead with type demotion:

short a[1024], b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned short c = (a[i] & 0xff) + 5U;
      unsigned short d = b[i] + 12U;
      a[i] = c + d;
    }
}

These are admittedly all artificial testcases, but I've seen tons of loops where multiple types were vectorized, and I think in some portion of those loops we could either use just a single type size or at least reduce the number of conversions and distinct type sizes in the vectorized loop.
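One way to sanity-check that the demoted loop is a valid replacement for the mixed-width one is to run both on the same inputs and compare the stored results. Below is a rough sketch of such a check. The names foo_mixed, foo_demoted, count_mismatches and sweep_all_values are hypothetical and exist only for this illustration. The check relies on GCC's documented implementation-defined behavior of reducing out-of-range values modulo 2^16 when converting to short. Under that rule, truncation commutes with the additions, which is the property type demotion exploits.

```c
#include <string.h>

#define N 1024

/* The mixed-width testcase as written: 8-, 16- and 64-bit intermediates.  */
void
foo_mixed (short *a, const short *b)
{
  int i;
  for (i = 0; i < N; i++)
    {
      unsigned char e = a[i];               /* low byte, modulo 256 */
      short c = e + 5;                      /* 5 .. 260, fits in short */
      long long d = (long long) b[i] + 12;
      a[i] = c + d;                         /* GCC: reduced modulo 2^16 */
    }
}

/* Hand-demoted form: every intermediate fits in a 16-bit lane.  */
void
foo_demoted (short *a, const short *b)
{
  int i;
  for (i = 0; i < N; i++)
    {
      unsigned short c = (a[i] & 0xff) + 5U;
      unsigned short d = b[i] + 12U;
      a[i] = c + d;                         /* GCC: reduced modulo 2^16 */
    }
}

/* Hypothetical helper: run both loops on copies of the same input and
   return the number of elements whose results differ.  */
int
count_mismatches (const short *a_in, const short *b)
{
  short x[N], y[N];
  int i, bad = 0;

  memcpy (x, a_in, sizeof x);
  memcpy (y, a_in, sizeof y);
  foo_mixed (x, b);
  foo_demoted (y, b);
  for (i = 0; i < N; i++)
    bad += x[i] != y[i];
  return bad;
}

/* Hypothetical helper: feed every 16-bit value through a[i], each paired
   with a permuted b[i], and return the total number of mismatches.  */
long
sweep_all_values (void)
{
  short a[N], b[N];
  long total = 0;
  int blk, i;

  for (blk = 0; blk < 65536 / N; blk++)
    {
      for (i = 0; i < N; i++)
        {
          unsigned k = (unsigned) (blk * N + i);        /* 0 .. 65535 */
          a[i] = (short) ((int) k - 32768);
          /* 40503 is odd, so k * 40503 mod 65536 permutes 0 .. 65535.  */
          b[i] = (short) ((int) ((k * 40503u) % 65536u) - 32768);
        }
      total += count_mismatches (a, b);
    }
  return total;
}
```

When built with GCC, calling sweep_all_values () from a main function should return 0. This only shows that the two loops compute the same values. It does not show which vector code GCC emits for either of them. For that, compile with -O3 -mavx2 as above and look at the assembly or the -fdump-tree-vect-details dump.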