http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47477

--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Kai Tietz from comment #17)
> What optimization you expect here?  I see by the new type-demotion pass some
> changes in optimized tree-output:

This one is for vectorization, try it with -O3 -mavx2 and look what vectorized
loop we get.  With type demotion and promotion for the vectorized loops
(perhaps only for that and not for the scalar loops), you could get similar
vectorization to say:
short a[1024], b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned short c = ((short)(a[i] << 8) >> 8) + 5U;
      unsigned short d = b[i] + 12U;
      a[i] = c + d;
    }
}
though even in this case I still couldn't achieve the sign extension to be
actually performed as 16-bit left + right (signed) shift, while I guess that
would lead to even better code.
Or look at how we vectorize:
short a[1024], b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned char e = a[i];
      short c = e + 5;
      long long d = (long long) b[i] + 12;
      a[i] = c + d;
    }
}
(note, here forwprop pass already performs type promotion, instead of
converting a[i] to unsigned char and back to short, it computes a[i] & 255 in
short mode) and how we could instead with type demotions:
short a[1024], b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned short c = (a[i] & 0xff) + 5U;
      unsigned short d = b[i] + 12U;
      a[i] = c + d;
    }
}

These are all admittedly artificial testcases, but I've seen tons of loops
where multiple types were vectorized and I think in some portion of those loops
we could either use just a single type size, or at least decrease the number of
conversions and different type sizes in the vectorized loops.

Reply via email to