https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #21 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #20)
> (In reply to Richard Biener from comment #19)
> > Ah, so the issue is missing -mavx512bw which means we end up with a AVX2
> > style
> > mask for V32QImode.  With -mavx512bw the code vectorizes fine.
> 
> Vectorization code is worse than before, now we need to pack vectorized mask
> which takes extra 3 instructions.

Currently, ifcvt converts

---------dump of .ch_vect-------
  if (x.1_14 > 255)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 477815112]:
  _17 = -_5;
  _18 = _17 >> 31;
  iftmp.0_19 = (unsigned char) _18;
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 477815112]:
  iftmp.0_20 = (unsigned char) _5;

  <bb 6> [local count: 955630225]:
  # iftmp.0_21 = PHI <iftmp.0_19(4), iftmp.0_20(5)>
-------dump end---------


to 
---- dump of .ifcvt---------
  _41 = -x.1_14;
  _17 = (int) _41;
  _18 = _17 >> 31;
  iftmp.0_19 = (unsigned char) _18; -- vec_pack_trunc
  iftmp.0_20 = (unsigned char) _5; -- vec_pack_trunc
  iftmp.0_21 = x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20; -- vec_pack_trunc
  *_6 = iftmp.0_21;
  x_16 = x_24 + 1;
-----dump end----------


If ifcvt instead output something like
------------optimal .ifcvt------
  _41 = -x.1_14;
  _17 = (int) _41;
  _18 = _17 >> 31;
  iftmp.0_21 = x.1_14 > 255 ? _18 : _5;
  iftmp.0_22 = (unsigned char) iftmp.0_21; --- vec_pack_trunc
  *_6 = iftmp.0_22;
  x_16 = x_24 + 1;
------------end------------

we could save mask-packing operations (3 vec_pack_trunc vs. 1 vec_pack_trunc?).
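
For illustration only, here is a scalar C sketch of the two shapes being compared. It is not taken from the testcase or from GCC itself: the function names are hypothetical, and it assumes a 32-bit int, an arithmetic right shift on signed values, and wrapping unsigned-to-int conversion (all true for GCC on x86). The first function narrows both arms and then selects, like the current .ifcvt dump; the second selects in the wide type and narrows once, like the proposed form.

```c
#include <stdint.h>

/* Shape of the current .ifcvt output: each arm is truncated to
   unsigned char first, then the narrow values are selected.
   Assumes 32-bit int and arithmetic >> on signed values. */
static uint8_t narrow_then_select(int v)
{
    unsigned int u = (unsigned int)v;   /* plays the role of x.1_14 */
    int neg = (int)(0u - u);            /* _41 = -x.1_14; _17 = (int) _41 */
    int sign = neg >> 31;               /* _18 = _17 >> 31 */
    uint8_t a = (uint8_t)sign;          /* iftmp.0_19 */
    uint8_t b = (uint8_t)v;             /* iftmp.0_20 */
    return u > 255 ? a : b;             /* iftmp.0_21, selected in the narrow type */
}

/* Shape of the proposed output: select in int, truncate once. */
static uint8_t select_then_narrow(int v)
{
    unsigned int u = (unsigned int)v;
    int neg = (int)(0u - u);
    int sign = neg >> 31;
    int sel = u > 255 ? sign : v;       /* iftmp.0_21 stays in the wide type */
    return (uint8_t)sel;                /* single truncation, iftmp.0_22 */
}
```

Both functions compute the same value for every input, because truncation to unsigned char commutes with the select. The difference only matters after vectorization, where, per the dumps above, the position of the truncation decides how many vec_pack_trunc operations are needed.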
