On Fri, Mar 03, 2017 at 09:10:03AM +0100, Uros Bizjak wrote:
> > Or are there any define_insn/define_expand where it is desirable to have
> > both input and output operand a MEM (and does it have to be matching)?
> > For various scalar binary and unary expanders the backend already uses a
> > helper that will force something into a register if dest and src are both
> > memory and not rtx_equal_p, but do we have anything like that in anything
> > these two builtin expanders emit?
>
> Any insn with matched memory (movlps, movhps and similar) can also
> operate with a matched register.  To my knowledge, all insn patterns are
> written in this way, since in the past we had plenty of problems with
> matched memory operands.
>
> Also, I really don't remember what benefit it brings us to *not* force
> the input operand to a register at -O0 at expand time and leave it to the
> RA to find matched memory.  OTOH, with -O0 we already have so many
> unnecessary moves that you get tears in the eyes...
So, I've tried:

make mddump
for i in `sed -n -e 's/^.*CODE_FOR_\([^,]*\),.*$/\1/p' ../../gcc/config/i386/i386-builtin.def`; do
  sed -n '/^(define_\(insn\|expand\) ("'$i'"/,/^$/p' tmp-mddump.md
done

and looked for "=.*[mo] in there.

Some insns might want that && !(MEM_P (operands[0]) && MEM_P (operands[1])),
e.g.

(define_insn_and_split "avx512f_<castmode><avxsizesuffix>_<castmode>"
  [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
        (unspec:AVX512MODE2P
          [(match_operand:<ssequartermode> 1 "nonimmediate_operand" "xm,x")]
          UNSPEC_CAST))]
  "TARGET_AVX512F"
  "#"
  "&& reload_completed"
  [(set (match_dup 0) (match_dup 1))]

as the constraints require that both operands aren't memory.  Shall I create
a patch for that?  This is the first category below (a sketch of the idea
follows after the other two categories).

The second category is where a matching memory operand is ok, so my patch
can pessimize stuff.  I wonder if we couldn't handle this like:

          /* If we aren't optimizing, only allow one memory operand to be
             generated.  */
          if (memory_operand (op, mode))
            {
              const char *const constraint
                = insn_data[icode].operand[i + adjust + 1].constraint;
              if (optimize || num_memory != 1 || !rtx_equal_p (real_target, op))
                num_memory++;
              /* sse2_movsd allows a matching operand.  */
              else if (icode == CODE_FOR_sse2_movsd)
                ;
              /* Various masked insns allow a matching operand.  */
              else if (insn_data[icode].operand[i + adjust + 1].predicate
                       == vector_move_operand
                       && (strcmp (constraint, "0C") == 0
                           || strcmp (constraint, "0C,0") == 0))
                ;
              else
                num_memory++;
            }

(though perhaps the sse2_movsd special case is still too simplistic,
because that insn has just =m v 0 and =o 0 v alternatives, so if
i + adjust + 1 is 2, then it is fine as is, and if it is 1, then only if
the memory is offsettable; though perhaps LRA can just turn non-offsettable
memory into an offsettable one through a secondary reload).

And the last category is with a "=m" destination (not allowing anything
else) and (match_dup 0) somewhere in the pattern.  I believe those have to
be expanded specially, because otherwise one wouldn't get a MEM_P target at
all if optimizing (I think the builtins are supposed to pass a pointer to
the result in that case), and the match_dup just doesn't appear as another
operand.
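Going back to the first category, for the cast pattern quoted above the
change I have in mind would look roughly like the following (an untested
sketch of the idea only, not a patch, assuming the check belongs in the
insn condition):

;; Untested sketch for the first category: also reject the both-MEM case
;; in the insn condition, since no constraint alternative accepts it
;; anyway.
(define_insn_and_split "avx512f_<castmode><avxsizesuffix>_<castmode>"
  [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
        (unspec:AVX512MODE2P
          [(match_operand:<ssequartermode> 1 "nonimmediate_operand" "xm,x")]
          UNSPEC_CAST))]
  "TARGET_AVX512F
   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
  "#"
  "&& reload_completed"
  [(set (match_dup 0) (match_dup 1))]
  ...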
missing !(MEM && MEM) ?
=======================
avx512f_pd512_256pd
avx512f_pd512_pd
avx512f_ps512_256ps
avx512f_ps512_ps
avx512f_si512_256si
avx512f_si512_si
avx_pd256_pd
avx_ps256_ps
avx_si256_si
sse_storehps
sse_storelps

valid matching mem
==================
avx512f_ss_truncatev16siv16hi2_mask
avx512f_ss_truncatev16siv16qi2_mask
avx512f_ss_truncatev8div8hi2_mask
avx512f_ss_truncatev8div8si2_mask
avx512f_truncatev16siv16hi2_mask
avx512f_truncatev16siv16qi2_mask
avx512f_truncatev8div8hi2_mask
avx512f_truncatev8div8si2_mask
avx512f_us_truncatev16siv16hi2_mask
avx512f_us_truncatev16siv16qi2_mask
avx512f_us_truncatev8div8hi2_mask
avx512f_us_truncatev8div8si2_mask
avx512f_vcvtps2ph512_mask
avx512vl_ss_truncatev16hiv16qi2_mask
avx512vl_ss_truncatev4div4si2_mask
avx512vl_ss_truncatev8siv8hi2_mask
avx512vl_truncatev16hiv16qi2_mask
avx512vl_truncatev4div4si2_mask
avx512vl_truncatev8siv8hi2_mask
avx512vl_us_truncatev16hiv16qi2_mask
avx512vl_us_truncatev4div4si2_mask
avx512vl_us_truncatev8siv8hi2_mask
sse2_movsd
vcvtps2ph256_mask

match_dup on mem target
=======================
avx512f_ss_truncatev8div16qi2_mask_store
avx512f_storev16sf_mask
avx512f_storev16si_mask
avx512f_storev8df_mask
avx512f_storev8di_mask
avx512f_truncatev8div16qi2_mask_store
avx512f_us_truncatev8div16qi2_mask_store
avx512vl_compressstorev2df_mask
avx512vl_compressstorev2di_mask
avx512vl_compressstorev4df_mask
avx512vl_compressstorev4di_mask
avx512vl_compressstorev4sf_mask
avx512vl_compressstorev4si_mask
avx512vl_compressstorev8sf_mask
avx512vl_compressstorev8si_mask
avx512vl_ss_truncatev2div2hi2_mask_store
avx512vl_ss_truncatev2div2qi2_mask_store
avx512vl_ss_truncatev2div2si2_mask_store
avx512vl_ss_truncatev4div4hi2_mask_store
avx512vl_ss_truncatev4div4qi2_mask_store
avx512vl_ss_truncatev4siv4hi2_mask_store
avx512vl_ss_truncatev4siv4qi2_mask_store
avx512vl_ss_truncatev8siv8qi2_mask_store
avx512vl_storev16hi_mask
avx512vl_storev16qi_mask
avx512vl_storev2df_mask
avx512vl_storev2di_mask
avx512vl_storev32qi_mask
avx512vl_storev4df_mask
avx512vl_storev4di_mask
avx512vl_storev4sf_mask
avx512vl_storev4si_mask
avx512vl_storev8hi_mask
avx512vl_storev8sf_mask
avx512vl_storev8si_mask
avx512vl_truncatev2div2hi2_mask_store
avx512vl_truncatev2div2qi2_mask_store
avx512vl_truncatev2div2si2_mask_store
avx512vl_truncatev4div4hi2_mask_store
avx512vl_truncatev4div4qi2_mask_store
avx512vl_truncatev4siv4hi2_mask_store
avx512vl_truncatev4siv4qi2_mask_store
avx512vl_truncatev8siv8qi2_mask_store
avx512vl_us_truncatev2div2hi2_mask_store
avx512vl_us_truncatev2div2qi2_mask_store
avx512vl_us_truncatev2div2si2_mask_store
avx512vl_us_truncatev4div4hi2_mask_store
avx512vl_us_truncatev4div4qi2_mask_store
avx512vl_us_truncatev4siv4hi2_mask_store
avx512vl_us_truncatev4siv4qi2_mask_store
avx512vl_us_truncatev8siv8qi2_mask_store

	Jakub