On Fri, Mar 03, 2017 at 09:10:03AM +0100, Uros Bizjak wrote:
> > Or are there any define_insn/define_expand where it is desirable to have
> > both input and output operand a MEM (and does it have to be matching)?
> > For various scalar binary and unary expanders the backend already uses a
> > helper that will force something into a register if dest and src are both
> > memory and not rtx_equal_p, but do we have anything like that in anything
> > these two builtin expanders emit?
>
> Any insn with matched memory (movlps, movhps and similar) can also
> operate with a matched register.  To my knowledge, all insn patterns are
> written in this way, since in the past we had plenty of problems with
> matched memory operands.
>
> Also, I really don't remember what benefit it brings us to *not* force
> the input operand to a register at -O0 at expand time and leave it to the
> RA to find matched memory.  OTOH, with -O0 we already have so many
> unnecessary moves that you get tears in the eyes...
So, I've tried:

make mddump
for i in `sed -n -e 's/^.*CODE_FOR_\([^,]*\),.*$/\1/p' ../../gcc/config/i386/i386-builtin.def`; do
  sed -n '/^(define_\(insn\|expand\) ("'$i'"/,/^$/p' tmp-mddump.md
done

and looked for "=.*[mo] in there.

Some insns might want that && !(MEM_P (operands[0]) && MEM_P (operands[1])),
e.g.

(define_insn_and_split "avx512f_<castmode><avxsizesuffix>_<castmode>"
  [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
        (unspec:AVX512MODE2P
          [(match_operand:<ssequartermode> 1 "nonimmediate_operand" "xm,x")]
          UNSPEC_CAST))]
  "TARGET_AVX512F"
  "#"
  "&& reload_completed"
  [(set (match_dup 0) (match_dup 1))]

as the constraints require that both operands aren't memory.  Shall I create
a patch for that?  This is the first category below (a sketch of the idea
follows after the other two categories).

The second category is where a matching memory operand is ok, so my patch
can pessimize stuff.  I wonder if we couldn't handle this like:

          /* If we aren't optimizing, only allow one memory operand to be
             generated.  */
          if (memory_operand (op, mode))
            {
              const char *const constraint
                = insn_data[icode].operand[i + adjust + 1].constraint;
              if (optimize || num_memory != 1 || !rtx_equal_p (real_target, op))
                num_memory++;
              /* sse2_movsd allows a matching operand.  */
              else if (icode == CODE_FOR_sse2_movsd)
                ;
              /* Various masked insns allow a matching operand.  */
              else if (insn_data[icode].operand[i + adjust + 1].predicate
                       == vector_move_operand
                       && (strcmp (constraint, "0C") == 0
                           || strcmp (constraint, "0C,0") == 0))
                ;
              else
                num_memory++;
            }

(though perhaps the sse2_movsd special case is still too simplistic,
because that insn has just =m v 0 and =o 0 v alternatives, so if
i + adjust + 1 is 2, then it is fine as is, and if it is 1, then only if
the memory is offsettable; though perhaps LRA can just turn non-offsettable
memory into an offsettable one through a secondary reload).

And the last category is with a "=m" destination (not allowing anything
else) and (match_dup 0) somewhere in the pattern.  I believe those have to
be expanded specially, because otherwise one wouldn't get a MEM_P target at
all if optimizing (I think the builtins are supposed to pass a pointer to
the result in that case), and the match_dup just doesn't appear as another
operand.
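Going back to the first category, for the cast pattern quoted above the
change I have in mind would look roughly like the following (an untested
sketch of the idea only, not a patch, assuming the check belongs in the
insn condition):

;; Untested sketch for the first category: also reject the both-MEM case
;; in the insn condition, since no constraint alternative accepts it
;; anyway.
(define_insn_and_split "avx512f_<castmode><avxsizesuffix>_<castmode>"
  [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
        (unspec:AVX512MODE2P
          [(match_operand:<ssequartermode> 1 "nonimmediate_operand" "xm,x")]
          UNSPEC_CAST))]
  "TARGET_AVX512F
   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
  "#"
  "&& reload_completed"
  [(set (match_dup 0) (match_dup 1))]
  ...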
missing !(MEM && MEM) ?
=======================
avx512f_pd512_256pd
avx512f_pd512_pd
avx512f_ps512_256ps
avx512f_ps512_ps
avx512f_si512_256si
avx512f_si512_si
avx_pd256_pd
avx_ps256_ps
avx_si256_si
sse_storehps
sse_storelps

valid matching mem
==================
avx512f_ss_truncatev16siv16hi2_mask
avx512f_ss_truncatev16siv16qi2_mask
avx512f_ss_truncatev8div8hi2_mask
avx512f_ss_truncatev8div8si2_mask
avx512f_truncatev16siv16hi2_mask
avx512f_truncatev16siv16qi2_mask
avx512f_truncatev8div8hi2_mask
avx512f_truncatev8div8si2_mask
avx512f_us_truncatev16siv16hi2_mask
avx512f_us_truncatev16siv16qi2_mask
avx512f_us_truncatev8div8hi2_mask
avx512f_us_truncatev8div8si2_mask
avx512f_vcvtps2ph512_mask
avx512vl_ss_truncatev16hiv16qi2_mask
avx512vl_ss_truncatev4div4si2_mask
avx512vl_ss_truncatev8siv8hi2_mask
avx512vl_truncatev16hiv16qi2_mask
avx512vl_truncatev4div4si2_mask
avx512vl_truncatev8siv8hi2_mask
avx512vl_us_truncatev16hiv16qi2_mask
avx512vl_us_truncatev4div4si2_mask
avx512vl_us_truncatev8siv8hi2_mask
sse2_movsd
vcvtps2ph256_mask

match_dup on mem target
=======================
avx512f_ss_truncatev8div16qi2_mask_store
avx512f_storev16sf_mask
avx512f_storev16si_mask
avx512f_storev8df_mask
avx512f_storev8di_mask
avx512f_truncatev8div16qi2_mask_store
avx512f_us_truncatev8div16qi2_mask_store
avx512vl_compressstorev2df_mask
avx512vl_compressstorev2di_mask
avx512vl_compressstorev4df_mask
avx512vl_compressstorev4di_mask
avx512vl_compressstorev4sf_mask
avx512vl_compressstorev4si_mask
avx512vl_compressstorev8sf_mask
avx512vl_compressstorev8si_mask
avx512vl_ss_truncatev2div2hi2_mask_store
avx512vl_ss_truncatev2div2qi2_mask_store
avx512vl_ss_truncatev2div2si2_mask_store
avx512vl_ss_truncatev4div4hi2_mask_store
avx512vl_ss_truncatev4div4qi2_mask_store
avx512vl_ss_truncatev4siv4hi2_mask_store
avx512vl_ss_truncatev4siv4qi2_mask_store
avx512vl_ss_truncatev8siv8qi2_mask_store
avx512vl_storev16hi_mask
avx512vl_storev16qi_mask
avx512vl_storev2df_mask
avx512vl_storev2di_mask
avx512vl_storev32qi_mask
avx512vl_storev4df_mask
avx512vl_storev4di_mask
avx512vl_storev4sf_mask
avx512vl_storev4si_mask
avx512vl_storev8hi_mask
avx512vl_storev8sf_mask
avx512vl_storev8si_mask
avx512vl_truncatev2div2hi2_mask_store
avx512vl_truncatev2div2qi2_mask_store
avx512vl_truncatev2div2si2_mask_store
avx512vl_truncatev4div4hi2_mask_store
avx512vl_truncatev4div4qi2_mask_store
avx512vl_truncatev4siv4hi2_mask_store
avx512vl_truncatev4siv4qi2_mask_store
avx512vl_truncatev8siv8qi2_mask_store
avx512vl_us_truncatev2div2hi2_mask_store
avx512vl_us_truncatev2div2qi2_mask_store
avx512vl_us_truncatev2div2si2_mask_store
avx512vl_us_truncatev4div4hi2_mask_store
avx512vl_us_truncatev4div4qi2_mask_store
avx512vl_us_truncatev4siv4hi2_mask_store
avx512vl_us_truncatev4siv4qi2_mask_store
avx512vl_us_truncatev8siv8qi2_mask_store

	Jakub