On Fri, Oct 26, 2018 at 9:19 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > On 10/25/18, Uros Bizjak <ubiz...@gmail.com> wrote: > > On Fri, Oct 26, 2018 at 8:07 AM H.J. Lu <hjl.to...@gmail.com> wrote: > >> > >> Many x86 pmovzx/pmovsx instructions with memory operands are modeled in > >> a wrong way. For example: > >> > >> (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>" > >> [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > >> (any_extend:V8HI > >> (vec_select:V8QI > >> (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm") > >> (parallel [(const_int 0) (const_int 1) > >> (const_int 2) (const_int 3) > >> (const_int 4) (const_int 5) > >> (const_int 6) (const_int 7)]))))] > >> > >> should be defind for memory operands as: > >> > >> (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>" > >> [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > >> (any_extend:V8HI > >> (match_operand:V8QI "memory_operand" "m,m,m")))] > >> > >> This set of patches updates them to > >> > >> (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>" > >> [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > >> (any_extend:V8HI > >> (vec_select:V8QI > >> (match_operand:V16QI 1 "nonimmediate_operand" "Yr,*x,v") > >> (parallel [(const_int 0) (const_int 1) > >> (const_int 2) (const_int 3) > >> (const_int 4) (const_int 5) > >> (const_int 6) (const_int 7)]))))] > >> > >> (define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>_1" > >> [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > >> (any_extend:V8HI > >> (match_operand:V8QI "subreg_memory_operand" "m,m,m")))] > >> > >> with a splitter: > >> > >> (define_insn_and_split "*sse4_1_<code>v8qiv8hi2<mask_name>_2" > >> [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > > > > No constraints needed for pre-reload splitter. > > > >> (any_extend:V8HI > >> (vec_select:V8QI > >> (subreg:V16QI > >> (vec_concat:V2DI > >> (match_operand:DI 1 "memory_operand" "m,*m,m") > >> (const_int 0)) 0) > >> (parallel [(const_int 0) (const_int 1) > >> (const_int 2) (const_int 3) > >> (const_int 4) (const_int 5) > >> (const_int 6) (const_int 7)]))))] > >> "TARGET_SSE4_1 && <mask_avx512bw_condition> && > >> <mask_avx512vl_condition>" > >> "#" > >> "&& can_create_pseudo_p ()" > >> [(set (match_dup 0) (match_dup 1))] > > > > [(set (match_dup 0) > > (any_extend:V8HI (match_dup 1)))] > > > >> { > >> operands[1] = gen_rtx_<CODE> (V8HImode, > >> gen_rtx_SUBREG (V8QImode, > >> operands[1], 0)); > >> }) > > > > Don't create subregs of memory. Use adjust_address_nv. > > Here is the updated patch.
> with a splitter: > > (define_insn_and_split "*sse4_1_<code>v8qiv8hi2<mask_name>_2" > [(set (match_operand:V8HI 0 "register_operand") > (any_extend:V8HI > (vec_select:V8QI > (subreg:V16QI > (vec_concat:V2DI > (match_operand:DI 1 "memory_operand") > (const_int 0)) 0) > (parallel [(const_int 0) (const_int 1) > (const_int 2) (const_int 3) > (const_int 4) (const_int 5) > (const_int 6) (const_int 7)]))))] > "TARGET_SSE4_1 && <mask_avx512bw_condition> && <mask_avx512vl_condition>" > "#" > "&& can_create_pseudo_p ()" "can_create_pseudo_p ()" should go to the insn constraint and "&& 1" should be used for split constraint. Both, insn and splitter are valid only before reload. > [(set (match_dup 0) > (any_extend:V8HI (match_dup 1)))] > { > operands[1] = adjust_address_nv (operands[1], V8QImode, 0); > }) Please use double quotes for one-line preparation statement. > (any_extend:V4SI > (match_operand:V4HI 1 "memory_operand" "m,*m,m")))] Please remove star in front of memory constraint. OK with the above changes. Thanks, Uros.