On Mon, Aug 22, 2016 at 5:28 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Mon, Aug 22, 2016 at 4:31 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Fri, Aug 19, 2016 at 4:45 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>> On Fri, Aug 19, 2016 at 2:21 AM, Richard Biener
>>> <richard.guent...@gmail.com> wrote:
>>>> On Thu, Aug 18, 2016 at 5:16 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>> On Thu, Aug 18, 2016 at 1:18 AM, Richard Biener
>>>>> <richard.guent...@gmail.com> wrote:
>>>>>> On Wed, Aug 17, 2016 at 10:11 PM, H.J. Lu <hongjiu...@intel.com> wrote:
>>>>>>> builtin_memset_gen_str returns a register used for memset, which only
>>>>>>> supports integer registers.  But a target may use vector registers in
>>>>>>> memmset.  This patch adds a TARGET_GEN_MEMSET_VALUE hook to duplicate
>>>>>>> QImode value to mode derived from STORE_MAX_PIECES, which can be used
>>>>>>> with vector instructions.  The default hook is the same as the original
>>>>>>> builtin_memset_gen_str.  A target can override it to support vector
>>>>>>> instructions for STORE_MAX_PIECES.
>>>>>>>
>>>>>>> Tested on x86-64 and i686.  Any comments?
>>>>>>>
>>>>>>> H.J.
>>>>>>> ---
>>>>>>> gcc/
>>>>>>>
>>>>>>>         * builtins.c (builtin_memset_gen_str): Call 
>>>>>>> targetm.gen_memset_value.
>>>>>>>         (default_gen_memset_value): New function.
>>>>>>>         * target.def (gen_memset_value): New hook.
>>>>>>>         * targhooks.c: Inclue "expmed.h" and "builtins.h".
>>>>>>>         (default_gen_memset_value): New function.
>>>>>>
>>>>>> I see default_gen_memset_value in builtins.c but it belongs here.
>>>>>>
>>>>>>>         * targhooks.h (default_gen_memset_value): New prototype.
>>>>>>>         * config/i386/i386.c (ix86_gen_memset_value): New function.
>>>>>>>         (TARGET_GEN_MEMSET_VALUE): New.
>>>>>>>         * config/i386/i386.h (STORE_MAX_PIECES): Likewise.
>>>>>>>         * doc/tm.texi.in: Add TARGET_GEN_MEMSET_VALUE hook.
>>>>>>>         * doc/tm.texi: Updated.
>>>>>>>
>>>>>
>>>>> Like this?
>>>>
>>>> Aww, ok - I see builtins.c is a better place - sorry for the extra work.
>>>>
>>>> Richard.
>>>
>>> Here is the original patch with the updated ChangeLog.  OK for trunk?
>>
>> So now actually looking at what you are doing.  Why do we need a new
>> hook?  Is it that even if we make builtin_memset_gen_str handle vector-mode
>> MODE properly you don't get the desired optimized code?  Your patch doesn't
>
> It is hard to say what codes builtin_memset_gen_str will generate
> if vector mode is supported.  I suspect since broadcast instructions are
> target specific, a target hook may still be needed to broadcast a QImode
> input to a full vector register.

Well, we mainly seem to miss an optab that matches vec_duplicate (we have
vec_init only which might be sufficient as well).

>> seem to change the mode to vector but only STORE_MAX_PIECES and
>> op_by_pieces_d::run will only consider integer modes.
>
> My patch doesn't change mode of STORE_MAX_PIECES.  It changes
> how TImode register from STORE_MAX_PIECES is generated:
>
> (insn 8 7 9 (set (reg:SI 90)
>         (zero_extend:SI (reg:QI 89))) c1.i:4 -1
>      (nil))
>
> (insn 9 8 10 (parallel [
>             (set (reg:SI 91)
>                 (mult:SI (reg:SI 90)
>                     (const_int 16843009 [0x1010101])))
>             (clobber (reg:CC 17 flags))
>         ]) c1.i:4 -1
>      (nil))
>
> (insn 10 9 11 (set (reg:V4SI 92)
>         (vec_duplicate:V4SI (reg:SI 91))) c1.i:4 4197 {*vec_dupv4si}
>      (nil))
>
> (insn 11 10 12 (set (subreg:V4SI (reg:TI 93) 0)
>         (reg:V4SI 92)) c1.i:4 -1
>      (nil))
>
> (insn 12 11 13 (set (mem:TI (reg/v/f:DI 87 [ dst ]) [0 MEM[(void
> *)dst_2(D)]+0 S16
> A8])
>         (reg:TI 93)) c1.i:4 -1
>      (nil))

Yes, but if you change STORE_MAX_PIECES to also include OI/XImode
then this exposes OI/XImode scalars by doing OI/XImode stores.  Thus
I am suggesting to instead allow to use vector integer modes as well
for the by_pieces stuff (and maybe guard that fact with a target hook).

Richard.

> vs.
>
> (insn 8 7 9 (set (reg:DI 91)
>         (zero_extend:DI (reg:QI 89))) c1.i:4 -1
>      (nil))
>
> (insn 9 8 10 (set (subreg:DI (reg:TI 90) 0)
>         (reg:DI 91)) c1.i:4 -1
>      (nil))
>
> (insn 10 9 11 (set (subreg:DI (reg:TI 90) 8)
>         (const_int 0 [0])) c1.i:4 -1
>      (nil))
>
> (insn 11 10 12 (set (reg:DI 93)
>         (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1
>      (nil))
>
> (insn 12 11 13 (parallel [
>             (set (reg:DI 92)
>                 (mult:DI (subreg:DI (reg:TI 90) 8)
>                     (reg:DI 93)))
>             (clobber (reg:CC 17 flags))
>         ]) c1.i:4 -1
>      (nil))
>
> (insn 13 12 14 (set (reg:DI 95)
>         (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1
>      (nil))
>
> (insn 14 13 15 (parallel [
>             (set (reg:DI 94)
>                 (mult:DI (subreg:DI (reg:TI 90) 0)
>                     (reg:DI 95)))
>             (clobber (reg:CC 17 flags))
>         ]) c1.i:4 -1
>      (nil))
>
> (insn 15 14 16 (parallel [
>             (set (reg:DI 96)
>                 (plus:DI (reg:DI 92)
>                     (reg:DI 94)))
>             (clobber (reg:CC 17 flags))
>         ]) c1.i:4 -1
>      (nil))
>
> (insn 16 15 17 (set (reg:DI 98)
>         (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1
>      (nil))
>
> (insn 17 16 18 (parallel [
>             (set (reg:TI 97)
>                 (mult:TI (zero_extend:TI (subreg:DI (reg:TI 90) 0))
>                     (zero_extend:TI (reg:DI 98))))
>             (clobber (reg:CC 17 flags))
>         ]) c1.i:4 -1
>      (nil))
>
> (insn 18 17 19 (parallel [
>             (set (reg:DI 99)
>                 (plus:DI (reg:DI 96)
>                     (subreg:DI (reg:TI 97) 8)))
>             (clobber (reg:CC 17 flags))
>         ]) c1.i:4 -1
>      (nil))
>
> (insn 19 18 20 (set (subreg:DI (reg:TI 97) 8)
>         (reg:DI 99)) c1.i:4 -1
>      (nil))
>
> (insn 20 19 21 (set (reg:TI 97)
>         (reg:TI 97)) c1.i:4 -1
>      (expr_list:REG_EQUAL (mult:TI (reg:TI 90)
>             (const_wide_int 0x1010101010101010101010101010101))
>         (nil)))
>
> (insn 21 20 22 (set (mem:TI (reg/v/f:DI 87 [ dst ]) [0 MEM[(void
> *)dst_2(D)]+0 S16
> A8])
>         (reg:TI 97)) c1.i:4 -1
>      (nil))
>
>
>> So isn't this a general missed optimization that TImode 0x01010101...01 * x
>> is not optimized to a pxor or broadcast?
>
> In general, vector instructions aren't supported in integer computation.
> TImode 0x01010101...01 * x will use integer registers not  vector registers.
> My target hook simply allows vector registers to be used in TImode for
> STORE_MAX_PIECES.
>
> --
> H.J.

Reply via email to