store

Richard Biener Thu, 17 Apr 2014 03:23:28 -0700

On Thu, Apr 17, 2014 at 7:19 AM, Thomas Preud'homme
<thomas.preudho...@arm.com> wrote:
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>>
>> With handling only the outermost handled-component and then only a
>> selected subset you'll catch many but not all cases.  Why not simply
>> use get_inner_reference () here (plus stripping the constant offset
>> from an innermost MEM_REF) and get the best of both worlds (not
>> duplicate parts of its logic and handle more cases)?  Eventually
>> using tree-affine.c and get_inner_reference_aff is even more appropriate
>> so you can compute the address differences without decomposing
>> them yourselves.
>
> Why does the constant offset from an innermost MEM_REF need to be
> stripped? Shouldn't that be part of the offset in the symbolic number?


Yes, but get_inner_reference returns MEM[ptr, constant-offset] as base,
thus it doesn't move the constant offset therein to bitpos and doesn't
return MEM[ptr, 0].  You have to do that yourselves.

(as you are really interested in the _address_ of the memory reference
instead of the reference itself it would be appropriate to introduce a
variant of get_inner_reference that returns 'ptr' in this case and
&x for x.field1 for example)

>>
>> +             /*  Compute address to load from and cast according to the size
>> +                 of the load.  */
>> +             load_ptr_type = build_pointer_type (load_type);
>> +             addr_expr = build1 (ADDR_EXPR, load_ptr_type, bswap_src);
>> +             addr_tmp = make_temp_ssa_name (load_ptr_type, NULL,
>> "load_src");
>> +             addr_stmt = gimple_build_assign_with_ops
>> +                        (NOP_EXPR, addr_tmp, addr_expr, NULL);
>> +             gsi_insert_before (&gsi, addr_stmt, GSI_SAME_STMT);
>> +
>> +             /* Perform the load.  */
>> +             load_offset_ptr = build_int_cst (load_ptr_type, 0);
>> +             val_tmp = make_temp_ssa_name (load_type, NULL, "load_dst");
>> +             val_expr = build2 (MEM_REF, load_type, addr_tmp, 
>> load_offset_ptr);
>> +             load_stmt = gimple_build_assign_with_ops
>> +                        (MEM_REF, val_tmp, val_expr, NULL);
>>
>> this is unnecessarily complex and has TBAA issues.  You don't need to
>> create a "correct" pointer type, so doing
>>
>>     addr_expr = fold_build_addr_expr (bswap_src);
>>
>> is enough.  Now, to fix the TBAA issues you either need to remember
>> and "combine" the reference_alias_ptr_type of each original load and
>> use that for the load_offset_ptr value or decide that isn't worth it and
>> use alias-set zero (use ptr_type_node).
>
> Sorry this is only my second patch [1] to gcc so it's not all clear to me. 
> The TBAA
> issue you mention comes from val_expr referring to a memory area that
> overlap with the smaller memory area used in the bitwise OR operation, am I
> right? Now, I have no idea about how to do the combination of the values
> returned by reference_alias_ptr_type () for each individual small memory
> area. Can you advise me on this? And what are the effect of not doing it and
> using ptr_type_node for the alias-set?

You can "combine" two reference_alias_ptr_type()s with

  if (alias_ptr_types_compatible_p (type1, type2))
    return type1;
  else
    return ptr_type_node;

using ptr_type_node for the alias-set will make it alias with all memory
references (that is, type-based disambiguation will be disabled).  That's
required for example if you combine four loads with type 'short' using
a single load with type 'long'.

> [1] First one was a fix on the existing implementation of the bswap pass.
>
>>
>> Can you also expand the comment about "size vs. range"?  Is it
>> that range can be bigger than size if you have (short)a[0] |
>> ((short)a[3] << 1) sofar
>> where size == 2 but range == 3?  Thus range can also be smaller than size
>> for example for (short)a[0] | ((short)a[0] << 1) where range would be 1 and
>> size == 2?  I suppose adding two examples like this to the comment, together
>> with the expected value of 'n' would help here.
>
> You understood correctly. I will add the suggested example.
>
>> Otherwise the patch looks good.  Now we're only missing the addition
>> of trying to match to a VEC_PERM_EXPR with a constant mask
>> using can_vec_perm_p ;)
>
> Is that the vector shuffle engine you were mentioning in PR54733? If I
> understand correctly it is a generalization of the check again CMPNOP and
> CMPXCHG in find_bswap in this new patchset. I will look if ARM could
> Benefit from this and if yes I might take a look (yep, two conditions).

Yep.  For example it might match on things like

int foo (char *x)
{
   return ((((x[0] << 1 | x[0]) << 1) | x[1]) << 1) | x[0];
}

not sure if target support for shuffles on small vectors (or vector parts)
is working well.  Thus on v1si as in the example.

Richard.

> Thanks a lot for such quick and detailed comments after my ping.
>
> Best regards,
>
> Thomas
>
>
>

Re: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store

Reply via email to