One more thing...

On Mon, Jan 13 2020, Martin Jambor wrote:
> Hi,
>
> On Tue, Jan 07 2020, Richard Biener wrote:
>> On Tue, 17 Dec 2019, Martin Jambor wrote:
>>
>>> Hi,
>>> 
>>> PR 92486 shows that DSE, when seeing a "normal" gimple aggregate
>>> assignment coming from a C struct assignment and one a representing a
>>> folded memcpy, can kill the latter and keep in place only the former,
>>> which does not copy padding - at least when SRA decides to totally
>>> scalarize a least one of the aggregates (when not doing total
>>> scalarization, SRA cares about padding)
>>> 
>>> Mind you, SRA would not totally scalarize an aggregate if it saw that
>>> it takes part in a gimple assignment which is a folded memcpy (see how
>>> type_changing_p is set in contains_vce_or_bfcref_p) but it doesn't
>>> because of the DSE decisions.
>>> 
>>> I was asked to modify SRA to take padding into account - and to copy
>>> it around - when totally scalarizing, which is what the patch below
>>> does.
>>> 
>>> I believe the patch is correct in the sense that it will not cause
>>> miscompilations but after I have seen inlining propagate the useless
>>> (and small and ugly and certainly damaging) accesses far and wide, I
>>> am more convinced that before that this is not the correct approach
>>> and DSE should simple be able to discern between the two assignment
>>> too - and that the semantics of a "normal" gimple assignments should
>>> not include copying of padding.
>>> 
>>> But if the decision will be to make gimple aggregate always a solid
>>> block copy, the patch can do it, and has passed bootstrap and testing
>>> on x86_64-linux and a very similar one on aarch64-linux and
>>> i686-linux.  I suppose that at least the way how it figures out the
>>> type for copying will need change, but even then I'd rather not commit
>>> it.
>>
>> I think both GENERIC and GIMPLE always had assignments being
>> block-copies.  I know we've changed memcpy folding to be more
>> explicit recently to avoid this very same issue but clearly
>> having a GIMPLE stmt like
>>
>>  *p = *q;
>>
>> decomposing into loads/stores of multiple discontiguous memory
>> ranges looks very wrong and would be quite awkward to correctly
>> represent throughout the compiler (and the alias infrastructure).
>
> Well, OK.  I suppose that's another reason for me to start thinking how
> to kill total scalarization.
>
>>
>> So even if the C standard says for aggregates the elements are
>> copied and contents of padding is unspecified the actual GIMPLE
>> primitive should always copy everything unless we can prove the
>> padding is not used.  The frontends could always choose to
>> make not copying the padding explicit (but usually copying it
>> _is_ more efficient unless you add artificial very large one).
>>
>> From an SRA analysis perspective I wonder if you can produce a
>> fix that doesn't depend on the earlier patches in this series?
>> It might be as "simple" as, in completely_scalarize, for
>> FIELD_DECLs with following padding to artificially enlarge
>> the field (I wonder if we can do w/o the 'ref' arg to
>> scalarize_elem for that - we'd build a MEM_REF then?).
>
> Such access would automatically conflict with any real access to the
> field, which would have to be reconciled throughout the pass.  Perhaps
> most importantly, we would have to load/store all of it when replacing a
> scalar copy but create a BIT_FIELD_REF into the replacement when
> replacing a pre-existing access to the field.  But I guess I can give it
> a go.
>
> In any case, unless one can somehow tell just by looking at FILED_DECL
> that it has padding (as opposed to figuring out that the next one simply
> starts later because the next one might not be in the same struct type),
> detecting the situation is actually much easier on top of the other
> patches in the series.
>
> And just to save myself work, unless you outright reject the previous
> two patches, I'd prefer not to attempt this independently.
>

...doing this will make me change the most radical patch in the series
to track pointer to the previous encountered/created access (as opposed
to the pointer to its next pointer) which should however make it more
readable (by getting rid of the triple indirections).

I'll try to come up with a new series quickly (but I'm juggling a few
high-priority tasks at the moment).

Thanks,

Martin

> Thanks,
>
> Martin
>
>>> 2019-12-13  Martin Jambor  <mjam...@suse.cz>
>>> 
>>>     PR tree-optimization/92486
>>>     * tree-sra.c: Include langhooks.h.
>>>     (total_scalarization_fill_padding): New function.
>>>     (total_skip_all_accesses_until_pos): Also create accesses for padding.
>>>     (total_should_skip_creating_access): Pass new parameters to
>>>     total_skip_all_accesses_until_pos, update how much area is already
>>>     covered in cases of success.
>>>     (totally_scalarize_subtree): Track how much of an aggregate is
>>>     covered, create accesses for trailing padding.

Reply via email to