https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122438

Michael Matz <matz at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz at gcc dot gnu.org

--- Comment #9 from Michael Matz <matz at gcc dot gnu.org> ---
(In reply to Andreas Schwab from comment #8)
> simplify_subreg changes (reg:SI 1 %d1 [48]) into (reg:DI 0 %d0 [orig:48+-4 ]
> [48]).

And that change is correct on a big endian target.  the lowpart is in d1, the
highpart in d0.  In a way the result is also "correct", the high part (in d0)
could be clobbered in this situation, as it was already determined earlier,
from a much bigger context, that that highpart is irrelevant.

But of course reload cannot know this.  In isolation the instruction is
invalid:
one of the inputs uses (d1/d0), and one of the early-clobbers uses d0.

Problem is: reload has (and as far as I can determine never really had)
no mitigation for this situation.  It starts out with this info:

(insn 37 36 51 6 (parallel [
            (set (reg/v:DI 34 [ nDesc ])
                (plus:DI (reg/v:DI 34 [ nDesc ])
                    (subreg:DI (reg:SI 49) 0)))
            (clobber (scratch:SI))
        ]) "pr122438.i":13:11 discrim 1 155 {adddi3}
     (expr_list:REG_DEAD (reg:SI 49)
        (nil)))

and the info from IRA, that p49 should be placed in d1.  Reload also then knows
the following:

insn=37, live_throughout: 15, dead_or_set: 34, 49

So, it does know that pseudo 49 is touched by this insn in a relevant way.
Then the pseudo-to-hardreg replacement comes:

(insn 37 36 51 6 (parallel [
            (set (reg/v:DI 34 [ nDesc ])
                (plus:DI (reg/v:DI 34 [ nDesc ])
                    (subreg:DI (reg:SI 1 %d1 [49]) 0)))
            (clobber (scratch:SI))
        ]) "pr122438.i":13:11 discrim 1 155 {adddi3}
     (expr_list:REG_DEAD (reg:SI 1 %d1 [49])
        (nil)))

which is all fine and dandy.  Then find_reloads and friends go, determine
that operand3 is the scratch that still needs a reload and selects d0:

Reloads for insn # 37
Reload 0: reload_out (SI) = (scratch:SI)
        DATA_REGS, RELOAD_OTHER (opnum = 3)
        reload_out_reg: (scratch:SI)
        reload_reg_rtx: (reg:SI 0 %d0)

This selection of d0 is the problem.  The normal way this selection
process works is this (find_reg):
 1) collect all hardregs used by pseudos live-throug
 2) collect all hardregs used by pseudos set/dead in the insn
 3) collect all hardregs used by earlier reloads for this insn
these are joined then the cheapest (as per some criteria) hardreg
is selected, preferrably none of 1+2+3, but it may select one 1+2
nevertheless.  In the latter case some of the pseudos once allocated
to the selected hardreg will have to be spilled and the processed of reloading
will have to be redone.

So, in the ideal world this whole selection process sees that pseudo 49
is set/dead in this insn, that it is allocated to d1+d0, and hence either
d0 shouldn't have been used for the scratch, of if it had, then p49 should
have been spilled.

Problem: the various bitsets tracking used/set/dead pseudos only collect
their numbers.  So the routines dealing with them only see their natural
mode, for p49 thats SImode.  So, as far as hardreg selection for reloads
is concerned the operand in which p49 is used only needs d1 (the SImode
part), and hence d0 is considered free and not confliting with p49.

There is code in reload that deals with some paradoxical input operands,
see scan_paradoxical_subregs() and reg_max_ref_mode[], which tracks the
largest mode in which a pseudo is used.  But that one is only there
to allocate stack slots of appropriate size in case those pseudos need
spilling.

One could probably cobble up some code within find_reg()
and count_spilled_pseudo/count_pseudo, that deals with pseudos used
in paradoxical context, but it will be nontrivial: nearly all the code
within reload uses the idiom [basereg, basereg+nregs) for dealing with
multi-reg entities, where basereg usually is just reg_renumber[pseudonum]
(i.e. whatever the regallocator, here IRA, selected as hardreg for the
pseudo in question, here for instance d1 for p49).

But paradoxical subregs on big-endian require the basereg to be adjusted
from reg_renumber (here for it needs to be d0 for the DImode p49 paradox).

AFAICS there never was code dealing with this situation in find_reg
and friends, ergo I don't quite see how paradoxical subregs within normal
operands of random instructions, at least if they need reloads, were ever
handled correctly.  So, they probably shouldn't be generated in the first
place, i.e. ext-dce should probably be dumbed down a bit when reload
is used.

Reply via email to