On Mon, 5 Mar 2018, Jeff Law wrote:

> >>> Actually, without further conditions I don't see how it would be safe 
> >>> for the successor to have multiple preds.  We might have this 
> >>> situation:
> >>>
> >>> bb1: ret = setjmp
> >>> bb2: x0 = phi <x1 (bb1), foo(bbX)>
> >> No.  Can't happen -- we're still building the initial CFG.  There are no
> >> PHI nodes, there are no SSA_NAMEs.
> > 
> > While that is currently true I think it's short-sighted.  Thinking about 
> > the situation in terms of SSA names and PHI nodes clears up the mind.  In 
> > addition there is already code which builds (sub-)cfgs when SSA form 
> > exists (gimple_find_sub_bbs).  Currently that code can't ever generate 
> > setjmp calls, so it's not an issue.
> It's not clearing up anything for me.  Clearly you're onto something
> that I'm missing, but still trying to figure out.

I'm saying that if there is SSA form then having the second-return edge to 
the successor block if that has multiple predecessors, then this would be 
incorrect.  Do you agree?  I'm also saying that if it's incorrect when in 
SSA form, then it can't be correct (perhaps only very subtly so) without 
SSA form either; I guess that's where we don't agree.

I'm also saying that we currently don't have SSA form while dealing with 
setjmp by more luck and giving up than thought: inlining and va-arg 
expansion do construct sub-cfgs while in SSA.  inlining setjmp calls is 
simply disabled, va-arg expansion doesn't emit setjmp calls into the 
sequence.  But there is nothing inherently magic about SSA form and 
setjmp, so if we're going to fiddle with the CFG form created by setjmp to 
make it more precise, then I think we ought to do it in a way that is 
correct in all intermediate forms, especially at this devel stage, because 
that'd give confidence that it's also correct in the constrained way we're 
needing it.

> Certainly we have to be careful WRT the implicit set of the return value
> of the setjmp call that occurs on the longjmp path.  That's worth
> investigating.  I suspect that works today more by accident of having an
> incorrect CFG than by design.

No, our current imprecise CFG makes it so that the setting of the return 
value is reached by the abnormal edge from the dispatcher, so in a 
data-flow sense it's executed on the first- and second-return case and all 
is well, and so the IL subsumes all the effects that happen in reality 
(and more, which is the reason for the missed optimizations).  You want to 
change the CFG such that it doesn't subsume effects that don't happen in 
reality, and that makes sense, but by that you're making it not reflect 
actions which _do_ happen.

> But it does or at least it should.  It's implicitly set on the longjmp
> side.  If we get this wrong I'd expect we'll see uninit uses in the PHI.
>  That's easy enough to instrument and check for.

Not only uninit uses but conceivably misoptimizations as well.  Let's 
contrieve an example which uses gotos to demonstrate the abnormal edges 
and let's have the second-return edge to land after the setjmp call (and 
setting of return value):

  ret = setjmp(buf);
  if (ret == 0) {             // first return
inside:                       // see below
    call foo(buf)             // might call longjmp
    maybe-goto dispatch       // therefore we have an edge to dispatch
dispatch:                     // our abnormal dispatcher
  goto second_return;         // go to second-return target

Now, as far as data-flow is concerned, as soon as the ret==0 block is 
entered once, ret will never change from 0 again (there simply is no 
visible set which could change it).  So an optimizer would be correct in 
threading the sequence maybe-goto-dispatch->second_return to inside (bear 
with me):

  ret = setjmp();
  if (ret == 0) {             // first return
inside:                       // see below
    call foo()                // might call longjmp
    maybe-goto inside         // if it does we loop, ret must still be zero ?!

This currently would not happen: first we don't thread through abnormal 
edges, and propagation of knowledge over abnormal edges is artificially 
limited as well.  But from a pure data-flow perspective the above 
transformation would be correct, meaning that initial IL can't have been 
correct.  You might say in practice even if the above would happen it 
doesn't matter, because at runtime, with a real longjmp, the "maybe-goto 
inside" would in reality land at the setjmp return again, hence setting 
ret, then the test comes along and all is well.  That would probably be 
true currently, but assume foo always calls longjmp, and GCC can figure 
out it's using the same buffer as the above setjmp, then a further 
optimization might do away with the longjmp and just jump directly to the 
inside label: misoptimization achieved.

None of the above events can happen when the second-return target is 
imprecise and before the setjmp.

Now that all might sound like theoretical games, but I think it highlights 
why we should think long and hard about setjmp CFGs and IL before we relax 

FWIW: I think none of the issues I'm thinking about matter when you check 
two things: 1) that the setjmp call has no LHS, and 2) that the target BB 
has a single predecessor.  I was only triggered by the discussion between 
you and Richi of why (2) might not be important.  But you certainly miss 
checking (1) as well.  IIRC the testcase you were trying to optimize had 
no LHS, right?

It'd be nice if we try to make the setjmp IL correct For Real (tm) in the 
next devel phase:
1) make setjmp have two outgoing edges, always
2) create a setjmp_receive internal function that has a return value
For 'ret = setjmp(buf)', create this CFG:

  ret = setjmp(buf)
   |       \              bb-recv
   |        ----------------\
   |                ret = setjmp_receiver
   |                        /
  normal   /---------------/
  path    /
   |     /

None of these edges would be abnormal.  bb-recv would be the target for 
edges from all calls that might call longjmp(buf).  Those edges might need 
to be abnormal.  As the above CFG reflects all runtime effects precisely, 
but not which instructions are used to achieve them the expansion to RTL 
will be special.


Reply via email to