avl marked an inline comment as done. avl added inline comments.
================ Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:825 + // The local stack holds all alloca instructions and all byval arguments. + AllocaDerivedValueTracker Tracker; + for (Argument &Arg : F.args()) { ---------------- efriedma wrote: > avl wrote: > > efriedma wrote: > > > avl wrote: > > > > efriedma wrote: > > > > > avl wrote: > > > > > > efriedma wrote: > > > > > > > avl wrote: > > > > > > > > efriedma wrote: > > > > > > > > > avl wrote: > > > > > > > > > > efriedma wrote: > > > > > > > > > > > avl wrote: > > > > > > > > > > > > efriedma wrote: > > > > > > > > > > > > > Do you have to redo the AllocaDerivedValueTracker > > > > > > > > > > > > > analysis? Is it not enough that the call you're > > > > > > > > > > > > > trying to TRE is marked "tail"? > > > > > > > > > > > > >Do you have to redo the AllocaDerivedValueTracker > > > > > > > > > > > > >analysis? > > > > > > > > > > > > > > > > > > > > > > > > AllocaDerivedValueTracker analysis(done in markTails) > > > > > > > > > > > > could be reused here. > > > > > > > > > > > > But marking, done in markTails(), looks like separate > > > > > > > > > > > > tasks. i.e. it is better > > > > > > > > > > > > to make TRE not depending on markTails(). There is a > > > > > > > > > > > > review for this - https://reviews.llvm.org/D60031 > > > > > > > > > > > > Thus such separation looks useful(To not reuse result > > > > > > > > > > > > of markTails but have it computed inplace). > > > > > > > > > > > > > > > > > > > > > > > > > Is it not enough that the call you're trying to TRE > > > > > > > > > > > > > is marked "tail"? > > > > > > > > > > > > > > > > > > > > > > > > It is not enough that call which is subject to TRE is > > > > > > > > > > > > marked "Tail". > > > > > > > > > > > > It also should be checked that other calls does not > > > > > > > > > > > > capture pointer to local stack: > > > > > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > // do not do TRE if any pointer to local stack has > > > > > > > > > > > > escaped. > > > > > > > > > > > > if (!Tracker.EscapePoints.empty()) > > > > > > > > > > > > return false; > > > > > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > It is not enough that call which is subject to TRE is > > > > > > > > > > > > marked "Tail". It also should be checked that other > > > > > > > > > > > > calls does not capture pointer to local stack: > > > > > > > > > > > > > > > > > > > > > > If there's an escaped pointer to the local stack, we > > > > > > > > > > > wouldn't infer "tail" in the first place, would we? > > > > > > > > > > If function receives pointer to alloca then it would not be > > > > > > > > > > marked with "Tail". Then we do not have a possibility to > > > > > > > > > > understand whether this function receives pointer to alloca > > > > > > > > > > but does not capture it: > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > > > void test(int recurseCount) > > > > > > > > > > { > > > > > > > > > > if (recurseCount == 0) return; > > > > > > > > > > int temp = 10; > > > > > > > > > > globalIncrement(&temp); > > > > > > > > > > test(recurseCount - 1); > > > > > > > > > > } > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > test - marked with Tail. > > > > > > > > > > globalIncrement - not marked with Tail. But TRE could be > > > > > > > > > > done since it does not capture pointer. But if it will > > > > > > > > > > capture the pointer then we could not do TRE. So we need to > > > > > > > > > > check !Tracker.EscapePoints.empty(). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > test - marked with Tail. > > > > > > > > > > > > > > > > > > For the given code, TRE won't mark the recursive call "tail". > > > > > > > > > That transform isn't legal: the recursive call could access > > > > > > > > > the caller's version of "temp". > > > > > > > > >For the given code, TRE won't mark the recursive call "tail". > > > > > > > > >That transform isn't legal: the recursive call could access > > > > > > > > >the caller's version of "temp". > > > > > > > > > > > > > > > > it looks like recursive call could NOT access the caller's > > > > > > > > version of "temp": > > > > > > > > > > > > > > > > ``` > > > > > > > > test(recurseCount - 1); > > > > > > > > ``` > > > > > > > > > > > > > > > > Caller`s version of temp is accessed by non-recursive call: > > > > > > > > > > > > > > > > ``` > > > > > > > > globalIncrement(&temp); > > > > > > > > ``` > > > > > > > > > > > > > > > > If globalIncrement does not capture the "&temp" then TRE looks > > > > > > > > to be legal for that case. > > > > > > > > > > > > > > > > globalIncrement() would not be marked with "Tail". test() would > > > > > > > > be marked with Tail. > > > > > > > > > > > > > > > > Thus the pre-requisite for TRE would be: tail-recursive call > > > > > > > > must not receive pointer to local stack(Tail) and non-recursive > > > > > > > > calls must not capture the pointer to local stack. > > > > > > > Can you give a complete IR example where we infer "tail", but TRE > > > > > > > is illegal? > > > > > > > > > > > > > > Can you give a complete IR example, we we don't infer "tail", but > > > > > > > we still do the TRE transform here? > > > > > > >Can you give a complete IR example where we infer "tail", but TRE > > > > > > >is illegal? > > > > > > > > > > > > there is no such example. Currently all cases where we infer > > > > > > "tail" would be valid for TRE. > > > > > > > > > > > > >Can you give a complete IR example, we we don't infer "tail", but > > > > > > >we still do the TRE transform here? > > > > > > > > > > > > For the following example current code base would not infer "tail" > > > > > > for _Z15globalIncrementPKi and as the result would not do TRE for > > > > > > _Z4testi. > > > > > > This patch changes this behavior: so that if _Z15globalIncrementPKi > > > > > > is not marked with "tail" and does not capture its pointer argument > > > > > > - TRE would be allowed for _Z4testi. > > > > > > > > > > > > > > > > > > ``` > > > > > > @count = dso_local local_unnamed_addr global i32 0, align 4 > > > > > > > > > > > > ; Function Attrs: nofree noinline norecurse nounwind uwtable > > > > > > define dso_local void @_Z15globalIncrementPKi(i32* nocapture > > > > > > readonly %param) local_unnamed_addr #0 { > > > > > > entry: > > > > > > %0 = load i32, i32* %param, align 4 > > > > > > %1 = load i32, i32* @count, align 4 > > > > > > %add = add nsw i32 %1, %0 > > > > > > store i32 %add, i32* @count, align 4 > > > > > > ret void > > > > > > } > > > > > > > > > > > > ; Function Attrs: nounwind uwtable > > > > > > define dso_local void @_Z4testi(i32 %recurseCount) > > > > > > local_unnamed_addr #1 { > > > > > > entry: > > > > > > %temp = alloca i32, align 4 > > > > > > %cmp = icmp eq i32 %recurseCount, 0 > > > > > > br i1 %cmp, label %return, label %if.end > > > > > > > > > > > > if.end: ; preds = %entry > > > > > > %0 = bitcast i32* %temp to i8* > > > > > > call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #6 > > > > > > store i32 10, i32* %temp, align 4 > > > > > > call void @_Z15globalIncrementPKi(i32* nonnull %temp) > > > > > > %sub = add nsw i32 %recurseCount, -1 > > > > > > call void @_Z4testi(i32 %sub) > > > > > > call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #6 > > > > > > br label %return > > > > > > > > > > > > return: ; preds = %entry, > > > > > > %if.end > > > > > > ret void > > > > > > } > > > > > > > > > > > > ; Function Attrs: argmemonly nounwind willreturn > > > > > > declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2 > > > > > > > > > > > > ; Function Attrs: argmemonly nounwind willreturn > > > > > > declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2 > > > > > > > > > > > > attributes #0 = { nofree noinline norecurse nounwind uwtable } > > > > > > attributes #1 = { nounwind uwtable } > > > > > > attributes #2 = { argmemonly nounwind willreturn } > > > > > > > > > > > > ``` > > > > > In your example, we don't infer "tail" for globalIncrement... but we > > > > > do infer it for the recursive call to test(). I'm suggesting you > > > > > could just check for "tail" on test(), instead of using > > > > > AllocaDerivedValueTracker. > > > > Checking only test() is not enough. There additionally should be > > > > checked globalIncrement(). > > > > > > > > It is correct that while checking test() there could be checked "Tail" > > > > flag. Which does not require using AllocaDerivedValueTracker. > > > > > > > > But while checking globalIncrement() there should be checked whether > > > > some alloca value escaped or not. "Tail" flag could not be used for > > > > that. AllocaDerivedValueTracker allow to do such check: > > > > > > > > Tracker.EscapePoints.empty() > > > > > > > > If we would not do check for globalIncrement then it is not valid to do > > > > TRE. Thus it seems we need to check globalIncrement for escaping > > > > pointer and we need to use AllocaDerivedValueTracker for that. > > > > Checking only test() is not enough. There additionally should be > > > > checked globalIncrement(). > > > > > > >> Can you give a complete IR example where we infer "tail", but TRE is > > > >> illegal? > > > > there is no such example. Currently all cases where we infer "tail" > > > > would be valid for TRE. > > > > > > I'm not sure how to reconcile this. Are you saying we could infer "tail" > > > in some case where TRE is illegal, but don't right now? Or are you > > > saying that you plan to extend TRE to handle cases where we can't infer > > > "tail" on the recursive call? > > >Or are you saying that you plan to extend TRE to handle cases where we > > >can't infer "tail" on the recursive call? > > > > I think not exactly. more precise would probably be : "I am saying that > > plan to extend TRE to handle cases where we can't infer "tail" on the > > NON-recursive NON-last call" > > > > globalIncrement() is non-recursive non-last call in above example. But we > > need to check whether it captures argument pointer or not to decide whether > > it is OK to do TRE. > > > > To make things clear - I am suggesting instead of current pre-requisite for > > TRE : > > > > "All call sites are marked with Tail" > > > > to make following: > > > > "Recursive last calls are marked with "Tail", non-recursive non-last calls > > are proved to not capture alloca". > > > > For the above example it means : the requirement for test() should stay the > > same(should be marked with Tail). The requirement for globalIncrement() > > should be "does not capture alloca". > > > "Recursive last calls are marked with 'tail'" implies "non-recursive non-last > calls are proved to not capture alloca". I see, thank you for explanations. There is a test which makes me think that above rule is not always correct: Transforms/TailCallElim/basic.ll:@test1 ``` ; PR615. Make sure that we do not move the alloca so that it interferes with the tail call. define i32 @test1() { ; CHECK: i32 @test1() ; CHECK-NEXT: alloca %A = alloca i32 ; <i32*> [#uses=2] store i32 5, i32* %A call void @use(i32* %A) ; CHECK: tail call i32 @test1 %X = tail call i32 @test1() ; <i32> [#uses=1] ret i32 %X } ``` I removed usages of AllocaDerivedValueTracker and corrected the test1 from Transforms/TailCallElim/basic.ll. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82085/new/ https://reviews.llvm.org/D82085 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits