ychen added a comment. In D97915#2860984 <https://reviews.llvm.org/D97915#2860984>, @ChuanqiXu wrote:
> In D97915#2860916 <https://reviews.llvm.org/D97915#2860916>, @ychen wrote: > >> In D97915#2859237 <https://reviews.llvm.org/D97915#2859237>, @ChuanqiXu >> wrote: >> >>> In D97915#2848816 <https://reviews.llvm.org/D97915#2848816>, @ychen wrote: >>> >>>>> Thanks for clarifying. Let's solve the semantics problem first. >>>>> With the introduction about 'raw frame', I think it's necessary to >>>>> introduce this concept in the section 'Switched-Resume Lowering' or even >>>>> the section 'Introduction' in the document. Add a section to tell the >>>>> terminology is satisfied too. >>>> >>>> Done. >>>> >>>>> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and >>>>> 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same >>>>> value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying >>>>> to solve the problem about memory leak. But I think we could use >>>>> llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame >>>>> (Maybe we need to add an intrinsic `llvm.coro.raw.size`). Then we can >>>>> omit a field in the frame to save space. >>>> >>>> ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame >>>> address instead of raw frame pointer) >>>> >>>> Apologies for the confusion. I've briefly explained it here >>>> https://reviews.llvm.org/D102145#2752445 I think it is not clear. >>>> "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine >>>> frame field storing the `raw frame pointer`" only after `insertSpills` in >>>> CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an >>>> alloca storing the `raw frame pointer` (try grepping "alloc.frame.ptr" in >>>> this review page). Using "llvm.coro.raw.frame.ptr.offset" instead of >>>> "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please >>>> check line 31. The downside is that the write to coroutine frame is not >>>> through an alloca but a direct write. It is unusual because all fields in >>>> the frame are stored as 1. special/header fields 2. alloca 3. splills. >>>> Doing the write indirectly as Alloca makes me comfortable. The tradeoff is >>>> one extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do you think? >>>> >>>> 19 coro.alloc.align: ; preds = >>>> %coro.alloc.check.align >>>> 20 %3 = sub nsw i64 64, 16 >>>> 21 %4 = add i64 128, %3 >>>> 22 %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13 >>>> 23 %mask = sub i64 64, 1 >>>> 24 %intptr = ptrtoint i8* %call1 to i64 >>>> 25 %over_boundary = add i64 %intptr, %mask >>>> 26 %inverted_mask = xor i64 %mask, -1 >>>> 27 %aligned_intptr = and i64 %over_boundary, %inverted_mask >>>> 28 %diff = sub i64 %aligned_intptr, %intptr >>>> 29 %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff >>>> 30 call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 >>>> 64) ] >>>> 31 store i8* %call1, i8** %alloc.frame.ptr, align 8 >>>> >>>> >>>> ; Replace line 31 with below, and must makes sure line 46~line 48 >>>> is skipped. >>>> ; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32() >>>> ; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff >>>> ; %addr1 = bitcast i8* %addr to i8** >>>> ; store i8* %call1, i8** %addr1, align 8 >>>> >>>> >>>> 32 br label %coro.init.from.coro.alloc.align >>>> 33 >>>> 34 coro.init.from.coro.alloc.align: ; preds = >>>> %coro.alloc.align >>>> 35 %aligned_result.coro.init = phi i8* [ %aligned_result, >>>> %coro.alloc.align ] >>>> 36 br label %coro.init >>>> 37 >>>> 38 coro.init: ; preds = >>>> %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor >>>> o.init.from.coro.alloc >>>> 39 %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ >>>> %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result >>>> .coro.init, %coro.init.from.coro.alloc.align ] >>>> 40 %FramePtr = bitcast i8* %5 to %f0.Frame* >>>> 41 %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>>> %FramePtr, i32 0, i32 0 >>>> 42 store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** >>>> %resume.addr, align 8 >>>> 43 %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void >>>> (%f0.Frame*)* @f0.cleanup >>>> 44 %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>>> %FramePtr, i32 0, i32 1 >>>> 45 store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, >>>> align 8 >>>> 46 %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, >>>> i32 2 >>>> 47 %8 = load i8*, i8** %alloc.frame.ptr, align 8 >>>> 48 store i8* %8, i8** %7, align 8 >>>> 49 br label %AllocaSpillBB >>>> 50 >>>> 51 AllocaSpillBB: ; preds = %coro.init >>>> 52 %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>>> %FramePtr, i32 0, i32 4 >>>> 53 %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>>> %FramePtr, i32 0, i32 5 >>>> 54 %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>>> %FramePtr, i32 0, i32 6 >>>> 55 %ref.tmp5.reload.addr = getelementptr inbounds %f0.Frame, >>>> %f0.Frame* %FramePtr, i32 0, i32 7 >>>> 56 %agg.tmp8.reload.addr = getelementptr inbounds %f0.Frame, >>>> %f0.Frame* %FramePtr, i32 0, i32 8 >>>> 57 %__promise.reload.addr = getelementptr inbounds %f0.Frame, >>>> %f0.Frame* %FramePtr, i32 0, i32 10 >>>> 58 br label %PostSpill >>>> >>>> >>>> >>>> >>>>> Then I am a little confused for the design again, since we would treat >>>>> the value for CoroBegin as the address of coroutine frame in the past and >>>>> it looks like to be the raw frame now. Let me reconsider if it is OK. >>>> >>>> The returned value of CoroBegin is still coroutine frame not a raw frame >>>> even if the frame is overaligned. You could check the above code. >>> >>> Thanks for clarifying! >>> >>> I don't understand why we need to store the address for coroutine raw frame >>> in the coroutine frame. For example, `%call1` in your example marks the >>> address for the raw frame. Then can we use the value `%call1` in every >>> place where we want to use the address for coroutine frame? >>> If yes, I think we could emit an intrinsic called 'llvm.coro.raw.frame' in >>> the frontend if we need to use the address for the raw frame. Then in the >>> middle end, we could replace `llvm.coro.raw.frame` with `%call1` simply. >>> Similarly, we could define intrinsic `llvm.coro.raw.frame.size`. As far as >>> I know from the codes, the address for the coroutine frame is mainly used >>> for deallocation. So it should be fine I guess. >>> >>> --- >>> >>> Then the code generated now looks roughly like: >>> >>> if (should over align) { >>> /// ... >>> mem = ... >>> } else { >>> /// ... >>> mem = ... >>> } >>> coro.begin(id, mem); >>> >>> It looks redundant since the `then` part and `else` part looks very >>> similar. I understand it would be eliminated in the middle end. But another >>> problem is that the redundant implementation in clang. Maybe we could solve >>> it by refactoring. >>> But I am wondering if it is possible to use another pattern (assume >>> `llvm.coro.alloc` returns true): >>> >>> %raw.frame.ptr = new(call @llvm.coro.raw.frame.size()) >>> %true.frame.ptr = call @llvm.coro.frame(%raw.frame.ptr, NEW_ALIGN) ; we >>> need a better name >>> call @llvm.coro.begin(coro.id, %true.frame.ptr) >>> >>> Then for `llvm.coro.frame`, we could return `@raw.frame.ptr ` simply if the >>> alignment could be satisfied (alignment needed is less than NEW_ALIGN). Or >>> we could do simply to align up for the coroutine frame. There are many APIs >>> in Align.h. >>> And for the destruction, we could emit: >>> >>> call @delete(%raw.frame.ptr, call @llvm.coro.raw.frame.size()) >>> >>> In this way, I guess we would get simpler implementation and generated >>> codes. >>> >>> BTW, if we choose to do so, the semantics for llvm.coro.raw.frame.ptr and >>> llvm.coro.size would change slightly. They would stands for the address and >>> size for the coroutine frame if we don't need over alignment. >>> >>> How do you think about this? >> >> I was confused by this and @rjmccall explained it here >> https://reviews.llvm.org/D97915/new/#2604871. Basically, we could not >> recover "raw frame pointer" (`%call1`) from coroutine frame pointer >> statically at deallocation time. > > Oh, I understand why we need to store the address for raw frame now. Another > question is that how do you think combine the pattern: > > if (should over align) { > /// ... > mem = ... > } else { > /// ... > mem = ... > } > coro.begin(id, mem); > > into this one: > > %true.frame.ptr = call @llvm.coro.create.frame(new(call > @llvm.coro.raw.frame.size()), NEW_ALIGN) ; we need a better name > > > ; It would be lowered to store the address of the raw frame to > the alloca in the middle end if needed > call @llvm.coro.begin(coro.id, %true.frame.ptr) > > and this one: > > call @delete(call @llvm.coro.raw.frame.ptr(), call > @llvm.coro.raw.frame.size()) ; Then > use `llvm.coro.raw.frame.ptr()` and `llvm.coro.raw.frame.size()` directly > whenever we want. > > It looks like we could generate the same code in the front for normal and > over aligned coroutines. Yeah, I think it works for this patch alone. It shifts the semantic lowering from Clang to LLVM but does not perform less work. For future language support like D102147 <https://reviews.llvm.org/D102147>, @llvm.coro.create.frame needs to be repurposed based on the new semantics and that seems a sign that it should be implemented in frontend. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D97915/new/ https://reviews.llvm.org/D97915 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits