Thanks Indu, and sorry for the late reply; I've been swamped lately. (I hope this reply goes through - I am not on the GCC mailing list.)
Responding to your questions inline.

> Curious, when you say "our implementation", do you mean a coroutine library
> or clang implementation ?

Great question -- I was referring to our library specifically. The compiler & language sit at a lower level and do not store or manage the caller/callee information.

> Either way, IIUC, how the caller is stored in the Promise object is not
> ABI/language defined. So, stacktracing through these dynamically allocated
> Promise objects on the heap is not easily doable via a third party
> unwinding application routine oblivious of the running application itself.

Yes, exactly.

> On that note, just for the sake of a thought experiment: Is (Was) it
> possible (for stack tracing purposes) to assign some skeleton for the
> "linked list data structure" of these dynamically allocated Promise object
> on the heap ? Something like having a Promise object header on the heap
> which specifies the size of Promise obj, location of the next one on heap,
> and offset to the caller IP within the object.

It's definitely possible. I believe the only issue is it would introduce overhead -- at the very least, I imagine, 1 pointer of overhead per coroutine promise.

(Thinking out loud here and below...)

In the most general case, this overhead feels unavoidable to me (since, in principle, each promise type can be different), but in many useful/interesting cases the promise types are all the same, as many programs don't tend to mix arbitrary coroutine libraries, especially within a given call chain.

I can imagine the overhead would be acceptable noise for a lot of applications, at least for heap-allocated promises (which is the most common usage). However, I also imagine that resource-constrained applications or environments may wish to avoid such overhead if possible. (It feels rather analogous to frame pointers to me in principle; I'm not sure if the performance characteristics are similar.)

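To make the header idea a bit more concrete, here is a rough sketch of what such a skeleton and a tracer-side walk over it might look like. Everything in it is hypothetical: `PromiseHeader`, `ExamplePromise`, `make_promise` and `collect_caller_ips` are names made up for illustration, not anything our library or any compiler defines, and plain structs stand in for real coroutine promises. It also follows the three fields from your suggestion, so it costs more than the 1-pointer minimum mentioned above.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical header placed at the very start of every promise object.
struct PromiseHeader {
    std::size_t promise_size;      // size of the whole promise object, header included
    const PromiseHeader* next;     // promise of the calling (awaiting) coroutine, or nullptr
    std::size_t caller_ip_offset;  // byte offset from the header to the stored caller IP
};

// One example promise layout. A different library could order its fields
// differently; the header is what lets a tracer cope with that.
struct ExamplePromise {
    PromiseHeader header;  // must be the first member
    int library_state;     // stand-in for whatever else the library keeps here
    void* caller_ip;       // return address into the calling coroutine
};

// Builds a promise whose header describes its own layout.
inline ExamplePromise make_promise(void* caller_ip, const PromiseHeader* next) {
    ExamplePromise p{};
    p.header.promise_size = sizeof(ExamplePromise);
    p.header.next = next;
    p.header.caller_ip_offset = offsetof(ExamplePromise, caller_ip);
    p.caller_ip = caller_ip;
    return p;
}

// What a third-party tracer could do knowing only the header layout:
// follow `next` and read the caller IP at the advertised offset.
inline std::vector<void*> collect_caller_ips(const PromiseHeader* leaf) {
    std::vector<void*> ips;
    for (const PromiseHeader* h = leaf; h != nullptr; h = h->next) {
        const unsigned char* base = reinterpret_cast<const unsigned char*>(h);
        void* ip = nullptr;
        std::memcpy(&ip, base + h->caller_ip_offset, sizeof ip);
        ips.push_back(ip);
    }
    return ips;
}
```

The walker never needs to know `ExamplePromise` itself, which is the property a tracer oblivious of the application would need. The cost is the header in every promise.
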
Then again, at least as of current versions of the C++ standard, applications have little control over the size or contents of the memory block containing the promise. This is because the compiler ultimately decides how much memory is necessary for the local variables etc. in a given frame, and then asks the program to allocate enough memory for the frame & the promise object itself. The compiler then initializes the frame information in the memory block, and lets the program manage the portion corresponding to the promise object itself.

It is also an interesting question whether/when/how it might be possible to detect if the coroutine promises in a given call chain are homogeneous or not, and thus whether the 1-pointer overhead could be omitted. The only information readily available at that point is (a) the return address of the root of the chain (i.e. the caller of std::coroutine_handle<>::resume()), and (b) the leaf coroutine (whose address is on the stack). Whether the frames in between have similarly-structured promises or not therefore depends entirely on the particular program's constraints between those two frames. One could imagine providing a mechanism to annotate the root frame as "this entire chain has homogeneous promises", but no such thing exists yet, and I don't know what the optimal solution here would be.

> If the coroutine frames are marked as such in the stacktrace format (as
> Jens suggested on the binutils thread reply), the stacktacer knows where to
> stitch the heap frames without guessing or doing linear memory scans.

Yes, I believe so.

> On that note, so looks like there can there be a stacktrace of multiple
> interleaved subsets of "normal" and coroutine functions ? I.e.,
>
> normal_B ()
> coroutine_caller_B ()
> coroutine_init_B ()
> normal_server_start ()
> normal_server_init ()
> coroutine_server_caller ()
> std::coroutine_handle::resume()
> normal_event_handler ()
> main ()
>
> If yes, how is this handled in your approach below...

It's an excellent question, and yes, this is indeed very possible and legal. It's simply that we currently don't need (and therefore don't support) this, which has simplified the problem for our use case. Should that ever change, we would potentially need to extend our solution to handle that possibility.

I haven't fully thought it through to be sure of the proper solution at the moment, but at first glance, I imagine that it's quite similar: we would identify each of the coroutine-calling frames, and for each one, trace its chain in turn, inserting the corresponding frames at those locations.
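
For what it's worth, here is a minimal sketch of that "at first glance" idea, again purely hypothetical and not how our library works today. It assumes the tracer can already tell which native frames are the leaf of a coroutine chain and can already walk each chain; `AsyncFrame`, `NativeFrame` and `stitch` are made-up names, and frames are reduced to strings.

```cpp
#include <string>
#include <vector>

// A coroutine frame living on the heap (hypothetical representation).
struct AsyncFrame {
    std::string name;
    const AsyncFrame* awaiter;  // coroutine awaiting this one, or nullptr at the chain's root
};

// A frame found by ordinary stack unwinding, innermost first.
struct NativeFrame {
    std::string name;
    const AsyncFrame* chain;  // non-null if heap frames should be spliced in after this frame
};

// Walks the native stack once; wherever a frame carries a coroutine chain,
// the chain's frames are inserted right after it, before the walk continues.
inline std::vector<std::string> stitch(const std::vector<NativeFrame>& native) {
    std::vector<std::string> out;
    for (const NativeFrame& f : native) {
        out.push_back(f.name);
        for (const AsyncFrame* a = f.chain; a != nullptr; a = a->awaiter) {
            out.push_back(a->name);
        }
    }
    return out;
}
```

Each chain is handled independently at the point where it is found, so any number of interleaved "normal" and coroutine segments would go through the same loop. The hard part, which this sketch simply assumes away, is identifying those frames and their chains reliably.
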
