Thanks Indu, and sorry for the late reply; I've been swamped lately.
(I hope this reply goes through - I am not on the GCC mailing list.)

Responding to your questions inline.

> Curious, when you say "our implementation", do you mean a coroutine library
> or clang implementation ?


Great question -- I was referring to our library specifically. The compiler
& language sit at a lower level and do not store or manage the
caller/callee information.

> Either way, IIUC, how the caller is stored in the Promise object is not
> ABI/language defined.  So, stacktracing through these dynamically allocated
> Promise objects on the heap is not easily doable via a third party
> unwinding application routine oblivious of the running application itself.
>

Yes, exactly.

> On that note, just for the sake of a thought experiment: Is (Was) it
> possible (for stack tracing purposes) to assign some skeleton for the
> "linked list data structure" of these dynamically allocated Promise object
> on the heap ? Something like having a Promise object header on the heap
> which specifies the size of Promise obj, location of the next one on heap,
> and offset to the caller IP within the object.
>

It's definitely possible. I believe the only issue is it would introduce
overhead -- at the very least, I imagine, 1 pointer of overhead per
coroutine promise.
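
To make the idea concrete, here is a rough sketch of what such a header and a walker over it might look like. Everything in it is hypothetical: the names PromiseHeader, next, caller_ip_offset and walk_chain are made up for illustration, and nothing like this is defined by the language, the ABI, or our library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical header assumed to sit at the start of every promise object.
struct PromiseHeader {
    std::uint32_t size;              // total size of the promise object, in bytes
    std::uint32_t caller_ip_offset;  // byte offset of the stored caller IP within the object
    const PromiseHeader* next;       // header of the awaiting (caller) promise; nullptr at the root
};

// Walks from the leaf promise toward the root, collecting the caller IPs.
// Stops early if a header describes an IP slot outside its own object.
inline std::vector<std::uintptr_t> walk_chain(const PromiseHeader* leaf) {
    std::vector<std::uintptr_t> ips;
    for (const PromiseHeader* p = leaf; p != nullptr; p = p->next) {
        if (p->caller_ip_offset + sizeof(std::uintptr_t) > p->size) {
            break;  // malformed header; do not read out of bounds
        }
        std::uintptr_t ip;
        std::memcpy(&ip,
                    reinterpret_cast<const unsigned char*>(p) + p->caller_ip_offset,
                    sizeof ip);
        ips.push_back(ip);
    }
    return ips;
}
```

In this sketch the `next` field is the per-promise pointer I mentioned; the size and offset fields are extra on top of that.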

(Thinking out loud here and below...)

In the most general case, this overhead feels unavoidable to me (since, in
principle, each promise type can be different). In many useful/interesting
cases, though, the promise types are not all different, as programs don't
tend to mix arbitrary coroutine libraries, especially within a given call
chain.
I can imagine the overhead would be acceptable noise for a lot of
applications, at least for heap-allocated promises (which is the most
common usage).
However, I also imagine that resource-constrained applications or
environments may wish to avoid such overhead if possible.
(It feels rather analogous to frame pointers to me in principle; I'm not
sure if the performance characteristics are similar.)

Then again, at least as of current versions of the C++ standard,
applications have little control over the size or contents of the memory
block containing the promise. This is because the compiler ultimately
decides how much memory is necessary for the local variables etc. in a
given frame, and then asks the program to allocate enough memory for the
frame & the promise object itself. The compiler then initializes the frame
information in the memory block, and lets the program manage the portion
corresponding to the promise object itself.

It is also an interesting question whether/when/how it might be possible to
detect if the coroutine promises in a given call chain are homogeneous or
not, and thus whether the 1-pointer overhead could be omitted.
The only information readily available at that point is (a) the return
address of the root of the chain (i.e. the caller of
std::coroutine_handle<>::resume()), and (b) the leaf coroutine (whose
address is on the stack).
Whether the frames in between have similarly-structured promises or not
therefore depends entirely on the particular program's constraints between
those two frames.
One could imagine providing a mechanism to annotate the root frame as "this
entire chain has homogeneous promises", but no such thing exists yet, and I
don't know what the optimal solution here would be.

> If the coroutine frames are marked as such in the stacktrace format (as
> Jens suggested on the binutils thread reply), the stacktracer knows where to
> stitch the heap frames without guessing or doing linear memory scans.
>

Yes, I believe so.

> On that note, so it looks like there can be a stacktrace of multiple
> interleaved subsets of "normal" and coroutine functions ? I.e.,
>
> normal_B ()
> coroutine_caller_B ()
> coroutine_init_B ()
> normal_server_start ()
> normal_server_init ()
> coroutine_server_caller ()
> std::coroutine_handle::resume()
> normal_event_handler ()
> main ()



> If yes, how is this handled in your approach below...
>

It's an excellent question, and yes, this is indeed very possible and legal.
It's simply that we currently don't need (and therefore don't support)
this, which has simplified the problem for our use case.
Should that ever change, we would need to potentially extend our solution
to handle that possibility.
I haven't fully thought it through to be sure of the proper solution at the
moment, but at first glance, I imagine that it's quite similar: we would
identify each of the coroutine-calling frames and, for each one, trace its
chain in turn, inserting the corresponding frames at those locations.
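
As a very rough sketch of that stitching step (this is not something we have implemented; NativeFrame, coroutine_chain and stitch are hypothetical names, and how each chain is recovered from the heap is assumed to be handled elsewhere):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one frame of the ordinary (native) stack walk.
struct NativeFrame {
    std::string name;
    // Suspended coroutine frames recovered from the heap for the chain that
    // this frame resumed, innermost first. Empty if nothing was recovered.
    std::vector<std::string> coroutine_chain;
};

// Builds a single trace, innermost first, by inserting each recovered chain
// immediately before the native frame that resumed it.
inline std::vector<std::string> stitch(const std::vector<NativeFrame>& native) {
    std::vector<std::string> out;
    for (const NativeFrame& frame : native) {
        out.insert(out.end(),
                   frame.coroutine_chain.begin(),
                   frame.coroutine_chain.end());
        out.push_back(frame.name);
    }
    return out;
}
```

With the native frames listed innermost first, and coroutine_init_B attached to normal_server_start as its recovered chain, this would reproduce the ordering in your example trace above.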
