Hi Iain,

thanks for the info. I have some follow-up questions.

On Jul 12 2022, at 7:11 pm, Iain Sandoe <i...@sandoe.co.uk> wrote:

> Hi Michal,
>  
>> On 12 Jul 2022, at 16:14, Michal Jankovič
>> <michal.jankovi...@gmail.com> wrote:
>  
>> One other related thing I would like to investigate is reducing the
>> number of compiler generated variables in the frame, particularly
>> _Coro_destroy_fn and _Coro_self_handle.   
>>  
>> As I understand it, _Coro_destroy_fn just sets a flag in
>> _Coro_resume_index and calls _Coro_resume_fn; it should be possible to
>> move this logic to __builtin_coro_destroy, so that only _Coro_resume_fn
>> is stored in the frame;
>  
> That is a particular point about GCC’s implementation … (it is not
> neccesarily, or even
> likely to be the same for other implementations) - see below.
>  
> I was intending to do experiment with making the ramp/resume/destroy
> value a parameter
> to the actor function so that we would have something like -
>  
> ramp calls      actor(frame, 0)
> resume calls  actor(frame, 1)
> destroy calls  actor(frame, 2)  
> - the token values are illustrative, not intended to be a final version.
>  
> I think that should allow for more inlining opportunites and possibly
> a way forward to
> frame elision (a.k.a halo).
>  
>> this would however change the coroutine ABI - I don't know if that's
>> a problem.
>  
> The external ABI for the coroutine is the  
> resume,
> destroy pointers  
> and the promise  
> and that one can find each of these from the frame pointer.
>  
> This was agreed between the interested “vendors” so that one compiler
> could invoke
> coroutines built by another.  So I do not think this is so much a
> useful area to explore.
>  

I understand. I still want to try to implement a more light-weight frame
layout with just one function pointer; would it be possible to merge
such a change if it was made opt-in via a compiler flag, eg
`-fsmall-coroutine-frame`? My use-case for this is embedded environments
with very limited memory, and I do not care about interoperability with
other compilers there.  

> Also the intent is that an indirect call through the frame pointer is
> the most frequent
> operation so should be the most efficient.   
>  resume() might be called many times,  
>  destroy() just once thus it is a cold code path  
>  - space can be important too - but interoperability was the goal here.
>  
>> The _Coro_self_handle should be constructible on-demand from the
>> frame address.
>  
> Yes, and in the header the relevant items are all constexpr - so that
> should happen in the
> user’s code.  I elected to have that value in the frame to avoid
> recreating it each time - I
> suppose that is a trade-off of one oiptimisation c.f. another …  

If the handle construction cannot be optimized out, and its thus  
a tradeoff between frame size and number of instructions, then this
could also be enabled by a hypothetical `-fsmall-coroutine-frame`.

Coming back to this:

>>> (the other related optimisation is to eliminate frame entries for
>>> scopes without any suspend
>>> points - which has the potential to save even more space for code with
>>> sparse use of co_xxxx)

This would be nice; although it could encompassed by a more general  
optimization - eliminate frame entries for all variables which are not  
accessed (directly or via pointer / reference) beyond a suspend point.
To be fair, I do not know how to get started on such an optimization,
or if it is even possible to do on the frontend. This would however be
immensely useful for reducing the frame size taken-up by complicated
co_await expressions (among other things), for example, if I have a
composed operation:

co_await when_either(get_leaf_awaitable_1(), get_leaf_awaitable_2());

Right now, this creates space in the frame for the temporary 'leaf'  
awaitables, which were already moved into the composed awaitable.
If the awaitable has an operator co_await that returns the real awaiter,
the original awaitable is also stored in the frame, even if it  
is not referenced by the awaiter; another unused object gets stored if  
the .await_transform() customization point was used.

What are your thoughts on the feasibility / difficulty of implementing
such an optimization?

Michal

>>  
>> Do you have any advice / opinions on this before I try to implement it?
>  
> Hopefully, the notes above help.
>  
> I will rebase my latest code changes as soon as I have a chance and
> put them somewhere
> for you to look at - basically, these are to try and address the
> correctness issues we face,
>  
> Iain
>  
>  
>>  
>> Michal
>>  
>> On Jul 12 2022, at 4:08 pm, Iain Sandoe <i...@sandoe.co.uk> wrote:
>>  
>>> Hi Michal,
>>>  
>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>  
>>>> Currently, coroutine frames store all variables of a coroutine separately,
>>>> even if their lifetime does not overlap (they are in distinct
>>>> scopes). This
>>>> patch implements overlapping distinct variable scopes in the
>>>> coroutine frame,
>>>> by storing the frame fields in nested unions of structs. This lowers
>>>> the size
>>>> of the frame for larger coroutines significantly, and makes them
>>>> more usable
>>>> on systems with limited memory.
>>>  
>>> not a review (I will try to take a look at the weekend).
>>>  
>>> but … this is one of the two main optimisations on my TODO - so cool
>>> for doing it.
>>>  
>>> (the other related optimisation is to eliminate frame entries for
>>> scopes without any suspend
>>> points - which has the potential to save even more space for code with
>>> sparse use of co_xxxx)
>>>  
>>> Iain
>>>  
>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new test fails
>>>> before the patch and succeeds after with no regressions.
>>>>  
>>>>    PR c++/105989
>>>>  
>>>> gcc/cp/ChangeLog:
>>>>  
>>>>    * coroutines.cc (struct local_var_info): Add field_access_path.
>>>>    (build_local_var_frame_access_expr): New.
>>>>    (transform_local_var_uses): Use build_local_var_frame_access_expr.
>>>>    (coro_make_frame_entry_id): New.
>>>>    (coro_make_frame_entry): Delegate to coro_make_frame_entry_id.
>>>>    (struct local_vars_frame_data): Add orig, field_access_path.
>>>>    (register_local_var_uses): Generate new frame layout. Create access
>>>>    paths to vars.
>>>>    (morph_fn_to_coro): Set new fields in local_vars_frame_data.   
>>>>  
>>>> gcc/testsuite/ChangeLog:
>>>>  
>>>>    * g++.dg/coroutines/pr105989.C: New test.
>>>>  
>>>> <pr105989.patch>
>>>  
>>>  
>  
>

Reply via email to