https://github.com/snarkmaster updated https://github.com/llvm/llvm-project/pull/152623
>From 9fc3169ea5f1aea2a88b2616b7c9c4f2949139be Mon Sep 17 00:00:00 2001
From: Alexey <snarkmas...@gmail.com>
Date: Thu, 7 Aug 2025 12:10:07 -0700
Subject: [PATCH 1/3] Elide suspension points via [[clang::coro_await_suspend_destroy]]

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines
failed to optimize well. Interact with the demo program here:
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]],
the assembly for `simple_coro` would be drastically shorter, and would not
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

- This has `lit` tests, but what gives me real confidence in its correctness
  is the integration test in `coro_await_suspend_destroy_test.cpp`. This
  caught all the interesting bugs that I had in earlier revs, and covers
  equivalence to the standard code path in far more scenarios.

- I considered a variety of other designs. Here are some key design points:

  * I considered optimizing unmodified `await_suspend()` methods, as long as
    they unconditionally end with an `h.destroy()` call on the current
    handle, or an exception. However, this would (a) force dynamic dispatch
    for `destroy` -- bloating IR & reducing optimization opportunities,
    (b) require far more complex, delicate, and fragile analysis, (c) retain
    more of the frame setup, so that e.g. `h.done()` works properly. The
    current solution shortcuts all these concerns.

  * I want to pass `Promise&`, rather than `std::coroutine_handle`, to
    `await_suspend_destroy` -- this is safer, simpler, and more efficient.
    Short-circuiting coroutines should not touch the handle. This decision
    forces the attribute to go on the class. Resolving a method attribute
    would have required looking up overloads for both types, and choosing
    one, which is costly and a bad UX to boot.

  * `AttrDocs.td` tells portable code to provide a stub `await_suspend()`.
    This portability / compatibility solution avoids dire issues that would
    arise if users relied on `__has_cpp_attribute` and the declaration and
    definition happened to use different toolchains. In particular, it will
    even be safe for a future compiler release to killswitch this attribute
    by removing its implementation and setting its version to 0.

    ```
    let Spellings = [Clang<"coro_destroy_after_suspend", /*allowInC*/ 0, /*Version*/ 0>];
    ```

- In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as a
  further optimization opportunity. But I'm sure there are higher-leverage
  ways of making these non-suspending coros compile better; I just don't know
  the coro optimization pipeline well enough to flag them.

- IIUC the only interaction of this with `coro_only_destroy_when_complete`
  would be that the compiler expends fewer cycles.

- I ran some benchmarks on
  [folly::result](https://github.com/facebook/folly/blob/main/folly/result/docs/result.md).
  Heap allocs are definitely elided, the compiled code looks like a function,
  not a coroutine, but there's still an optimization gap. On the plus side,
  this results in a 4x speedup (!) in optimized ASAN builds (numbers not shown
  for brevity).

```
// Simple result coroutine that adds 1 to the input
result<int> result_coro(result<int>&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result<int> catching_result_func(result<int>&& r) {
  return result_catch_all([&]() -> result<int> {
    if (r.has_value()) {
      return r.value_or_throw() + 1;
    }
    return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result<int> non_catching_result_func(result<int>&& r) {
  if (r.has_value()) {
    return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}

============================================================================
[...]lly/result/test/result_coro_bench.cpp     relative  time/iter   iters/s
============================================================================
result_coro_success                                        13.61ns    73.49M
non_catching_result_func_success                            3.39ns   295.00M
catching_result_func_success                                4.41ns   226.88M
result_coro_error                                          19.55ns    51.16M
non_catching_result_func_error                              9.15ns   109.26M
catching_result_func_error                                 10.19ns    98.10M
============================================================================
[...]lly/result/test/result_coro_bench.cpp     relative  time/iter   iters/s
============================================================================
result_coro_success                                        10.59ns    94.39M
non_catching_result_func_success                            3.39ns   295.00M
catching_result_func_success                                4.07ns   245.81M
result_coro_error                                          13.66ns    73.18M
non_catching_result_func_error                              9.00ns   111.11M
catching_result_func_error                                 10.04ns    99.63M
```

Demo program from the Compiler Explorer link above:

```cpp
#include <coroutine>
#include <optional>

// Read this LATER -- this implementation detail isn't required to understand
// the value of [[clang::coro_await_suspend_destroy]].
//
// `optional_wrapper` exists since `get_return_object()` can't return
// `std::optional` directly.
// C++ coroutines have a fundamental timing mismatch
// between when the return object is created and when the value is available:
//
// 1) Early (coroutine startup): `get_return_object()` is called and must return
//    something immediately.
// 2) Later (when `co_return` executes): `return_value(T)` is called with the
//    actual value.
// 3) Issue: If `get_return_object()` returns the storage, it's empty when
//    returned, and writing to it later cannot affect the already-returned copy.
template <typename T>
struct optional_wrapper {
  std::optional<T> storage_;
  std::optional<T>*& pointer_;
  optional_wrapper(std::optional<T>*& p) : pointer_(p) {
    pointer_ = &storage_;
  }
  operator std::optional<T>() { return std::move(storage_); }
  ~optional_wrapper() {}
};

// Make `std::optional` a coroutine
template <typename T, typename... Args>
struct std::coroutine_traits<std::optional<T>, Args...> {
  struct promise_type {
    std::optional<T>* storagePtr_ = nullptr;
    promise_type() = default;
    ::optional_wrapper<T> get_return_object() {
      return ::optional_wrapper<T>(storagePtr_);
    }
    std::suspend_never initial_suspend() const noexcept { return {}; }
    std::suspend_never final_suspend() const noexcept { return {}; }
    void return_value(T&& value) { *storagePtr_ = std::move(value); }
    void unhandled_exception() {
      // Leave storage_ empty to represent error
    }
  };
};

template <typename T>
struct [[clang::coro_await_suspend_destroy]] optional_awaitable {
  std::optional<T> opt_;
  bool await_ready() const noexcept { return opt_.has_value(); }
  T await_resume() { return std::move(opt_).value(); }
  // Adding `noexcept` here makes the early IR much smaller, but the
  // optimizer is able to discard the cruft for simpler cases.
  void await_suspend_destroy(auto& promise) noexcept {
    // Assume the return object defaults to "empty"
  }
  void await_suspend(auto handle) {
    await_suspend_destroy(handle.promise());
    handle.destroy();
  }
};

template <typename T>
optional_awaitable<T> operator co_await(std::optional<T> opt) {
  return {std::move(opt)};
}

// Non-coroutine baseline -- matches the logic of `simple_coro`.
std::optional<int> simple_func(const std::optional<int>& r) {
  try {
    if (r.has_value()) {
      return r.value() + 1;
    }
  } catch (...) {}
  return std::nullopt; // return empty on empty input or error
}

// Without `coro_await_suspend_destroy`, allocates its frame on-heap.
std::optional<int> simple_coro(const std::optional<int>& r) {
  co_return co_await std::move(r) + 4;
}

// Without `co_await`, this optimizes much like `simple_func`.
// Bugs:
//  - Doesn't short-circuit when `r` is empty, but throws
//  - Lacks an exception boundary
std::optional<int> wrong_simple_coro(const std::optional<int>& r) {
  co_return r.value() + 2;
}

int main() {
  return simple_func(std::optional<int>{32}).value() +
      simple_coro(std::optional<int>{8}).value() +
      wrong_simple_coro(std::optional<int>{16}).value();
}
```

Test Plan:

For the all-important E2E test, I used this terrible cargo-culted script to run
the new end-to-end test with the new compiler. (Yes, I realize I should only
need 10% of those `-D` settings for a successful build.)

To make sure the test covered what I meant it to do:
- I also added an `#error` in the "no attribute" branch to make sure the
  compiler indeed supports the attribute.
- I ran it with a compiler not supporting the attribute, and that also passed.
- I also tried `return 1;` from `main()` and saw the logs of the 7 successful
  tests running.

```sh
#!/bin/bash -uex
set -o pipefail

LLVMBASE=/path/to/source/of/llvm-project
SYSCLANG=/path/to/original/bin/clang

# NB Can add `--debug-output` to debug cmake...
# Bootstrap clang -- Use `RelWithDebInfo` or the next phase is too slow!
mkdir -p bootstrap
cd bootstrap
cmake "$LLVMBASE/llvm" \
  -G Ninja \
  -DBUILD_SHARED_LIBS=true \
  -DCMAKE_ASM_COMPILER="$SYSCLANG" \
  -DCMAKE_ASM_COMPILER_ID=Clang \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_CXX_COMPILER="$SYSCLANG"++ \
  -DCMAKE_C_COMPILER="$SYSCLANG" \
  -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-redhat-linux-gnu \
  -DLLVM_HOST_TRIPLE=x86_64-redhat-linux-gnu \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_ENABLE_BINDINGS=OFF \
  -DLLVM_ENABLE_LLD=ON \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DLLVM_OPTIMIZED_TABLEGEN=true \
  -DLLVM_FORCE_ENABLE_STATS=ON \
  -DLLVM_ENABLE_DUMP=ON \
  -DCLANG_DEFAULT_PIE_ON_LINUX=OFF
ninja clang lld
ninja check-clang-codegencoroutines # Includes the new IR regression tests
cd ..

NEWCLANG="$PWD"/bootstrap/bin/clang
NEWLLD="$PWD"/bootstrap/bin/lld

# LIBCXX_INCLUDE_BENCHMARKS=OFF because google-benchmark bugs out
cmake "$LLVMBASE/runtimes" \
  -G Ninja \
  -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-redhat-linux-gnu \
  -DLLVM_HOST_TRIPLE=x86_64-redhat-linux-gnu \
  -DBUILD_SHARED_LIBS=true \
  -DCMAKE_ASM_COMPILER="$NEWCLANG" \
  -DCMAKE_ASM_COMPILER_ID=Clang \
  -DCMAKE_C_COMPILER="$NEWCLANG" \
  -DCMAKE_CXX_COMPILER="$NEWCLANG"++ \
  -DLLVM_FORCE_ENABLE_STATS=ON \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_ENABLE_LLD=ON \
  -DLIBCXX_INCLUDE_TESTS=ON \
  -DLIBCXX_INCLUDE_BENCHMARKS=OFF \
  -DLLVM_INCLUDE_TESTS=ON \
  -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
ninja cxx-test-depends

LIBCXXBUILD=$PWD
cd "$LLVMBASE"
libcxx/utils/libcxx-lit "$LIBCXXBUILD" -v \
  libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
```
---
 clang/docs/ReleaseNotes.rst                   |   6 +
 clang/include/clang/Basic/Attr.td             |   8 +
 clang/include/clang/Basic/AttrDocs.td         |  87 ++++
 .../clang/Basic/DiagnosticSemaKinds.td        |   3 +
 clang/lib/CodeGen/CGCoroutine.cpp             | 232 +++++++---
 clang/lib/Sema/SemaCoroutine.cpp              | 102 ++++-
 .../coro-await-suspend-destroy-errors.cpp     |  55 +++
 .../coro-await-suspend-destroy.cpp            |
129 ++++++ ...a-attribute-supported-attributes-list.test | 1 + .../coro_await_suspend_destroy.pass.cpp | 409 ++++++++++++++++++ 10 files changed, 942 insertions(+), 90 deletions(-) create mode 100644 clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp create mode 100644 clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp create mode 100644 libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 0e9fcaa5fac6a..41c412730b033 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -136,6 +136,12 @@ Removed Compiler Flags Attribute Changes in Clang -------------------------- +- Introduced a new attribute ``[[clang::coro_await_suspend_destroy]]``. When + applied to a coroutine awaiter class, it causes suspensions into this awaiter + to use a new `await_suspend_destroy(Promise&)` method instead of the standard + `await_suspend(std::coroutine_handle<...>)`. The coroutine is then destroyed. + This improves code speed & size for "short-circuiting" coroutines. + Improvements to Clang's diagnostics ----------------------------------- - Added a separate diagnostic group ``-Wfunction-effect-redeclarations``, for the more pedantic diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td index 30efb9f39e4f4..341848be00e7d 100644 --- a/clang/include/clang/Basic/Attr.td +++ b/clang/include/clang/Basic/Attr.td @@ -1352,6 +1352,14 @@ def CoroAwaitElidableArgument : InheritableAttr { let SimpleHandler = 1; } +def CoroAwaitSuspendDestroy: InheritableAttr { + let Spellings = [Clang<"coro_await_suspend_destroy">]; + let Subjects = SubjectList<[CXXRecord]>; + let LangOpts = [CPlusPlus]; + let Documentation = [CoroAwaitSuspendDestroyDoc]; + let SimpleHandler = 1; +} + // OSObject-based attributes. 
def OSConsumed : InheritableParamAttr { let Spellings = [Clang<"os_consumed">]; diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td index 2b095ab975202..d2224d86b3900 100644 --- a/clang/include/clang/Basic/AttrDocs.td +++ b/clang/include/clang/Basic/AttrDocs.td @@ -9270,6 +9270,93 @@ Example: }]; } +def CoroAwaitSuspendDestroyDoc : Documentation { + let Category = DocCatDecl; + let Content = [{ + +The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++ +coroutine awaiter type. When this attribute is present, the awaiter must +implement ``void await_suspend_destroy(Promise&)``. If ``await_ready()`` +returns ``false`` at a suspension point, ``await_suspend_destroy`` will be +called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)`` +method. The coroutine being suspended will then be immediately destroyed. + +Logically, the new behavior is equivalent to this standard code: + +.. code-block:: c++ + + void await_suspend_destroy(YourPromise&) { ... } + void await_suspend(auto handle) { + await_suspend_destroy(handle.promise()); + handle.destroy(); + } + +This enables `await_suspend_destroy()` usage in portable awaiters — just add a +stub ``await_suspend()`` as above. Without ``coro_await_suspend_destroy`` +support, the awaiter will behave nearly identically, with the only difference +being heap allocation instead of stack allocation for the coroutine frame. + +This attribute exists to optimize short-circuiting coroutines—coroutines whose +suspend points are either (i) trivial (like ``std::suspend_never``), or (ii) +short-circuiting (like a ``co_await`` that can be expressed in regular control +flow as): + +.. 
code-block:: c++ + + T val; + if (awaiter.await_ready()) { + val = awaiter.await_resume(); + } else { + awaiter.await_suspend(); + return /* value representing the "execution short-circuited" outcome */; + } + +The benefits of this attribute are: + - **Avoid heap allocations for coro frames**: Allocating short-circuiting + coros on the stack makes code more predictable under memory pressure. + Without this attribute, LLVM cannot elide heap allocation even when all + awaiters are short-circuiting. + - **Performance**: Significantly faster execution and smaller code size. + - **Build time**: Faster compilation due to less IR being generated. + +Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes +further improve optimization. + +Here is a toy example of a portable short-circuiting awaiter: + +.. code-block:: c++ + + template <typename T> + struct [[clang::coro_await_suspend_destroy]] optional_awaitable { + std::optional<T> opt_; + bool await_ready() const noexcept { return opt_.has_value(); } + T await_resume() { return std::move(opt_).value(); } + void await_suspend_destroy(auto& promise) { + // Assume the return object of the outer coro defaults to "empty". + } + // Fallback for when `coro_await_suspend_destroy` is unavailable. + void await_suspend(auto handle) { + await_suspend_destroy(handle.promise()); + handle.destroy(); + } + }; + +If all suspension points use (i) trivial or (ii) short-circuiting awaiters, +then the coroutine optimizes more like a plain function, with 2 caveats: + - **Behavior:** The coroutine promise provides an implicit exception boundary + (as if wrapping the function in ``try {} catch { unhandled_exception(); }``). + This exception handling behavior is usually desirable in robust, + return-value-oriented programs that need short-circuiting coroutines. + Otherwise, the promise can always re-throw. 
+ - **Speed:** As of 2025, there is still an optimization gap between a + realistic short-circuiting coro, and the equivalent (but much more verbose) + function. For a guesstimate, expect 4-5ns per call on x86. One idea for + improvement is to also elide trivial suspends like `std::suspend_never`, in + order to hit the `HasCoroSuspend` path in `CoroEarly.cpp`. + +}]; +} + def CountedByDocs : Documentation { let Category = DocCatField; let Content = [{ diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index 116341f4b66d5..58e7dd7db86d1 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -12504,6 +12504,9 @@ def note_coroutine_promise_call_implicitly_required : Note< def err_await_suspend_invalid_return_type : Error< "return type of 'await_suspend' is required to be 'void' or 'bool' (have %0)" >; +def err_await_suspend_destroy_invalid_return_type : Error< + "return type of 'await_suspend_destroy' is required to be 'void' (have %0)" +>; def note_await_ready_no_bool_conversion : Note< "return type of 'await_ready' is required to be contextually convertible to 'bool'" >; diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp index 827385f9c1a1f..d74bef592aa9c 100644 --- a/clang/lib/CodeGen/CGCoroutine.cpp +++ b/clang/lib/CodeGen/CGCoroutine.cpp @@ -174,6 +174,66 @@ static bool StmtCanThrow(const Stmt *S) { return false; } +// Check if this suspend should be calling `await_suspend_destroy` +static bool useCoroAwaitSuspendDestroy(const CoroutineSuspendExpr &S) { + // This can only be an `await_suspend_destroy` suspend expression if it + // returns void -- `buildCoawaitCalls` in `SemaCoroutine.cpp` asserts this. + // Moreover, when `await_suspend` returns a handle, the outermost method call + // is `.address()` -- making it harder to get the actual class or method. 
+ if (S.getSuspendReturnType() != + CoroutineSuspendExpr::SuspendReturnType::SuspendVoid) { + return false; + } + + // `CGCoroutine.cpp` & `SemaCoroutine.cpp` must agree on whether this suspend + // expression uses `[[clang::coro_await_suspend_destroy]]`. + // + // Any mismatch is a serious bug -- we would either double-free, or fail to + // destroy the promise type. For this reason, we make our decision based on + // the method name, and fatal outside of the happy path -- including on + // failure to find a method name. + // + // As a debug-only check we also try to detect the `AwaiterClass`. This is + // secondary, because detection of the awaiter type can be silently broken by + // small `buildCoawaitCalls` AST changes. + StringRef SuspendMethodName; // Primary + CXXRecordDecl *AwaiterClass = nullptr; // Debug-only, best-effort + if (auto *SuspendCall = + dyn_cast<CallExpr>(S.getSuspendExpr()->IgnoreImplicit())) { + if (auto *SuspendMember = dyn_cast<MemberExpr>(SuspendCall->getCallee())) { + if (auto *BaseExpr = SuspendMember->getBase()) { + // `IgnoreImplicitAsWritten` is critical since `await_suspend...` can be + // invoked on the base of the actual awaiter, and the base need not have + // the attribute. In such cases, the AST will show the true awaiter + // being upcast to the base. 
+ AwaiterClass = BaseExpr->IgnoreImplicitAsWritten() + ->getType() + ->getAsCXXRecordDecl(); + } + if (auto *SuspendMethod = + dyn_cast<CXXMethodDecl>(SuspendMember->getMemberDecl())) { + SuspendMethodName = SuspendMethod->getName(); + } + } + } + if (SuspendMethodName == "await_suspend_destroy") { + assert(!AwaiterClass || + AwaiterClass->hasAttr<CoroAwaitSuspendDestroyAttr>()); + return true; + } else if (SuspendMethodName == "await_suspend") { + assert(!AwaiterClass || + !AwaiterClass->hasAttr<CoroAwaitSuspendDestroyAttr>()); + return false; + } else { + llvm::report_fatal_error( + "Wrong method in [[clang::coro_await_suspend_destroy]] check: " + "expected 'await_suspend' or 'await_suspend_destroy', but got '" + + SuspendMethodName + "'"); + } + + return false; +} + // Emit suspend expression which roughly looks like: // // auto && x = CommonExpr(); @@ -220,6 +280,25 @@ namespace { RValue RV; }; } + +// The simplified `await_suspend_destroy` path avoids suspend intrinsics. +static void emitAwaitSuspendDestroy(CodeGenFunction &CGF, CGCoroData &Coro, + llvm::Function *SuspendWrapper, + llvm::Value *Awaiter, llvm::Value *Frame, + bool AwaitSuspendCanThrow) { + SmallVector<llvm::Value *, 2> DirectCallArgs; + DirectCallArgs.push_back(Awaiter); + DirectCallArgs.push_back(Frame); + + if (AwaitSuspendCanThrow) { + CGF.EmitCallOrInvoke(SuspendWrapper, DirectCallArgs); + } else { + CGF.EmitNounwindRuntimeCall(SuspendWrapper, DirectCallArgs); + } + + CGF.EmitBranchThroughCleanup(Coro.CleanupJD); +} + static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Coro, CoroutineSuspendExpr const &S, AwaitKind Kind, AggValueSlot aggSlot, @@ -234,7 +313,6 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co auto Prefix = buildSuspendPrefixStr(Coro, Kind); BasicBlock *ReadyBlock = CGF.createBasicBlock(Prefix + Twine(".ready")); BasicBlock *SuspendBlock = CGF.createBasicBlock(Prefix + Twine(".suspend")); - BasicBlock *CleanupBlock = 
CGF.createBasicBlock(Prefix + Twine(".cleanup")); // If expression is ready, no need to suspend. CGF.EmitBranchOnBoolExpr(S.getReadyExpr(), ReadyBlock, SuspendBlock, 0); @@ -243,95 +321,105 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co CGF.EmitBlock(SuspendBlock); auto &Builder = CGF.Builder; - llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save); - auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy); - auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr}); auto SuspendWrapper = CodeGenFunction(CGF.CGM).generateAwaitSuspendWrapper( CGF.CurFn->getName(), Prefix, S); - CGF.CurCoro.InSuspendBlock = true; - assert(CGF.CurCoro.Data && CGF.CurCoro.Data->CoroBegin && "expected to be called in coroutine context"); - SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs; - SuspendIntrinsicCallArgs.push_back( - CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF)); - - SuspendIntrinsicCallArgs.push_back(CGF.CurCoro.Data->CoroBegin); - SuspendIntrinsicCallArgs.push_back(SuspendWrapper); - - const auto SuspendReturnType = S.getSuspendReturnType(); - llvm::Intrinsic::ID AwaitSuspendIID; - - switch (SuspendReturnType) { - case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid: - AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void; - break; - case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: - AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool; - break; - case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: - AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle; - break; - } - - llvm::Function *AwaitSuspendIntrinsic = CGF.CGM.getIntrinsic(AwaitSuspendIID); - // SuspendHandle might throw since it also resumes the returned handle. 
+ const auto SuspendReturnType = S.getSuspendReturnType(); const bool AwaitSuspendCanThrow = SuspendReturnType == CoroutineSuspendExpr::SuspendReturnType::SuspendHandle || StmtCanThrow(S.getSuspendExpr()); - llvm::CallBase *SuspendRet = nullptr; - // FIXME: add call attributes? - if (AwaitSuspendCanThrow) - SuspendRet = - CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs); - else - SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic, - SuspendIntrinsicCallArgs); + llvm::Value *Awaiter = + CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF); + llvm::Value *Frame = CGF.CurCoro.Data->CoroBegin; - assert(SuspendRet); - CGF.CurCoro.InSuspendBlock = false; + if (useCoroAwaitSuspendDestroy(S)) { // Call `await_suspend_destroy` & cleanup + emitAwaitSuspendDestroy(CGF, Coro, SuspendWrapper, Awaiter, Frame, + AwaitSuspendCanThrow); + } else { // Normal suspend path -- can actually suspend, uses intrinsics + CGF.CurCoro.InSuspendBlock = true; - switch (SuspendReturnType) { - case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid: - assert(SuspendRet->getType()->isVoidTy()); - break; - case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: { - assert(SuspendRet->getType()->isIntegerTy()); - - // Veto suspension if requested by bool returning await_suspend. - BasicBlock *RealSuspendBlock = - CGF.createBasicBlock(Prefix + Twine(".suspend.bool")); - CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock); - CGF.EmitBlock(RealSuspendBlock); - break; - } - case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: { - assert(SuspendRet->getType()->isVoidTy()); - break; - } - } + SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs; + SuspendIntrinsicCallArgs.push_back(Awaiter); + SuspendIntrinsicCallArgs.push_back(Frame); + SuspendIntrinsicCallArgs.push_back(SuspendWrapper); + BasicBlock *CleanupBlock = CGF.createBasicBlock(Prefix + Twine(".cleanup")); - // Emit the suspend point. 
- const bool IsFinalSuspend = (Kind == AwaitKind::Final); - llvm::Function *CoroSuspend = - CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend); - auto *SuspendResult = Builder.CreateCall( - CoroSuspend, {SaveCall, Builder.getInt1(IsFinalSuspend)}); + llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save); + auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy); + auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr}); - // Create a switch capturing three possible continuations. - auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2); - Switch->addCase(Builder.getInt8(0), ReadyBlock); - Switch->addCase(Builder.getInt8(1), CleanupBlock); + llvm::Intrinsic::ID AwaitSuspendIID; - // Emit cleanup for this suspend point. - CGF.EmitBlock(CleanupBlock); - CGF.EmitBranchThroughCleanup(Coro.CleanupJD); + switch (SuspendReturnType) { + case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid: + AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void; + break; + case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: + AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool; + break; + case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: + AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle; + break; + } + + llvm::Function *AwaitSuspendIntrinsic = + CGF.CGM.getIntrinsic(AwaitSuspendIID); + + llvm::CallBase *SuspendRet = nullptr; + // FIXME: add call attributes? 
+ if (AwaitSuspendCanThrow) + SuspendRet = + CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs); + else + SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic, + SuspendIntrinsicCallArgs); + + assert(SuspendRet); + CGF.CurCoro.InSuspendBlock = false; + + switch (SuspendReturnType) { + case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid: + assert(SuspendRet->getType()->isVoidTy()); + break; + case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: { + assert(SuspendRet->getType()->isIntegerTy()); + + // Veto suspension if requested by bool returning await_suspend. + BasicBlock *RealSuspendBlock = + CGF.createBasicBlock(Prefix + Twine(".suspend.bool")); + CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock); + CGF.EmitBlock(RealSuspendBlock); + break; + } + case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: { + assert(SuspendRet->getType()->isVoidTy()); + break; + } + } + + // Emit the suspend point. + const bool IsFinalSuspend = (Kind == AwaitKind::Final); + llvm::Function *CoroSuspend = + CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend); + auto *SuspendResult = Builder.CreateCall( + CoroSuspend, {SaveCall, Builder.getInt1(IsFinalSuspend)}); + + // Create a switch capturing three possible continuations. + auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2); + Switch->addCase(Builder.getInt8(0), ReadyBlock); + Switch->addCase(Builder.getInt8(1), CleanupBlock); + + // Emit cleanup for this suspend point. + CGF.EmitBlock(CleanupBlock); + CGF.EmitBranchThroughCleanup(Coro.CleanupJD); + } // Emit await_resume expression. 
CGF.EmitBlock(ReadyBlock); diff --git a/clang/lib/Sema/SemaCoroutine.cpp b/clang/lib/Sema/SemaCoroutine.cpp index d193a33f22393..83fe7219c9997 100644 --- a/clang/lib/Sema/SemaCoroutine.cpp +++ b/clang/lib/Sema/SemaCoroutine.cpp @@ -289,6 +289,45 @@ static ExprResult buildCoroutineHandle(Sema &S, QualType PromiseType, return S.BuildCallExpr(nullptr, FromAddr.get(), Loc, FramePtr, Loc); } +// To support [[clang::coro_await_suspend_destroy]], this builds +// *static_cast<Promise*>( +// __builtin_coro_promise(handle, alignof(Promise), false)) +static ExprResult buildPromiseRef(Sema &S, QualType PromiseType, + SourceLocation Loc) { + uint64_t Align = + S.Context.getTypeAlign(PromiseType) / S.Context.getCharWidth(); + + // Build the call to __builtin_coro_promise() + SmallVector<Expr *, 3> Args = { + S.BuildBuiltinCallExpr(Loc, Builtin::BI__builtin_coro_frame, {}), + S.ActOnIntegerConstant(Loc, Align).get(), // alignof(Promise) + S.ActOnCXXBoolLiteral(Loc, tok::kw_false).get()}; // false + ExprResult CoroPromiseCall = + S.BuildBuiltinCallExpr(Loc, Builtin::BI__builtin_coro_promise, Args); + + if (CoroPromiseCall.isInvalid()) + return ExprError(); + + // Cast to Promise* + ExprResult CastExpr = S.ImpCastExprToType( + CoroPromiseCall.get(), S.Context.getPointerType(PromiseType), CK_BitCast); + if (CastExpr.isInvalid()) + return ExprError(); + + // Dereference to get Promise& + return S.CreateBuiltinUnaryOp(Loc, UO_Deref, CastExpr.get()); +} + +static bool hasCoroAwaitSuspendDestroyAttr(Expr *Awaiter) { + QualType AwaiterType = Awaiter->getType(); + if (auto *RD = AwaiterType->getAsCXXRecordDecl()) { + if (RD->hasAttr<CoroAwaitSuspendDestroyAttr>()) { + return true; + } + } + return false; +} + struct ReadySuspendResumeResult { enum AwaitCallType { ACT_Ready, ACT_Suspend, ACT_Resume }; Expr *Results[3]; @@ -399,15 +438,30 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise, Calls.Results[ACT::ACT_Ready] = 
S.MaybeCreateExprWithCleanups(Conv.get()); } - ExprResult CoroHandleRes = - buildCoroutineHandle(S, CoroPromise->getType(), Loc); - if (CoroHandleRes.isInvalid()) { - Calls.IsInvalid = true; - return Calls; + // For awaiters with `[[clang::coro_await_suspend_destroy]]`, we call + // `void await_suspend_destroy(Promise&)` & promptly destroy the coro. + CallExpr *AwaitSuspend = nullptr; + bool UseAwaitSuspendDestroy = hasCoroAwaitSuspendDestroyAttr(Operand); + if (UseAwaitSuspendDestroy) { + ExprResult PromiseRefRes = buildPromiseRef(S, CoroPromise->getType(), Loc); + if (PromiseRefRes.isInvalid()) { + Calls.IsInvalid = true; + return Calls; + } + Expr *PromiseRef = PromiseRefRes.get(); + AwaitSuspend = cast_or_null<CallExpr>( + BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef)); + } else { // The standard `await_suspend(std::coroutine_handle<...>)` + ExprResult CoroHandleRes = + buildCoroutineHandle(S, CoroPromise->getType(), Loc); + if (CoroHandleRes.isInvalid()) { + Calls.IsInvalid = true; + return Calls; + } + Expr *CoroHandle = CoroHandleRes.get(); + AwaitSuspend = cast_or_null<CallExpr>( + BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle)); } - Expr *CoroHandle = CoroHandleRes.get(); - CallExpr *AwaitSuspend = cast_or_null<CallExpr>( - BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle)); if (!AwaitSuspend) return Calls; if (!AwaitSuspend->getType()->isDependentType()) { @@ -417,25 +471,37 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise, // type Z. QualType RetType = AwaitSuspend->getCallReturnType(S.Context); - // Support for coroutine_handle returning await_suspend. 
- if (Expr *TailCallSuspend = - maybeTailCall(S, RetType, AwaitSuspend, Loc)) + auto EmitAwaitSuspendDiag = [&](unsigned int DiagCode) { + S.Diag(AwaitSuspend->getCalleeDecl()->getLocation(), DiagCode) << RetType; + S.Diag(Loc, diag::note_coroutine_promise_call_implicitly_required) + << AwaitSuspend->getDirectCallee(); + Calls.IsInvalid = true; + }; + + // `await_suspend_destroy` must return `void` -- and `CGCoroutine.cpp` + // critically depends on this in `hasCoroAwaitSuspendDestroyAttr`. + if (UseAwaitSuspendDestroy) { + if (RetType->isVoidType()) { + Calls.Results[ACT::ACT_Suspend] = + S.MaybeCreateExprWithCleanups(AwaitSuspend); + } else { + EmitAwaitSuspendDiag( + diag::err_await_suspend_destroy_invalid_return_type); + } + // Support for coroutine_handle returning await_suspend. + } else if (Expr *TailCallSuspend = + maybeTailCall(S, RetType, AwaitSuspend, Loc)) { // Note that we don't wrap the expression with ExprWithCleanups here // because that might interfere with tailcall contract (e.g. inserting // clean up instructions in-between tailcall and return). Instead // ExprWithCleanups is wrapped within maybeTailCall() prior to the resume // call. 
Calls.Results[ACT::ACT_Suspend] = TailCallSuspend; - else { + } else { // non-class prvalues always have cv-unqualified types if (RetType->isReferenceType() || (!RetType->isBooleanType() && !RetType->isVoidType())) { - S.Diag(AwaitSuspend->getCalleeDecl()->getLocation(), - diag::err_await_suspend_invalid_return_type) - << RetType; - S.Diag(Loc, diag::note_coroutine_promise_call_implicitly_required) - << AwaitSuspend->getDirectCallee(); - Calls.IsInvalid = true; + EmitAwaitSuspendDiag(diag::err_await_suspend_invalid_return_type); } else Calls.Results[ACT::ACT_Suspend] = S.MaybeCreateExprWithCleanups(AwaitSuspend); diff --git a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp new file mode 100644 index 0000000000000..6a082c15f2581 --- /dev/null +++ b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp @@ -0,0 +1,55 @@ +// RUN: %clang_cc1 -std=c++20 -verify %s + +#include "Inputs/coroutine.h" + +// Coroutine type with `std::suspend_never` for initial/final suspend +struct Task { + struct promise_type { + Task get_return_object() { return {}; } + std::suspend_never initial_suspend() { return {}; } + std::suspend_never final_suspend() noexcept { return {}; } + void return_void() {} + void unhandled_exception() {} + }; +}; + +struct [[clang::coro_await_suspend_destroy]] WrongReturnTypeAwaitable { + bool await_ready() { return false; } + bool await_suspend_destroy(auto& promise) { return true; } // expected-error {{return type of 'await_suspend_destroy' is required to be 'void' (have 'bool')}} + void await_suspend(auto handle) { + await_suspend_destroy(handle.promise()); + handle.destroy(); + } + void await_resume() {} +}; + +Task test_invalid_destroying_await() { + co_await WrongReturnTypeAwaitable{}; // expected-note {{call to 'await_suspend_destroy<Task::promise_type>' implicitly required by coroutine function here}} +} + +struct [[clang::coro_await_suspend_destroy]] 
MissingMethodAwaitable { + bool await_ready() { return false; } + // Missing await_suspend_destroy method + void await_suspend(auto handle) { + handle.destroy(); + } + void await_resume() {} +}; + +Task test_missing_method() { + co_await MissingMethodAwaitable{}; // expected-error {{no member named 'await_suspend_destroy' in 'MissingMethodAwaitable'}} +} + +struct [[clang::coro_await_suspend_destroy]] WrongParameterTypeAwaitable { + bool await_ready() { return false; } + void await_suspend_destroy(int x) {} // expected-note {{passing argument to parameter 'x' here}} + void await_suspend(auto handle) { + await_suspend_destroy(handle.promise()); + handle.destroy(); + } + void await_resume() {} +}; + +Task test_wrong_parameter_type() { + co_await WrongParameterTypeAwaitable{}; // expected-error {{no viable conversion from 'std::coroutine_traits<Task>::promise_type' (aka 'Task::promise_type') to 'int'}} +} diff --git a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp new file mode 100644 index 0000000000000..fa1dbf475e56c --- /dev/null +++ b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp @@ -0,0 +1,129 @@ +// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s \ +// RUN: -disable-llvm-passes | FileCheck %s --check-prefix=CHECK-INITIAL +// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s \ +// RUN: -O2 | FileCheck %s --check-prefix=CHECK-OPTIMIZED + +#include "Inputs/coroutine.h" + +// Awaitable with `coro_await_suspend_destroy` attribute +struct [[clang::coro_await_suspend_destroy]] DestroyingAwaitable { + bool await_ready() { return false; } + void await_suspend_destroy(auto& promise) {} + void await_suspend(auto handle) { + await_suspend_destroy(handle.promise()); + handle.destroy(); + } + void await_resume() {} +}; + +// Awaitable without `coro_await_suspend_destroy` (normal behavior) +struct NormalAwaitable { + bool 
await_ready() { return false; } + void await_suspend(std::coroutine_handle<> h) {} + void await_resume() {} +}; + +// Coroutine type with `std::suspend_never` for initial/final suspend +struct Task { + struct promise_type { + Task get_return_object() { return {}; } + std::suspend_never initial_suspend() { return {}; } + std::suspend_never final_suspend() noexcept { return {}; } + void return_void() {} + void unhandled_exception() {} + }; +}; + +// Single co_await with coro_await_suspend_destroy. +// Should result in no allocation after optimization. +Task test_single_destroying_await() { + co_await DestroyingAwaitable{}; +} + +// CHECK-INITIAL-LABEL: define{{.*}} void @_Z28test_single_destroying_awaitv +// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc +// CHECK-INITIAL: call{{.*}} @llvm.coro.begin + +// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z28test_single_destroying_awaitv +// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc +// CHECK-OPTIMIZED-NOT: call{{.*}} malloc +// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm + +// Test multiple `co_await`s, all with `coro_await_suspend_destroy`. +// This should also result in no allocation after optimization. +Task test_multiple_destroying_awaits(bool condition) { + co_await DestroyingAwaitable{}; + co_await DestroyingAwaitable{}; + if (condition) { + co_await DestroyingAwaitable{}; + } +} + +// CHECK-INITIAL-LABEL: define{{.*}} void @_Z31test_multiple_destroying_awaitsb +// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc +// CHECK-INITIAL: call{{.*}} @llvm.coro.begin + +// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z31test_multiple_destroying_awaitsb +// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc +// CHECK-OPTIMIZED-NOT: call{{.*}} malloc +// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm + +// Mixed awaits - some with `coro_await_suspend_destroy`, some without. +// We should still see allocation because not all awaits destroy the coroutine. 
+Task test_mixed_awaits() { + co_await NormalAwaitable{}; // Must precede "destroy" to be reachable + co_await DestroyingAwaitable{}; +} + +// CHECK-INITIAL-LABEL: define{{.*}} void @_Z17test_mixed_awaitsv +// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc +// CHECK-INITIAL: call{{.*}} @llvm.coro.begin + +// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z17test_mixed_awaitsv +// CHECK-OPTIMIZED: call{{.*}} @_Znwm + + +// Check the attribute detection affects control flow. +Task test_attribute_detection() { + co_await DestroyingAwaitable{}; + // Unreachable in OPTIMIZED, so those builds don't see an allocation. + co_await NormalAwaitable{}; +} + +// Check that we skip the normal suspend intrinsic and go directly to cleanup. +// +// CHECK-INITIAL-LABEL: define{{.*}} void @_Z24test_attribute_detectionv +// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__await +// CHECK-INITIAL-NEXT: br label %cleanup5 +// CHECK-INITIAL-NOT: call{{.*}} @llvm.coro.suspend +// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__await +// CHECK-INITIAL: call{{.*}} @llvm.coro.suspend +// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__final + +// Since `co_await DestroyingAwaitable{}` gets converted into an unconditional +// branch, the `co_await NormalAwaitable{}` is unreachable in optimized builds. 
+// +// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc +// CHECK-OPTIMIZED-NOT: call{{.*}} malloc +// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm + +// Template awaitable with `coro_await_suspend_destroy` attribute +template<typename T> +struct [[clang::coro_await_suspend_destroy]] TemplateDestroyingAwaitable { + bool await_ready() { return false; } + void await_suspend_destroy(auto& promise) {} + void await_suspend(auto handle) { + await_suspend_destroy(handle.promise()); + handle.destroy(); + } + void await_resume() {} +}; + +Task test_template_destroying_await() { + co_await TemplateDestroyingAwaitable<int>{}; +} + +// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z30test_template_destroying_awaitv +// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc +// CHECK-OPTIMIZED-NOT: call{{.*}} malloc +// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test index 05693538252aa..43327744ffc8a 100644 --- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test +++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test @@ -62,6 +62,7 @@ // CHECK-NEXT: Convergent (SubjectMatchRule_function) // CHECK-NEXT: CoroAwaitElidable (SubjectMatchRule_record) // CHECK-NEXT: CoroAwaitElidableArgument (SubjectMatchRule_variable_is_parameter) +// CHECK-NEXT: CoroAwaitSuspendDestroy (SubjectMatchRule_record) // CHECK-NEXT: CoroDisableLifetimeBound (SubjectMatchRule_function) // CHECK-NEXT: CoroLifetimeBound (SubjectMatchRule_record) // CHECK-NEXT: CoroOnlyDestroyWhenComplete (SubjectMatchRule_record) diff --git a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp new file mode 100644 index 0000000000000..1b48b1523bf12 --- /dev/null +++ 
b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp @@ -0,0 +1,409 @@ +//===-- Integration test for `clang::coro_await_suspend_destroy` ----------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +// Test for the `coro_await_suspend_destroy` attribute and +// `await_suspend_destroy` method. +// +// Per `AttrDocs.td`, using `coro_await_suspend_destroy` with +// `await_suspend_destroy` should be equivalent to providing a stub +// `await_suspend` that calls `await_suspend_destroy` and then destroys the +// coroutine handle. +// +// This test logs control flow in a variety of scenarios (controlled by +// `test_toggles`), and checks that the execution traces are identical for +// awaiters with/without the attribute. We currently test all combinations of +// error injection points to ensure behavioral equivalence. +// +// In contrast to Clang `lit` tests, this makes it easy to verify non-divergence +// of functional behavior of the entire coroutine across many scenarios, +// including exception handling, early returns, and mixed usage with legacy +// awaitables. 
+// +//===----------------------------------------------------------------------===// + +// UNSUPPORTED: c++03, c++11, c++14, c++17 + +#if __has_cpp_attribute(clang::coro_await_suspend_destroy) +# define ATTR_CORO_AWAIT_SUSPEND_DESTROY [[clang::coro_await_suspend_destroy]] +#else +# define ATTR_CORO_AWAIT_SUSPEND_DESTROY +#endif + +#include <cassert> +#include <coroutine> +#include <exception> +#include <iostream> +#include <memory> +#include <optional> +#include <string> + +struct my_err : std::exception {}; + +enum test_toggles { + throw_in_convert_optional_wrapper = 0, + throw_in_return_value, + throw_in_await_resume, + throw_in_await_suspend_destroy, + dynamic_short_circuit, // Does not apply to `..._shortcircuits_to_empty` tests + largest = dynamic_short_circuit // for array in `test_driver` +}; + +enum test_event { + unset = 0, + // Besides events, we also log various integers between 1 and 9999 that + // disambiguate different awaiters, or represent different return values. + convert_optional_wrapper = 10000, + destroy_return_object, + destroy_promise, + get_return_object, + initial_suspend, + final_suspend, + return_value, + throw_return_value, + unhandled_exception, + await_ready, + await_resume, + destroy_optional_awaitable, + throw_await_resume, + await_suspend_destroy, + throw_await_suspend_destroy, + await_suspend, + coro_catch, + throw_convert_optional_wrapper, +}; + +struct test_driver { + static constexpr int max_events = 1000; + + bool toggles_[test_toggles::largest + 1] = {}; + int events_[max_events] = {}; + int cur_event_ = 0; + + bool toggles(test_toggles toggle) const { return toggles_[toggle]; } + void log(auto&&... events) { + for (auto event : {static_cast<int>(events)...}) { + assert(cur_event_ < max_events); + events_[cur_event_++] = event; + } + } +}; + +// `optional_wrapper` exists since `get_return_object()` can't return +// `std::optional` directly. 
C++ coroutines have a fundamental timing mismatch +// between when the return object is created and when the value is available: +// +// 1) Early (coroutine startup): `get_return_object()` is called and must return +// something immediately. +// 2) Later (when `co_return` executes): `return_value(T)` is called with the +// actual value. +// 3) Issue: If `get_return_object()` returns the storage, it's empty when +// returned, and writing to it later cannot affect the already-returned copy. +template <typename T> +struct optional_wrapper { + test_driver& driver_; + std::optional<T> storage_; + std::optional<T>*& pointer_; + optional_wrapper(test_driver& driver, std::optional<T>*& p) : driver_(driver), pointer_(p) { pointer_ = &storage_; } + operator std::optional<T>() { + if (driver_.toggles(test_toggles::throw_in_convert_optional_wrapper)) { + driver_.log(test_event::throw_convert_optional_wrapper); + throw my_err(); + } + driver_.log(test_event::convert_optional_wrapper); + return std::move(storage_); + } + ~optional_wrapper() { driver_.log(test_event::destroy_return_object); } +}; + +// Make `std::optional` a coroutine +template <typename T, typename... Args> +struct std::coroutine_traits<std::optional<T>, test_driver&, Args...> { + struct promise_type { + std::optional<T>* storagePtr_ = nullptr; + test_driver& driver_; + + promise_type(test_driver& driver, auto&&...) 
: driver_(driver) {} + ~promise_type() { driver_.log(test_event::destroy_promise); } + optional_wrapper<T> get_return_object() { + driver_.log(test_event::get_return_object); + return optional_wrapper<T>(driver_, storagePtr_); + } + std::suspend_never initial_suspend() const noexcept { + driver_.log(test_event::initial_suspend); + return {}; + } + std::suspend_never final_suspend() const noexcept { + driver_.log(test_event::final_suspend); + return {}; + } + void return_value(T value) { + driver_.log(test_event::return_value, value); + if (driver_.toggles(test_toggles::throw_in_return_value)) { + driver_.log(test_event::throw_return_value); + throw my_err(); + } + *storagePtr_ = std::move(value); + } + void unhandled_exception() { + // Leave `*storagePtr_` empty to represent error + driver_.log(test_event::unhandled_exception); + } + }; +}; + +template <typename T, bool HasAttr> +struct base_optional_awaitable { + test_driver& driver_; + int id_; + std::optional<T> opt_; + + ~base_optional_awaitable() { driver_.log(test_event::destroy_optional_awaitable, id_); } + + bool await_ready() const noexcept { + driver_.log(test_event::await_ready, id_); + return opt_.has_value(); + } + T await_resume() { + if (driver_.toggles(test_toggles::throw_in_await_resume)) { + driver_.log(test_event::throw_await_resume, id_); + throw my_err(); + } + driver_.log(test_event::await_resume, id_); + return std::move(opt_).value(); + } + void await_suspend_destroy(auto& promise) { +#if __has_cpp_attribute(clang::coro_await_suspend_destroy) + if constexpr (HasAttr) { + // This is just here so that old & new events compare exactly equal. 
+ driver_.log(test_event::await_suspend); + } +#endif + assert(promise.storagePtr_); + if (driver_.toggles(test_toggles::throw_in_await_suspend_destroy)) { + driver_.log(test_event::throw_await_suspend_destroy, id_); + throw my_err(); + } + driver_.log(test_event::await_suspend_destroy, id_); + } + void await_suspend(auto handle) { + driver_.log(test_event::await_suspend); + await_suspend_destroy(handle.promise()); + handle.destroy(); + } +}; + +template <typename T> +struct old_optional_awaitable : base_optional_awaitable<T, false> {}; + +template <typename T> +struct ATTR_CORO_AWAIT_SUSPEND_DESTROY new_optional_awaitable : base_optional_awaitable<T, true> {}; + +void enumerate_toggles(auto lambda) { + // Generate all combinations of toggle values + for (int mask = 0; mask <= (1 << (test_toggles::largest + 1)) - 1; ++mask) { + test_driver driver; + for (int i = 0; i <= test_toggles::largest; ++i) { + driver.toggles_[i] = (mask & (1 << i)) != 0; + } + lambda(driver); + } +} + +template <typename T> +void check_coro_with_driver_for(auto coro_fn) { + enumerate_toggles([&](const test_driver& driver) { + auto old_driver = driver; + std::optional<T> old_res; + bool old_threw = false; + try { + old_res = coro_fn.template operator()<old_optional_awaitable<T>, T>(old_driver); + } catch (const my_err&) { + old_threw = true; + } + auto new_driver = driver; + std::optional<T> new_res; + bool new_threw = false; + try { + new_res = coro_fn.template operator()<new_optional_awaitable<T>, T>(new_driver); + } catch (const my_err&) { + new_threw = true; + } + + // Print toggle values for debugging + std::string toggle_info = "Toggles: "; + for (int i = 0; i <= test_toggles::largest; ++i) { + if (driver.toggles_[i]) { + toggle_info += std::to_string(i) + " "; + } + } + toggle_info += "\n"; + std::cerr << toggle_info.c_str() << std::endl; + + assert(old_threw == new_threw); + assert(old_res == new_res); + + // Compare events arrays directly using cur_event_ and indices + 
assert(old_driver.cur_event_ == new_driver.cur_event_); + for (int i = 0; i < old_driver.cur_event_; ++i) { + assert(old_driver.events_[i] == new_driver.events_[i]); + } + }); +} + +// Move-only, non-nullable type that quacks like int but stores a +// heap-allocated int. Used to exercise the machinery with a nontrivial type. +class heap_int { +private: + std::unique_ptr<int> ptr_; + +public: + explicit heap_int(int value) : ptr_(std::make_unique<int>(value)) {} + + heap_int operator+(const heap_int& other) const { return heap_int(*ptr_ + *other.ptr_); } + + bool operator==(const heap_int& other) const { return *ptr_ == *other.ptr_; } + + /*implicit*/ operator int() const { return *ptr_; } +}; + +void check_coro_with_driver(auto coro_fn) { + check_coro_with_driver_for<int>(coro_fn); + check_coro_with_driver_for<heap_int>(coro_fn); +} + +template <typename Awaitable, typename T> +std::optional<T> coro_shortcircuits_to_empty(test_driver& driver) { + T n = co_await Awaitable{driver, 1, std::optional<T>{11}}; + co_await Awaitable{driver, 2, std::optional<T>{}}; // return early! + co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}}; +} + +void test_coro_shortcircuits_to_empty() { + std::cerr << "test_coro_shortcircuits_to_empty" << std::endl; + check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) { + return coro_shortcircuits_to_empty<Awaitable, T>(driver); + }); +} + +template <typename Awaitable, typename T> +std::optional<T> coro_simple_await(test_driver& driver) { + co_return co_await Awaitable{driver, 1, std::optional<T>{11}} + + co_await Awaitable{driver, 2, driver.toggles(dynamic_short_circuit) ? 
std::optional<T>{} : std::optional<T>{22}}; +} + +void test_coro_simple_await() { + std::cerr << "test_coro_simple_await" << std::endl; + check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) { + return coro_simple_await<Awaitable, T>(driver); + }); +} + +// The next pair of tests checks that adding a `try-catch` in the coroutine +// doesn't affect control flow when `await_suspend_destroy` awaiters are in use. + +template <typename Awaitable, typename T> +std::optional<T> coro_catching_shortcircuits_to_empty(test_driver& driver) { + try { + T n = co_await Awaitable{driver, 1, std::optional<T>{11}}; + co_await Awaitable{driver, 2, std::optional<T>{}}; // return early! + co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}}; + } catch (...) { + driver.log(test_event::coro_catch); + throw; + } +} + +void test_coro_catching_shortcircuits_to_empty() { + std::cerr << "test_coro_catching_shortcircuits_to_empty" << std::endl; + check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) { + return coro_catching_shortcircuits_to_empty<Awaitable, T>(driver); + }); +} + +template <typename Awaitable, typename T> +std::optional<T> coro_catching_simple_await(test_driver& driver) { + try { + co_return co_await Awaitable{driver, 1, std::optional<T>{11}} + + co_await Awaitable{ + driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}}; + } catch (...) { + driver.log(test_event::coro_catch); + throw; + } +} + +void test_coro_catching_simple_await() { + std::cerr << "test_coro_catching_simple_await" << std::endl; + check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) { + return coro_catching_simple_await<Awaitable, T>(driver); + }); +} + +// The next pair of tests shows that the `await_suspend_destroy` code path works +// correctly, even if it's mixed in a coroutine with legacy awaitables. 
+ +template <typename Awaitable, typename T> +std::optional<T> noneliding_coro_shortcircuits_to_empty(test_driver& driver) { + T n = co_await Awaitable{driver, 1, std::optional<T>{11}}; + T n2 = co_await old_optional_awaitable<T>{driver, 2, std::optional<T>{22}}; + co_await Awaitable{driver, 3, std::optional<T>{}}; // return early! + co_return n + n2 + co_await Awaitable{driver, 4, std::optional<T>{44}}; +} + +void test_noneliding_coro_shortcircuits_to_empty() { + std::cerr << "test_noneliding_coro_shortcircuits_to_empty" << std::endl; + check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) { + return noneliding_coro_shortcircuits_to_empty<Awaitable, T>(driver); + }); +} + +template <typename Awaitable, typename T> +std::optional<T> noneliding_coro_simple_await(test_driver& driver) { + co_return co_await Awaitable{driver, 1, std::optional<T>{11}} + + co_await Awaitable{driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}} + + co_await old_optional_awaitable<T>{driver, 3, std::optional<T>{33}}; +} + +void test_noneliding_coro_simple_await() { + std::cerr << "test_noneliding_coro_simple_await" << std::endl; + check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) { + return noneliding_coro_simple_await<Awaitable, T>(driver); + }); +} + +// Test nested coroutines (coroutines that await other coroutines) + +template <typename Awaitable, typename T> +std::optional<T> inner_coro(test_driver& driver, int base_id) { + co_return co_await Awaitable{driver, base_id, std::optional<T>{100}} + + co_await Awaitable{ + driver, base_id + 1, driver.toggles(dynamic_short_circuit) ? 
std::optional<T>{} : std::optional<T>{200}}; +} + +template <typename Awaitable, typename T> +std::optional<T> outer_coro(test_driver& driver) { + T result1 = co_await Awaitable{driver, 1, inner_coro<Awaitable, T>(driver, 10)}; + T result2 = co_await Awaitable{driver, 2, inner_coro<Awaitable, T>(driver, 20)}; + co_return result1 + result2; +} + +void test_nested_coroutines() { + std::cerr << "test_nested_coroutines" << std::endl; + check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) { + return outer_coro<Awaitable, T>(driver); + }); +} + +int main(int, char**) { + test_coro_shortcircuits_to_empty(); + test_coro_simple_await(); + test_coro_catching_shortcircuits_to_empty(); + test_coro_catching_simple_await(); + test_noneliding_coro_shortcircuits_to_empty(); + test_noneliding_coro_simple_await(); + test_nested_coroutines(); + return 0; +} >From eb5557ab0eb43ff216441603d1c47615869d0bbe Mon Sep 17 00:00:00 2001 From: lesha <le...@meta.com> Date: Thu, 7 Aug 2025 23:38:21 -0700 Subject: [PATCH 2/3] Fix CI --- clang/include/clang/Basic/AttrDocs.td | 32 ++++++------- .../coro_await_suspend_destroy.pass.cpp | 48 +++++++++++++++++-- 2 files changed, 60 insertions(+), 20 deletions(-) diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td index d2224d86b3900..e45f692740193 100644 --- a/clang/include/clang/Basic/AttrDocs.td +++ b/clang/include/clang/Basic/AttrDocs.td @@ -9312,12 +9312,12 @@ flow as): } The benefits of this attribute are: - - **Avoid heap allocations for coro frames**: Allocating short-circuiting - coros on the stack makes code more predictable under memory pressure. - Without this attribute, LLVM cannot elide heap allocation even when all - awaiters are short-circuiting. - - **Performance**: Significantly faster execution and smaller code size. - - **Build time**: Faster compilation due to less IR being generated. 
+- **Avoid heap allocations for coro frames**: Allocating short-circuiting + coros on the stack makes code more predictable under memory pressure. + Without this attribute, LLVM cannot elide heap allocation even when all + awaiters are short-circuiting. +- **Performance**: Significantly faster execution and smaller code size. +- **Build time**: Faster compilation due to less IR being generated. Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes further improve optimization. @@ -9343,16 +9343,16 @@ Here is a toy example of a portable short-circuiting awaiter: If all suspension points use (i) trivial or (ii) short-circuiting awaiters, then the coroutine optimizes more like a plain function, with 2 caveats: - - **Behavior:** The coroutine promise provides an implicit exception boundary - (as if wrapping the function in ``try {} catch { unhandled_exception(); }``). - This exception handling behavior is usually desirable in robust, - return-value-oriented programs that need short-circuiting coroutines. - Otherwise, the promise can always re-throw. - - **Speed:** As of 2025, there is still an optimization gap between a - realistic short-circuiting coro, and the equivalent (but much more verbose) - function. For a guesstimate, expect 4-5ns per call on x86. One idea for - improvement is to also elide trivial suspends like `std::suspend_never`, in - order to hit the `HasCoroSuspend` path in `CoroEarly.cpp`. +- **Behavior:** The coroutine promise provides an implicit exception boundary + (as if wrapping the function in ``try {} catch { unhandled_exception(); }``). + This exception handling behavior is usually desirable in robust, + return-value-oriented programs that need short-circuiting coroutines. + Otherwise, the promise can always re-throw. +- **Speed:** As of 2025, there is still an optimization gap between a + realistic short-circuiting coro, and the equivalent (but much more verbose) + function. 
For a guesstimate, expect 4-5ns per call on x86. One idea for + improvement is to also elide trivial suspends like `std::suspend_never`, in + order to hit the `HasCoroSuspend` path in `CoroEarly.cpp`. }]; } diff --git a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp index 1b48b1523bf12..9da8ba530edf3 100644 --- a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp +++ b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp @@ -40,6 +40,14 @@ #include <optional> #include <string> +#define DEBUG_LOG 0 // Logs break no-localization CI, set to 1 if needed + +#ifndef TEST_HAS_NO_EXCEPTIONS +# define THROW(_ex) throw _ex; +#else +# define THROW(_ex) +#endif + struct my_err : std::exception {}; enum test_toggles { @@ -110,7 +118,7 @@ struct optional_wrapper { operator std::optional<T>() { if (driver_.toggles(test_toggles::throw_in_convert_optional_wrapper)) { driver_.log(test_event::throw_convert_optional_wrapper); - throw my_err(); + THROW(my_err()); } driver_.log(test_event::convert_optional_wrapper); return std::move(storage_); @@ -143,7 +151,7 @@ struct std::coroutine_traits<std::optional<T>, test_driver&, Args...> { driver_.log(test_event::return_value, value); if (driver_.toggles(test_toggles::throw_in_return_value)) { driver_.log(test_event::throw_return_value); - throw my_err(); + THROW(my_err()); } *storagePtr_ = std::move(value); } @@ -169,7 +177,7 @@ struct base_optional_awaitable { T await_resume() { if (driver_.toggles(test_toggles::throw_in_await_resume)) { driver_.log(test_event::throw_await_resume, id_); - throw my_err(); + THROW(my_err()); } driver_.log(test_event::await_resume, id_); return std::move(opt_).value(); @@ -184,7 +192,7 @@ struct base_optional_awaitable { assert(promise.storagePtr_); if 
(driver_.toggles(test_toggles::throw_in_await_suspend_destroy)) { driver_.log(test_event::throw_await_suspend_destroy, id_); - throw my_err(); + THROW(my_err()); } driver_.log(test_event::await_suspend_destroy, id_); } @@ -218,20 +226,29 @@ void check_coro_with_driver_for(auto coro_fn) { auto old_driver = driver; std::optional<T> old_res; bool old_threw = false; +#ifndef TEST_HAS_NO_EXCEPTIONS try { +#endif old_res = coro_fn.template operator()<old_optional_awaitable<T>, T>(old_driver); +#ifndef TEST_HAS_NO_EXCEPTIONS } catch (const my_err&) { old_threw = true; } +#endif auto new_driver = driver; std::optional<T> new_res; bool new_threw = false; +#ifndef TEST_HAS_NO_EXCEPTIONS try { +#endif new_res = coro_fn.template operator()<new_optional_awaitable<T>, T>(new_driver); +#ifndef TEST_HAS_NO_EXCEPTIONS } catch (const my_err&) { new_threw = true; } +#endif +#if DEBUG_LOG // Print toggle values for debugging std::string toggle_info = "Toggles: "; for (int i = 0; i <= test_toggles::largest; ++i) { @@ -241,6 +258,7 @@ void check_coro_with_driver_for(auto coro_fn) { } toggle_info += "\n"; std::cerr << toggle_info.c_str() << std::endl; +#endif assert(old_threw == new_threw); assert(old_res == new_res); @@ -282,7 +300,9 @@ std::optional<T> coro_shortcircuits_to_empty(test_driver& driver) { } void test_coro_shortcircuits_to_empty() { +#if DEBUG_LOG std::cerr << "test_coro_shortcircuits_to_empty" << std::endl; +#endif check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) { return coro_shortcircuits_to_empty<Awaitable, T>(driver); }); @@ -295,7 +315,9 @@ std::optional<T> coro_simple_await(test_driver& driver) { } void test_coro_simple_await() { +#if DEBUG_LOG std::cerr << "test_coro_simple_await" << std::endl; +#endif check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) { return coro_simple_await<Awaitable, T>(driver); }); @@ -306,18 +328,24 @@ void test_coro_simple_await() { template <typename Awaitable, typename T> 
std::optional<T> coro_catching_shortcircuits_to_empty(test_driver& driver) { +#ifndef TEST_HAS_NO_EXCEPTIONS try { +#endif T n = co_await Awaitable{driver, 1, std::optional<T>{11}}; co_await Awaitable{driver, 2, std::optional<T>{}}; // return early! co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}}; +#ifndef TEST_HAS_NO_EXCEPTIONS } catch (...) { driver.log(test_event::coro_catch); throw; } +#endif } void test_coro_catching_shortcircuits_to_empty() { +#if DEBUG_LOG std::cerr << "test_coro_catching_shortcircuits_to_empty" << std::endl; +#endif check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) { return coro_catching_shortcircuits_to_empty<Awaitable, T>(driver); }); @@ -325,18 +353,24 @@ void test_coro_catching_shortcircuits_to_empty() { template <typename Awaitable, typename T> std::optional<T> coro_catching_simple_await(test_driver& driver) { +#ifndef TEST_HAS_NO_EXCEPTIONS try { +#endif co_return co_await Awaitable{driver, 1, std::optional<T>{11}} + co_await Awaitable{ driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}}; +#ifndef TEST_HAS_NO_EXCEPTIONS } catch (...) 
{
     driver.log(test_event::coro_catch);
     throw;
   }
+#endif
 }
 void test_coro_catching_simple_await() {
+#if DEBUG_LOG
   std::cerr << "test_coro_catching_simple_await" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return coro_catching_simple_await<Awaitable, T>(driver);
   });
@@ -354,7 +388,9 @@ std::optional<T> noneliding_coro_shortcircuits_to_empty(test_driver& driver) {
 }
 void test_noneliding_coro_shortcircuits_to_empty() {
+#if DEBUG_LOG
   std::cerr << "test_noneliding_coro_shortcircuits_to_empty" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return noneliding_coro_shortcircuits_to_empty<Awaitable, T>(driver);
   });
@@ -368,7 +404,9 @@ std::optional<T> noneliding_coro_simple_await(test_driver& driver) {
 }
 void test_noneliding_coro_simple_await() {
+#if DEBUG_LOG
   std::cerr << "test_noneliding_coro_simple_await" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return noneliding_coro_simple_await<Awaitable, T>(driver);
   });
@@ -391,7 +429,9 @@ std::optional<T> outer_coro(test_driver& driver) {
 }
 void test_nested_coroutines() {
+#if DEBUG_LOG
   std::cerr << "test_nested_coroutines" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return outer_coro<Awaitable, T>(driver);
   });

>From 5d6a06d27ba913bc49286a8bdea97446ba37fae2 Mon Sep 17 00:00:00 2001
From: lesha <le...@meta.com>
Date: Fri, 8 Aug 2025 00:12:00 -0700
Subject: [PATCH 3/3] Improve doc formatting

---
 clang/include/clang/Basic/AttrDocs.td | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index e45f692740193..a80b8e97efee2 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9312,11 +9312,14 @@ flow as):
   }
 The benefits of this attribute are:
+
 - **Avoid heap allocations for coro
frames**: Allocating short-circuiting coros
   on the stack makes code more predictable under memory pressure. Without
   this attribute, LLVM cannot elide heap allocation even when all awaiters
   are short-circuiting.
+
 - **Performance**: Significantly faster execution and smaller code size.
+
 - **Build time**: Faster compilation due to less IR being generated.
 Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes
@@ -9343,11 +9346,13 @@ Here is a toy example of a portable short-circuiting awaiter:
 If all suspension points use (i) trivial or (ii) short-circuiting awaiters,
 then the coroutine optimizes more like a plain function, with 2 caveats:
+
 - **Behavior:** The coroutine promise provides an implicit exception boundary
   (as if wrapping the function in ``try {} catch { unhandled_exception(); }``).
   This exception handling behavior is usually desirable in robust,
   return-value-oriented programs that need short-circuiting coroutines.
   Otherwise, the promise can always re-throw.
+
 - **Speed:** As of 2025, there is still an optimization gap between a
   realistic short-circuiting coro, and the equivalent (but much more
   verbose) function. For a guesstimate, expect 4-5ns per call on x86. One
   idea for
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
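For readers following the test changes in this series: the hunks guard every `try`/`catch` with `TEST_HAS_NO_EXCEPTIONS` and replace a bare `throw` with `THROW(...)`. The definition of `THROW` is not visible in the hunks quoted here, so the sketch below assumes a plausible one (throw when exceptions are enabled, abort otherwise). `may_fail` and `call_threw` are hypothetical names used only for illustration; `call_threw` mirrors the guarded block in `check_coro_with_driver_for`.

```cpp
#include <cstdlib>

// ASSUMPTION: a stand-in for the patch's THROW macro, whose real
// definition is not shown in the quoted hunks.
#ifndef TEST_HAS_NO_EXCEPTIONS
#  define THROW(expr) throw (expr)
#else
#  define THROW(expr) std::abort()
#endif

struct my_err {};

// Hypothetical helper: returns n, or raises my_err when `fail` is set.
int may_fail(int n, bool fail) {
  if (fail) {
    THROW(my_err());
  }
  return n;
}

// Hypothetical helper: calls may_fail and reports whether it threw.
// The try/catch lines compile away entirely when exceptions are off,
// leaving only the plain call, as in the guarded test code.
bool call_threw(int n, bool fail, int& out) {
  bool threw = false;
#ifndef TEST_HAS_NO_EXCEPTIONS
  try {
#endif
    out = may_fail(n, fail);
#ifndef TEST_HAS_NO_EXCEPTIONS
  } catch (const my_err&) {
    threw = true;
  }
#endif
  return threw;
}
```

With exceptions enabled, `call_threw(7, true, out)` returns `true` and leaves `out` untouched. With `TEST_HAS_NO_EXCEPTIONS` defined, the same source still compiles, but under this assumed macro the failing path would abort rather than unwind, so tests would need to avoid toggling it.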