Thanks Kenn. I have opened a GitHub issue to track the DoFnInvoker cache bug and submitted a PR with the fix and regression tests.

Issue: https://github.com/apache/beam/issues/37351
PR: https://github.com/apache/beam/pull/37355

Could you please review the PR when you have a moment?

Best,
Elia LIU

On Thu, 22 Jan 2026 at 08:44, Kenneth Knowles <[email protected]> wrote:

> Excellent, thanks!
>
> Kenn
>
> On Mon, Jan 19, 2026 at 8:19 AM Elia LIU <[email protected]> wrote:
>
>> Hi all,
>>
>> Thanks for the discussion and pointers. I checked GitHub and didn't
>> see an assigned issue/PR for this yet.
>>
>> I agree with the consensus here (Kenn/Reuven/Robert/Byron) that this
>> looks like a bug: we’re memoizing DoFnInvoker bytecode generation, but
>> the cache key is currently only the DoFn class. This appears to be
>> missing pertinent inputs and can lead to reusing an invoker with the
>> wrong cast target.
>>
>> I’d like to volunteer to fix this.
>>
>> Plan:
>>
>> 1. Add a regression test that reproduces the
>>    collision/ClassCastException (e.g., reusing the same DoFn class in
>>    different contexts with different cast targets).
>>
>> 2. Update ByteBuddyDoFnInvokerFactory to key the cache on the DoFn
>>    class plus the cast target (as Byron suggested).
>>
>> 3. If the cast target isn’t directly available at the caching
>>    boundary, I can explore using the stage name as a proxy as Robert
>>    suggested.
>>
>> 4. Submit a PR for review.
>>
>> I’ll open a GitHub issue to track this and link it back to this thread.
>>
>> Best,
>> Elia LIU
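
For readers skimming the thread, the keying change in step 2 of the plan could look roughly like the sketch below. This is only an illustration and not the actual ByteBuddyDoFnInvokerFactory code or the contents of the PR: InvokerCacheKey, InvokerCache, and the generator function are hypothetical names, and representing the cast target as a Class<?> is an assumption.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical composite key: the DoFn class plus the type the generated
// invoker casts to. Keying on fnClass alone is what allows collisions.
final class InvokerCacheKey {
  private final Class<?> fnClass;
  private final Class<?> castTarget; // assumption: cast target modelled as a Class

  InvokerCacheKey(Class<?> fnClass, Class<?> castTarget) {
    this.fnClass = Objects.requireNonNull(fnClass);
    this.castTarget = Objects.requireNonNull(castTarget);
  }

  Class<?> fnClass() {
    return fnClass;
  }

  Class<?> castTarget() {
    return castTarget;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof InvokerCacheKey)) {
      return false;
    }
    InvokerCacheKey other = (InvokerCacheKey) o;
    return fnClass.equals(other.fnClass) && castTarget.equals(other.castTarget);
  }

  @Override
  public int hashCode() {
    return Objects.hash(fnClass, castTarget);
  }
}

// Hypothetical memoizing cache; the generator stands in for the expensive
// bytecode-generation step and runs at most once per distinct key.
final class InvokerCache<V> {
  private final Map<InvokerCacheKey, V> cache = new ConcurrentHashMap<>();
  private final Function<InvokerCacheKey, V> generator;

  InvokerCache(Function<InvokerCacheKey, V> generator) {
    this.generator = Objects.requireNonNull(generator);
  }

  V get(Class<?> fnClass, Class<?> castTarget) {
    return cache.computeIfAbsent(new InvokerCacheKey(fnClass, castTarget), generator);
  }

  int size() {
    return cache.size();
  }
}
```

With a key like this, the same DoFn class requested with two different cast targets produces two separate cache entries, while repeated requests for the same pair still reuse the memoized value.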
