Thanks, Kenn. I have opened a GitHub issue to track the DoFnInvoker cache
bug and submitted a PR with the fix and regression tests.

Issue: https://github.com/apache/beam/issues/37351
PR: https://github.com/apache/beam/pull/37355

Could you please review the PR when you have a moment?
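
For anyone following the thread without the PR open, the composite-key
idea from the plan quoted below (cache on the DoFn class plus the cast
target) can be illustrated with a small standalone sketch. This is not
the code in the PR and it does not use Beam's APIs: InvokerCacheSketch,
Key, invokerFor, and the generations counter are hypothetical names, and
the lambda only stands in for the real bytecode generation.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch, not Beam code: memoizes "invokers" on (fnClass, castTarget). */
final class InvokerCacheSketch {

  /** Composite cache key; both fields take part in equals/hashCode. */
  static final class Key {
    final Class<?> fnClass;
    final Class<?> castTarget;

    Key(Class<?> fnClass, Class<?> castTarget) {
      this.fnClass = Objects.requireNonNull(fnClass);
      this.castTarget = Objects.requireNonNull(castTarget);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return fnClass.equals(other.fnClass) && castTarget.equals(other.castTarget);
    }

    @Override
    public int hashCode() {
      return Objects.hash(fnClass, castTarget);
    }
  }

  private final Map<Key, Function<Object, Object>> cache = new ConcurrentHashMap<>();

  /** Counts how many times an invoker was actually generated (cache misses). */
  final AtomicInteger generations = new AtomicInteger();

  /** Returns the cached invoker for the pair, generating it on first use. */
  Function<Object, Object> invokerFor(Class<?> fnClass, Class<?> castTarget) {
    return cache.computeIfAbsent(
        new Key(fnClass, castTarget),
        k -> {
          // Stand-in for the expensive generation step.
          generations.incrementAndGet();
          // The "generated" invoker casts its input to the key's own target.
          return input -> k.castTarget.cast(input);
        });
  }
}
```

In this toy model, a key made of the DoFn class alone would hand the
second caller the invoker generated for the first cast target, and the
cast would fail with a ClassCastException, which is the symptom
discussed in this thread.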

Best,
Elia LIU

On Thu, 22 Jan 2026 at 08:44, Kenneth Knowles <[email protected]> wrote:

> Excellent, thanks!
>
> Kenn
>
> On Mon, Jan 19, 2026 at 8:19 AM Elia LIU <[email protected]> wrote:
>
>> Hi all,
>>
>> Thanks for the discussion and pointers. I checked GitHub and didn't
>> see an assigned issue/PR for this yet.
>>
>> I agree with the consensus here (Kenn/Reuven/Robert/Byron) that this
>> looks like a bug: we’re memoizing DoFnInvoker bytecode generation, but
>> the cache key is currently only the DoFn class. This appears to be
>> missing pertinent inputs and can lead to reusing an invoker with the
>> wrong cast target.
>>
>> I’d like to volunteer to fix this.
>>
>> Plan:
>>
>> 1. Add a regression test that reproduces the
>> collision/ClassCastException (e.g., reusing the same DoFn class in
>> different contexts with different cast targets).
>>
>> 2. Update ByteBuddyDoFnInvokerFactory to key the cache on the DoFn
>> class plus the cast target (as Byron suggested).
>>
>> 3. If the cast target isn’t directly available at the caching
>> boundary, I can explore using the stage name as a proxy as Robert
>> suggested.
>>
>> 4. Submit a PR for review.
>>
>> I’ll open a GitHub issue to track this and link it back to this thread.
>>
>> Best,
>> Elia LIU
>>
>
