Eliaaazzz opened a new issue, #37351: URL: https://github.com/apache/beam/issues/37351
### What happened?

I discovered a bug in ByteBuddyDoFnInvokerFactory: the cache key strategy causes collisions when the same DoFn class is used with different generic types.

Currently, the cache is keyed solely on the DoFn class:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java#L305

Because generics are erased, this leads to incorrect invoker reuse. For example, MyDoFn<String> and MyDoFn<Integer> share the same cache key (MyDoFn.class). As a result, the invoker generated for String (which may contain type-specific casts) is also returned for Integer, potentially leading to a ClassCastException or other incorrect behavior.

#### Issue Priority

Priority: 2 (default)

#### Issue Component

Component: sdk-java-core

#### Steps to reproduce

I reproduced this behavior with the following unit test in DoFnInvokersTest.java. The test currently passes, which confirms the collision; ideally the two calls would produce different invoker classes.

```java
@Test
public void testCacheKeyCollisionProof() throws Exception {
  class IdentityDoFn<T> extends DoFn<T, T> {
    @ProcessElement
    public void processElement(@Element T element, OutputReceiver<T> out) {
      out.output(element);
    }
  }

  // 1. Generate Invoker for String
  IdentityDoFn<String> stringDoFn = new IdentityDoFn<>();
  DoFnInvoker<String, String> stringInvoker = DoFnInvokers.invokerFor(stringDoFn);

  // 2. Generate Invoker for Integer
  IdentityDoFn<Integer> intDoFn = new IdentityDoFn<>();
  DoFnInvoker<Integer, Integer> intInvoker = DoFnInvokers.invokerFor(intDoFn);

  // This assertion passes, proving that the cache returns the SAME class
  // for different generic types.
  assertSame(
      "Bug confirmed: Different generic types share the same cached Invoker class!",
      stringInvoker.getClass(),
      intInvoker.getClass());
}
```

#### Expected results

DoFnInvokers.invokerFor should return different invoker classes for IdentityDoFn<String> and IdentityDoFn<Integer>, as they may require different type handling logic (e.g. casts) in the generated bytecode.

#### Actual results

The same invoker class is returned from the cache.

<img width="1738" height="1332" alt="Image" src="https://github.com/user-attachments/assets/3cdfc8b7-7cc3-4fd5-8e42-90d56e3068a7" />

#### Volunteer

I would like to volunteer to fix this.

### Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

### Issue Components

- [ ] Component: Python SDK
- [x] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
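
To isolate the keying behavior from Beam itself, here is a minimal standalone sketch. It is not Beam code: `IdentityFn`, `CACHE`, and `invokerFor` are hypothetical stand-ins, and the cache is assumed to be keyed only on the object's runtime class, as described above. It shows that two instances created with different type arguments report the same `Class` object after erasure, so a class-keyed cache hands back the same entry for both.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ErasureCacheSketch {
  // Hypothetical stand-in for a generic DoFn; not a Beam class.
  static class IdentityFn<T> {
    T apply(T element) {
      return element;
    }
  }

  // Hypothetical cache keyed only on the runtime class (an assumption
  // that mirrors the keying described in this issue).
  static final Map<Class<?>, Object> CACHE = new ConcurrentHashMap<>();

  // Returns the cached placeholder "invoker" for the runtime class of fn,
  // creating it on first use.
  static Object invokerFor(Object fn) {
    return CACHE.computeIfAbsent(fn.getClass(), cls -> new Object());
  }

  // True when both instances report the same runtime class.
  static boolean sameRuntimeClass() {
    IdentityFn<String> stringFn = new IdentityFn<>();
    IdentityFn<Integer> intFn = new IdentityFn<>();
    return stringFn.getClass() == intFn.getClass();
  }

  // True when the class-keyed cache returns the same entry for both.
  static boolean sameCacheEntry() {
    IdentityFn<String> stringFn = new IdentityFn<>();
    IdentityFn<Integer> intFn = new IdentityFn<>();
    return invokerFor(stringFn) == invokerFor(intFn);
  }

  public static void main(String[] args) {
    System.out.println("same runtime class: " + sameRuntimeClass());
    System.out.println("same cache entry:   " + sameCacheEntry());
  }
}
```

Both lines print `true`. Note that in this sketch the type argument is not recoverable from the instance at all, so any key that distinguishes the two cases would have to come from somewhere other than `getClass()`.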
