Eliaaazzz opened a new issue, #37351:
URL: https://github.com/apache/beam/issues/37351

   ### What happened?
   
   I discovered a bug in ByteBuddyDoFnInvokerFactory where the cache key 
strategy causes collisions for the same DoFn class used with different generic 
types.
   
   Currently, the cache is keyed solely on the DoFn class: 
[https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java#L305](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java#L305)
   
   This causes incorrect invoker reuse when generics are erased. For example, 
MyDoFn<String> and MyDoFn<Integer> share the same cache key (MyDoFn.class). As 
a result, the invoker generated for String (which may contain specific casts) 
is returned for Integer, potentially leading to ClassCastException or incorrect 
behavior.
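
The erasure effect described above can be shown without Beam at all. The following is a minimal sketch, not the actual factory code: `IdentityFn`, `CACHE`, and `invokerFor` are hypothetical stand-ins for a generic DoFn and a cache keyed only on the runtime class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ErasureCacheDemo {
  /** Hypothetical stand-in for a generic DoFn; not a Beam class. */
  static class IdentityFn<T> {
    T apply(T input) {
      return input;
    }
  }

  /** Hypothetical cache keyed only on the runtime class, as the issue describes. */
  static final Map<Class<?>, Object> CACHE = new ConcurrentHashMap<>();

  static Object invokerFor(Object fn) {
    // computeIfAbsent creates one entry per runtime class and reuses it afterwards.
    return CACHE.computeIfAbsent(fn.getClass(), cls -> new Object());
  }

  public static void main(String[] args) {
    IdentityFn<String> stringFn = new IdentityFn<>();
    IdentityFn<Integer> intFn = new IdentityFn<>();

    // Type arguments are erased, so both instances report the same runtime class.
    System.out.println(stringFn.getClass() == intFn.getClass());

    // The class-keyed cache therefore hands back the same cached object for both.
    System.out.println(invokerFor(stringFn) == invokerFor(intFn));
  }
}
```

Both comparisons evaluate to `true`: the type argument is not part of the runtime class, so a key built from `getClass()` alone cannot tell the two instances apart.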
   
   Issue Priority
   Priority: 2 (default)
   
   Issue Component
   Component: sdk-java-core
   
   Steps to reproduce
   I reproduced this behavior with the following unit test in 
DoFnInvokersTest.java. The test passes, which confirms the collision; ideally 
the two calls would produce different invoker classes.
   
   Java
     @Test
     public void testCacheKeyCollisionProof() throws Exception {
       class IdentityDoFn<T> extends DoFn<T, T> {
         @ProcessElement
         public void processElement(@Element T element, OutputReceiver<T> out) {
           out.output(element);
         }
       }
   
       // 1. Generate Invoker for String
       IdentityDoFn<String> stringDoFn = new IdentityDoFn<>();
       DoFnInvoker<String, String> stringInvoker = DoFnInvokers.invokerFor(stringDoFn);
   
       // 2. Generate Invoker for Integer
       IdentityDoFn<Integer> intDoFn = new IdentityDoFn<>();
       DoFnInvoker<Integer, Integer> intInvoker = DoFnInvokers.invokerFor(intDoFn);
   
       // This assertion passes, proving that the cache returns the SAME class
       // for different generic types.
       assertSame(
           "Bug confirmed: Different generic types share the same cached Invoker class!",
           stringInvoker.getClass(),
           intInvoker.getClass());
     }
   Expected results
   DoFnInvokers.invokerFor should return different invoker classes for 
IdentityDoFn<String> and IdentityDoFn<Integer>, as they may require different 
type handling logic (e.g. casts) in the generated bytecode.
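
One way to picture the expected behavior is a cache whose key includes the element type in addition to the class. This is only a sketch under stated assumptions, not a proposed Beam API: `IdentityFn`, `CACHE`, and the two-argument `invokerFor` are hypothetical, and the element type is passed in explicitly because it cannot be recovered from an erased instance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TypeAwareCacheSketch {
  /** Hypothetical stand-in for a generic DoFn; not a Beam class. */
  static class IdentityFn<T> {}

  /** Hypothetical cache keyed on (runtime class, element type) rather than class alone. */
  static final Map<List<Class<?>>, Object> CACHE = new ConcurrentHashMap<>();

  static <T> Object invokerFor(IdentityFn<T> fn, Class<T> elementType) {
    // Arrays.asList has value-based equals/hashCode, so it can serve as a composite key.
    List<Class<?>> key = Arrays.asList(fn.getClass(), elementType);
    return CACHE.computeIfAbsent(key, k -> new Object());
  }

  public static void main(String[] args) {
    Object s1 = invokerFor(new IdentityFn<String>(), String.class);
    Object s2 = invokerFor(new IdentityFn<String>(), String.class);
    Object i1 = invokerFor(new IdentityFn<Integer>(), Integer.class);

    // Same class and same element type: the cached entry is reused.
    System.out.println(s1 == s2);
    // Same class but a different element type: a separate entry is created.
    System.out.println(s1 == i1);
  }
}
```

The first comparison is `true` and the second is `false`. Whether the real factory has access to equivalent type information at the point where it builds its key is an open question this sketch does not answer.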
   
   Actual results
   The same invoker class is returned from the cache.
   
   <img width="1738" height="1332" alt="Image" src="https://github.com/user-attachments/assets/3cdfc8b7-7cc3-4fd5-8e42-90d56e3068a7" />
   
   
   Volunteer
   I would like to volunteer to fix this.
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [x] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
