Eliaaazzz commented on PR #37352:
URL: https://github.com/apache/beam/pull/37352#issuecomment-3771596092

   @GlobalStar117 Thanks for the detailed analysis!
   
   You are absolutely correct about Java type erasure: new MyDoFn<String>() and 
new MyDoFn<Integer>() indeed share the same runtime Class object.
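   
   A minimal sketch of that erasure point, using a hypothetical generic class 
MyFn<T> as a stand-in for MyDoFn (it is not a Beam class):

```java
class ErasureDemo {
    /** Hypothetical stand-in for a generic DoFn subclass; not part of Beam. */
    static class MyFn<T> {}

    /** Returns true when both parameterizations share one runtime Class object. */
    static boolean sameRuntimeClass() {
        MyFn<String> stringFn = new MyFn<>();
        MyFn<Integer> integerFn = new MyFn<>();
        // Generic type arguments are erased at runtime, so getClass() sees only MyFn.
        return stringFn.getClass() == integerFn.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
    }
}
```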
   
   However, in Apache Beam, a DoFn's behavior isn't defined solely by its 
Class. We rely heavily on TypeDescriptor to handle serialization (Coders) and 
schema verification.
   
   Why this fix is necessary: Even if the raw class is the same, users can 
override getInputTypeDescriptor() (or use mechanisms that capture types) to 
provide different type information for the same raw DoFn class.
   
   The Evidence: My regression test (testCacheKeyCollisionProof) explicitly 
creates two instances of the same DoFn class but forces them to return 
different TypeDescriptors. Without this fix, the factory returns the same 
cached Invoker for both.
   
   The Consequence: If the first Invoker is generated and cached with logic 
specific to String and is then reused in an Integer context (because the cache 
key ignores the TypeDescriptor), it can cause runtime issues such as incorrect 
validation or a ClassCastException in downstream logic that relies on 
the Invoker's signature.
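   
   To illustrate the collision in isolation, here is a simplified sketch. It 
does not use Beam's actual factory or cache; Fn, Invoker, Key, and the plain 
String standing in for a TypeDescriptor are all hypothetical names chosen for 
the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class InvokerCacheSketch {
    /** Hypothetical stand-in for a DoFn that reports its own input type. */
    static class Fn {
        final String inputType; // stands in for a TypeDescriptor
        Fn(String inputType) { this.inputType = inputType; }
    }

    /** Hypothetical stand-in for a generated invoker; records the type it was built for. */
    static class Invoker {
        final String builtFor;
        Invoker(String builtFor) { this.builtFor = builtFor; }
    }

    /** Composite cache key: raw class plus the reported input type. */
    static final class Key {
        final Class<?> rawClass;
        final String inputType;
        Key(Class<?> rawClass, String inputType) {
            this.rawClass = rawClass;
            this.inputType = inputType;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return rawClass == other.rawClass && inputType.equals(other.inputType);
        }
        @Override public int hashCode() { return Objects.hash(rawClass, inputType); }
    }

    /** Cache keyed by raw class only: the second lookup hits the first entry. */
    static String secondLookupKeyedByClass() {
        Map<Class<?>, Invoker> cache = new HashMap<>();
        Fn first = new Fn("String");
        Fn second = new Fn("Integer");
        cache.computeIfAbsent(first.getClass(), k -> new Invoker(first.inputType));
        return cache.computeIfAbsent(second.getClass(),
                k -> new Invoker(second.inputType)).builtFor;
    }

    /** Cache keyed by class and type: each type context gets its own entry. */
    static String secondLookupKeyedByClassAndType() {
        Map<Key, Invoker> cache = new HashMap<>();
        Fn first = new Fn("String");
        Fn second = new Fn("Integer");
        cache.computeIfAbsent(new Key(first.getClass(), first.inputType),
                k -> new Invoker(first.inputType));
        return cache.computeIfAbsent(new Key(second.getClass(), second.inputType),
                k -> new Invoker(second.inputType)).builtFor;
    }

    public static void main(String[] args) {
        System.out.println(secondLookupKeyedByClass());        // prints "String" (stale)
        System.out.println(secondLookupKeyedByClassAndType()); // prints "Integer"
    }
}
```

   In this sketch, the class-only key hands the Integer instance an invoker 
built for String, while the composite key keeps the two contexts separate.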
   
   So, while type erasure applies to the user's class, the generated Invoker 
needs to be aware of the specific TypeDescriptor context to function correctly 
within the Beam pipeline. This PR ensures the cache respects that distinction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
