scwhittle commented on issue #34749:
URL: https://github.com/apache/beam/issues/34749#issuecomment-2854377667

   I was thinking that most coder usage was setup during graph construction and 
thus new coders weren't being allocated, but it does appear there are 
AvroCoder.of sprinkled through various sites.  So your original hypothesis that 
these reader/writers were being created regularly seems true, since the 
thread-local cache is per-AvroCoder instance and we are creating new ones all 
the time. Your example checkpointcoder one could be changed to use a singleton 
like some other UnboundedSources but fixing generically seems worthwhile to 
avoid this pain point.
   
   The proposed approach will fix that we have duplicate instances of the 
reader/writer but we will still end up having multiple AvroCoder instances 
which are logically the same and thus can have multiple copies per-thread of 
these encoder/decoder objects (and we won't resuse existing ones for a freshly 
created coder).
   
https://source.corp.google.com/piper///depot/google3/third_party/java_src/apache_beam4g/srcs/sdks/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroCoder.java;l=376;bpv=1;bpt=1;rcl=755306863
   
   So it seems it could be better to move the static cache to within AvroCoder 
itself so that all of the state for the same class/datumfactory is reused.
   
   To clarify, I was suggesting removing the threadlocal for reader/writer in 
AvroCoder if they are thread-safe (which it seems they must be if we are 
changing to reuse them across threads with the cache). There is no reason to 
have a per-thread writer that is actually the same object for all the threads 
due to the cache.  So I think it could just be transient DatumReader that we 
either restore in deserialization (example 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/WindowedValue.java#L892)
 or initialize lazily (example 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java#L178)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to