scwhittle commented on issue #34749: URL: https://github.com/apache/beam/issues/34749#issuecomment-2854377667
I was thinking that most coder usage was setup during graph construction and thus new coders weren't being allocated, but it does appear there are AvroCoder.of sprinkled through various sites. So your original hypothesis that these reader/writers were being created regularly seems true, since the thread-local cache is per-AvroCoder instance and we are creating new ones all the time. Your example checkpointcoder one could be changed to use a singleton like some other UnboundedSources but fixing generically seems worthwhile to avoid this pain point. The proposed approach will fix that we have duplicate instances of the reader/writer but we will still end up having multiple AvroCoder instances which are logically the same and thus can have multiple copies per-thread of these encoder/decoder objects (and we won't resuse existing ones for a freshly created coder). https://source.corp.google.com/piper///depot/google3/third_party/java_src/apache_beam4g/srcs/sdks/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroCoder.java;l=376;bpv=1;bpt=1;rcl=755306863 So it seems it could be better to move the static cache to within AvroCoder itself so that all of the state for the same class/datumfactory is reused. To clarify, I was suggesting removing the threadlocal for reader/writer in AvroCoder if they are thread-safe (which it seems they must be if we are changing to reuse them across threads with the cache). There is no reason to have a per-thread writer that is actually the same object for all the threads due to the cache. So I think it could just be transient DatumReader that we either restore in deserialization (example https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/WindowedValue.java#L892) or initialize lazily (example https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java#L178) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org