scwhittle commented on code in PR #34873:
URL: https://github.com/apache/beam/pull/34873#discussion_r2079855377
##########
sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroCoder.java:
##########
@@ -840,4 +843,38 @@ public boolean equals(@Nullable Object other) {
   public int hashCode() {
     return Objects.hash(getClass(), typeDescriptor, datumFactory, schemaSupplier.get());
   }
+
+  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {

Review Comment:
I know that DoFn objects are deserialized multiple times and cached, because they are single-threaded and we run multiple threads in parallel. However, I'm not sure whether coders are deserialized the same way. Since coders are thread-safe, it seems they could be deserialized once and shared.

@kennknowles Do you know, off the top of your head, how coders are represented and deserialized? Should we worry about multiple copies not sharing state, or would multiple copies be unexpected? Are coders represented as a single set of coders for the pipeline, or could the same coder be serialized multiple times in the graph?

One benefit of the writeReplace and readResolve idea is that multiple deserializations would resolve to the same AvroCoder instance, so the thread-local encode/decode buffers it contains would be shared as well, not just the reader and writer. But we could do that as a follow-up.

Were you able to test this to verify that it works as expected?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
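For reference, the writeReplace/readResolve idea from the review comment can be sketched with plain Java serialization using the serialization proxy pattern. This is a minimal, hypothetical sketch and not Beam's AvroCoder or any Beam API: `InternedCoder`, its nested `Proxy`, the `CANONICAL` map, and the use of a schema string as the interning key are all assumptions for illustration. `writeReplace` writes a small proxy, and the proxy's `readResolve` returns one canonical instance per key, so repeated deserializations in the same JVM share a single coder and therefore its thread-local buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class InternedCoderDemo {

  /** Hypothetical stand-in for a thread-safe coder; this is NOT Beam's AvroCoder. */
  public static final class InternedCoder implements Serializable {
    private static final long serialVersionUID = 1L;

    // Assumption: one canonical instance per schema string, per JVM and class loader.
    private static final ConcurrentMap<String, InternedCoder> CANONICAL = new ConcurrentHashMap<>();

    private final String schema;

    // Per-thread scratch buffer; never serialized because writeReplace swaps in a Proxy.
    private final transient ThreadLocal<ByteArrayOutputStream> buffer =
        ThreadLocal.withInitial(ByteArrayOutputStream::new);

    private InternedCoder(String schema) {
      this.schema = schema;
    }

    /** Returns the canonical instance for this schema, creating it on first use. */
    public static InternedCoder of(String schema) {
      return CANONICAL.computeIfAbsent(schema, InternedCoder::new);
    }

    public ByteArrayOutputStream threadBuffer() {
      return buffer.get();
    }

    // Serialization writes this small proxy instead of the coder itself.
    private Object writeReplace() {
      return new Proxy(schema);
    }

    // Rejects streams that try to deserialize the coder directly, bypassing the proxy.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
      throw new InvalidObjectException("InternedCoder must be deserialized via its proxy");
    }

    private static final class Proxy implements Serializable {
      private static final long serialVersionUID = 1L;
      private final String schema;

      Proxy(String schema) {
        this.schema = schema;
      }

      // Every deserialized Proxy resolves to the shared canonical coder.
      private Object readResolve() {
        return InternedCoder.of(schema);
      }
    }
  }

  public static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    }
    return bytes.toByteArray();
  }

  public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] data = serialize(InternedCoder.of("{\"type\":\"string\"}"));
    InternedCoder a = (InternedCoder) deserialize(data);
    InternedCoder b = (InternedCoder) deserialize(data);
    // Both deserializations resolve to one instance, so on this thread they share one buffer.
    System.out.println(a == b);
    System.out.println(a.threadBuffer() == b.threadBuffer());
  }
}
```

Running `main` prints `true` twice. The sharing only holds within a single JVM and class loader, and the unbounded `CANONICAL` map would keep every distinct coder alive, so a real version might need weak or bounded caching. This sketch also does not answer the open question in the comment about how often the runner actually deserializes a given coder.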