scwhittle commented on code in PR #34873:
URL: https://github.com/apache/beam/pull/34873#discussion_r2079855377


##########
sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroCoder.java:
##########
@@ -840,4 +843,38 @@ public boolean equals(@Nullable Object other) {
   public int hashCode() {
     return Objects.hash(getClass(), typeDescriptor, datumFactory, schemaSupplier.get());
   }
+
+  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {

Review Comment:
   I know that DoFn objects are deserialized multiple times and cached, since 
each instance is single-threaded and we run multiple threads in parallel. 
However, I'm not sure whether Coders are deserialized the same way. Since they 
are thread-safe, it seems they could be deserialized once and shared.
   @kennknowles Do you know, off the top of your head, how coders are 
represented and deserialized? Should we worry about multiple copies not sharing 
state, or is that unexpected? Is there a single set of coders for the pipeline, 
or could the same coder be serialized multiple times in the graph?
   
   One benefit of the writeReplace and readResolve idea is that the same 
AvroCoder instance would be reused across multiple deserializations, so the 
thread-local encode/decode buffers it contains would be shared too, not just 
the reader and writer. But we could do that as a follow-up.
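
   For reference, a minimal sketch of the writeReplace/readResolve pattern. 
This is not the PR's code: `InternedCoder`, its `CACHE`, the `Proxy` class, and 
the `roundTrip` helper are hypothetical names, and keying the cache on a schema 
string is an assumption made only to keep the example small.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a coder; this is NOT Beam's AvroCoder.
final class InternedCoder implements Serializable {
  private static final long serialVersionUID = 1L;

  // One canonical instance per schema string within this JVM (assumed cache key).
  private static final ConcurrentMap<String, InternedCoder> CACHE = new ConcurrentHashMap<>();

  private final String schemaJson;

  // Per-thread scratch state, standing in for encode/decode buffers. Never serialized.
  private final transient ThreadLocal<StringBuilder> buffer =
      ThreadLocal.withInitial(StringBuilder::new);

  private InternedCoder(String schemaJson) {
    this.schemaJson = schemaJson;
  }

  static InternedCoder of(String schemaJson) {
    return CACHE.computeIfAbsent(schemaJson, InternedCoder::new);
  }

  StringBuilder scratch() {
    return buffer.get();
  }

  // Java serialization writes the returned proxy instead of this object.
  private Object writeReplace() {
    return new Proxy(schemaJson);
  }

  // Reject streams that try to bypass the proxy.
  private void readObject(ObjectInputStream in) throws InvalidObjectException {
    throw new InvalidObjectException("Proxy required");
  }

  private static final class Proxy implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String schemaJson;

    Proxy(String schemaJson) {
      this.schemaJson = schemaJson;
    }

    // On deserialization, swap the proxy for the shared, cached instance.
    private Object readResolve() {
      return InternedCoder.of(schemaJson);
    }
  }

  // Test helper: serialize and deserialize through Java serialization.
  static InternedCoder roundTrip(InternedCoder coder) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(coder);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (InternedCoder) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

   With this shape, every deserialization within one JVM resolves to the same 
instance, so any thread-local state hanging off it is shared as well. The cache 
is unbounded here, so a real version would need to consider eviction.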
   
   Were you able to test this to verify it was working as expected?


