scwhittle commented on code in PR #34873:
URL: https://github.com/apache/beam/pull/34873#discussion_r2079855377
##########
sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroCoder.java:
##########
@@ -840,4 +843,38 @@ public boolean equals(@Nullable Object other) {
public int hashCode() {
return Objects.hash(getClass(), typeDescriptor, datumFactory, schemaSupplier.get());
}
+
+ private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
Review Comment:
I know that for DoFn objects we deserialize them multiple times and cache the copies, since DoFns are single-threaded and we run multiple threads in parallel. However, I'm not sure whether Coders are deserialized the same way. Since they are thread-safe, it seems they could be deserialized once and shared.

@kennknowles Do you know offhand how coders are represented and deserialized? Should we worry about multiple copies not sharing state, or is that unexpected? Is there a single shared set of coders for the pipeline, or could the same coder be serialized multiple times in the graph?
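To make the "multiple copies not sharing state" concern concrete, here is a minimal plain-Java sketch. It is not Beam code: `ThreadLocalCachingCoder`, `JavaSerde`, and `cacheForCurrentThread` are hypothetical names, and it assumes the coder reaches the worker through Java serialization, as the `readObject` hook in the diff suggests. Each deserialization of the same bytes produces a distinct instance, and `readObject` gives each one its own fresh `ThreadLocal`, so nothing cached in one copy is visible to another.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a coder with per-thread cached state; not Beam's AvroCoder.
class ThreadLocalCachingCoder implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String schema;
  // Transient: not serialized, so it must be rebuilt after deserialization.
  private transient ThreadLocal<StringBuilder> cache;

  ThreadLocalCachingCoder(String schema) {
    this.schema = schema;
    initCache();
  }

  private void initCache() {
    cache = ThreadLocal.withInitial(() -> new StringBuilder(schema));
  }

  // Custom deserialization hook: restore serialized fields, then re-create transient state.
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    initCache();
  }

  StringBuilder cacheForCurrentThread() {
    return cache.get();
  }
}

// Hypothetical Java-serialization helpers that rethrow checked exceptions unchecked.
class JavaSerde {
  static byte[] serialize(Object value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
    return bytes.toByteArray();
  }

  static Object deserialize(byte[] data) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Deserializing the same byte array twice and calling `cacheForCurrentThread()` on each copy from one thread returns two different buffers. Whether that matters in practice depends on how often the runner actually deserializes a given coder.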

One benefit of the writeReplace/readResolve idea is that the same AvroCoder instance would be returned from multiple deserializations, so the thread-local encode/decode buffers it contains would be shared, not just the reader and writer. But we could do that as a follow-up.
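A rough sketch of the writeReplace/readResolve idea, using the serialization proxy pattern in plain Java. `InternedCoder`, `SerializedForm`, and `roundTrip` are hypothetical names, not Beam APIs. Keying the canonical map by a schema string alone is a simplification: the `hashCode` in the diff combines `getClass()`, `typeDescriptor`, `datumFactory`, and `schemaSupplier.get()`, so a real key would need all of those. The static map also never evicts entries, which a real implementation would have to address (for example, with weak values).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical coder whose deserialized copies resolve to one canonical instance per schema.
final class InternedCoder implements Serializable {
  private static final long serialVersionUID = 1L;
  // Canonical instances keyed by schema; never evicted in this sketch.
  private static final ConcurrentMap<String, InternedCoder> CANONICAL = new ConcurrentHashMap<>();

  private final String schema;
  // Per-thread buffer, shared by every user of the canonical instance.
  private final transient ThreadLocal<StringBuilder> buffer =
      ThreadLocal.withInitial(StringBuilder::new);

  private InternedCoder(String schema) {
    this.schema = schema;
  }

  static InternedCoder of(String schema) {
    return CANONICAL.computeIfAbsent(schema, InternedCoder::new);
  }

  StringBuilder bufferForCurrentThread() {
    return buffer.get();
  }

  // Write a small proxy to the stream instead of this object.
  private Object writeReplace() {
    return new SerializedForm(schema);
  }

  // The proxy is always written instead, so reject streams that try to bypass it.
  private void readObject(ObjectInputStream in) throws InvalidObjectException {
    throw new InvalidObjectException("InternedCoder must be deserialized via SerializedForm");
  }

  private static final class SerializedForm implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String schema;

    SerializedForm(String schema) {
      this.schema = schema;
    }

    // Invoked after the proxy is read; swaps in the canonical coder.
    private Object readResolve() throws ObjectStreamException {
      return InternedCoder.of(schema);
    }
  }

  // Java-serialization round trip, used only to demonstrate the behavior.
  static InternedCoder roundTrip(InternedCoder coder) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(coder);
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (InternedCoder) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Within one JVM, every `roundTrip` of a coder built with the same schema returns the same instance, so its thread-local buffer is shared across all deserializations rather than duplicated per copy.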

Were you able to test this to verify it was working as expected?
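One possible shape for such a test, sketched in plain Java without Beam's test utilities: round-trip a coder through Java serialization, use the deserialized copy from several threads at once, and check that each thread gets its own non-null buffer. `BufferedCoder`, `DeserializedCoderCheck`, and `distinctBuffersAfterRoundTrip` are hypothetical names. A `CountDownLatch` holds every task until all have started, so each task runs on a separate pool thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical coder with a transient per-thread buffer that readObject must rebuild.
class BufferedCoder implements Serializable {
  private static final long serialVersionUID = 1L;
  private transient ThreadLocal<StringBuilder> buffer;

  BufferedCoder() {
    initBuffer();
  }

  private void initBuffer() {
    buffer = ThreadLocal.withInitial(StringBuilder::new);
  }

  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    initBuffer(); // without this, buffer would be null after deserialization
  }

  StringBuilder bufferForCurrentThread() {
    return buffer.get();
  }
}

class DeserializedCoderCheck {
  // Round-trips a BufferedCoder, uses it from `threads` concurrently running threads,
  // and returns how many distinct per-thread buffers were observed.
  static int distinctBuffersAfterRoundTrip(int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(new BufferedCoder());
      }
      BufferedCoder coder;
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        coder = (BufferedCoder) in.readObject();
      }
      // Every task waits until all have started, so each one occupies its own thread.
      CountDownLatch allStarted = new CountDownLatch(threads);
      List<Future<StringBuilder>> results = new ArrayList<>();
      for (int i = 0; i < threads; i++) {
        results.add(pool.submit(() -> {
          allStarted.countDown();
          allStarted.await();
          return coder.bufferForCurrentThread();
        }));
      }
      // Compare buffers by identity, collected on the calling thread.
      Set<StringBuilder> distinct = Collections.newSetFromMap(new IdentityHashMap<>());
      for (Future<StringBuilder> result : results) {
        distinct.add(result.get());
      }
      return distinct.size();
    } catch (IOException | ClassNotFoundException | InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

If `readObject` forgot to rebuild the `ThreadLocal`, the tasks would fail with a `NullPointerException`, surfacing as an `ExecutionException`; with the hook in place, four threads should observe four distinct buffers.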