scwhittle commented on code in PR #34873: URL: https://github.com/apache/beam/pull/34873#discussion_r2081292585
########## sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroCoder.java: ########## @@ -840,4 +843,38 @@ public boolean equals(@Nullable Object other) { public int hashCode() { return Objects.hash(getClass(), typeDescriptor, datumFactory, schemaSupplier.get()); } + + private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException { Review Comment: This is great, thanks! ########## sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroGenericCoder.java: ########## @@ -17,17 +17,27 @@ */ package org.apache.beam.sdk.extensions.avro.coders; +import java.util.concurrent.ExecutionException; import org.apache.avro.Schema; import org.apache.avro.generic.GenericRecord; import org.apache.beam.sdk.extensions.avro.io.AvroDatumFactory; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.cache.Cache; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.cache.CacheBuilder; /** AvroCoder specialisation for GenericRecord, needed for cross-language transforms. */ public class AvroGenericCoder extends AvroCoder<GenericRecord> { + private static final Cache<Schema, AvroGenericCoder> AVRO_GENERIC_CODER_CACHE = + CacheBuilder.newBuilder().build(); Review Comment: actually this cache could still be weakValues ########## sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroCoder.java: ########## @@ -354,8 +392,10 @@ public Schema get() { // an inner coder. private final EmptyOnDeserializationThreadLocal<BinaryDecoder> decoder; private final EmptyOnDeserializationThreadLocal<BinaryEncoder> encoder; - private final EmptyOnDeserializationThreadLocal<DatumWriter<T>> writer; Review Comment: Sorry I didn't think of this earlier, but I realized that this is going to change the java serialization of this coder which will lead to update incompatability. Options: - keep the existing fields and the new transient fields and just don't use existing fields. - remove the new fields and change existing fields to do the cache lookup in the initialvalue method. I think you could remove readObject then. We are using unnecessary per-thread caching still then. I think I'd lean towards the first option of keeping the existing fields but not using them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org