scwhittle commented on code in PR #34873:
URL: https://github.com/apache/beam/pull/34873#discussion_r2081292585


##########
sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroCoder.java:
##########
@@ -840,4 +843,38 @@ public boolean equals(@Nullable Object other) {
   public int hashCode() {
     return Objects.hash(getClass(), typeDescriptor, datumFactory, 
schemaSupplier.get());
   }
+
+  private void readObject(ObjectInputStream in) throws IOException, 
ClassNotFoundException {

Review Comment:
   This is great, thanks!



##########
sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroGenericCoder.java:
##########
@@ -17,17 +17,27 @@
  */
 package org.apache.beam.sdk.extensions.avro.coders;
 
+import java.util.concurrent.ExecutionException;
 import org.apache.avro.Schema;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.beam.sdk.extensions.avro.io.AvroDatumFactory;
+import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.cache.Cache;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.cache.CacheBuilder;
 
 /** AvroCoder specialisation for GenericRecord, needed for cross-language 
transforms. */
 public class AvroGenericCoder extends AvroCoder<GenericRecord> {
+  private static final Cache<Schema, AvroGenericCoder> 
AVRO_GENERIC_CODER_CACHE =
+      CacheBuilder.newBuilder().build();

Review Comment:
   actually this cache could still be weakValues



##########
sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/coders/AvroCoder.java:
##########
@@ -354,8 +392,10 @@ public Schema get() {
   // an inner coder.
   private final EmptyOnDeserializationThreadLocal<BinaryDecoder> decoder;
   private final EmptyOnDeserializationThreadLocal<BinaryEncoder> encoder;
-  private final EmptyOnDeserializationThreadLocal<DatumWriter<T>> writer;

Review Comment:
   Sorry I didn't think of this earlier, but I realized that this is going to 
change the java serialization of this coder which will lead to update 
incompatability.
   
   Options:
   - keep the existing fields and the new transient fields and just don't use 
existing fields.
   - remove the new fields and change existing fields to do the cache lookup in 
the initialvalue method.  I think you could remove readObject then. We are 
using unnecessary per-thread caching still then.
   
   I think I'd lean towards the first option of keeping the existing fields but 
not using them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to