gemini-code-assist[bot] commented on code in PR #35752:
URL: https://github.com/apache/beam/pull/35752#discussion_r2245812994


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/CachingFactory.java:
##########
@@ -45,8 +45,14 @@ public CachingFactory(@UnknownInitialization 
Factory<CreatedT> innerFactory) {
   }
 
   private ConcurrentHashMap<TypeDescriptor<?>, CreatedT> getCache() {
+    ConcurrentHashMap<TypeDescriptor<?>, CreatedT> value = cache;
     if (cache == null) {
-      cache = new ConcurrentHashMap<>();
+      synchronized (this) {
+        value = cache;
+        if (value == null) {
+          cache = value = new ConcurrentHashMap<>();
+        }
+      }
     }
     return cache;

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The current implementation of double-checked locking is not robust and 
deviates from the standard pattern.
   
   Specifically:
   *   Reading the volatile `cache` field multiple times instead of using a 
local variable is less performant.
   *   There's a potential race condition. If `cache` is non-null at line 49, 
the `if` block is skipped. If another thread sets `cache` to `null` between 
line 49 and 57, this method could return `null`.
   
   To ensure correctness and adhere to the standard DCL idiom, you should read 
the volatile field into a local variable once and then work with that local 
variable.
   
   ```java
       ConcurrentHashMap<TypeDescriptor<?>, CreatedT> result = cache;
       if (result == null) {
         synchronized (this) {
           result = cache;
           if (result == null) {
             result = new ConcurrentHashMap<>();
             cache = result;
           }
         }
       }
       return result;
   ```



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/FromRowUsingCreator.java:
##########
@@ -102,11 +102,17 @@ public T apply(Row row) {
     return (T) creator.create(params);
   }
 
-  private synchronized void initFieldConverters(Schema schema) {
+   private synchronized void initFieldConverters(Schema schema) {
+    Function[] value = fieldConverters;
     if (fieldConverters == null) {
-      CachingFactory<List<FieldValueTypeInformation>> typeFactory =
-          new CachingFactory<>(schemaProvider::fieldValueTypeInformations);
-      fieldConverters = fieldConverters(typeDescriptor, schema, typeFactory);
+      synchronized (this) {
+        value = fieldConverters;
+        if (fieldConverters == null) {
+          CachingFactory<List<FieldValueTypeInformation>> typeFactory =
+              new CachingFactory<>(schemaProvider::fieldValueTypeInformations);
+          fieldConverters = value = fieldConverters(typeDescriptor, schema, 
typeFactory);
+        }
+      }
     }
   }

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The `initFieldConverters` method is declared as `synchronized`, which makes 
the inner `synchronized (this)` block redundant. This defeats the purpose of 
using double-checked locking, which is to avoid the synchronization overhead 
once the field is initialized.
   
   If the goal is to optimize this with DCL, the `synchronized` keyword should 
be removed from the method signature.
   
   Additionally, the DCL implementation itself is not idiomatic as it 
repeatedly checks the volatile `fieldConverters` field instead of using a local 
variable.
   
   Here is a corrected version that removes the method-level synchronization 
and uses the standard DCL pattern:
   
   ```suggestion
      private void initFieldConverters(Schema schema) {
       Function[] result = fieldConverters;
       if (result == null) {
         synchronized (this) {
           result = fieldConverters;
           if (result == null) {
             CachingFactory<List<FieldValueTypeInformation>> typeFactory =
                 new 
CachingFactory<>(schemaProvider::fieldValueTypeInformations);
             result = fieldConverters(typeDescriptor, schema, typeFactory);
             fieldConverters = result;
           }
         }
       }
      }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to