gemini-code-assist[bot] commented on code in PR #35752:
URL: https://github.com/apache/beam/pull/35752#discussion_r2245812994
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/CachingFactory.java:
##########
@@ -45,8 +45,14 @@ public CachingFactory(@UnknownInitialization
Factory<CreatedT> innerFactory) {
}
private ConcurrentHashMap<TypeDescriptor<?>, CreatedT> getCache() {
+ ConcurrentHashMap<TypeDescriptor<?>, CreatedT> value = cache;
if (cache == null) {
- cache = new ConcurrentHashMap<>();
+ synchronized (this) {
+ value = cache;
+ if (value == null) {
+ cache = value = new ConcurrentHashMap<>();
+ }
+ }
}
return cache;
Review Comment:

The current implementation of double-checked locking is not robust and
deviates from the standard pattern.
Specifically:
* Reading the volatile `cache` field multiple times instead of using a
local variable is less performant.
* There's a potential race condition. If `cache` is non-null at line 49,
the `if` block is skipped. If another thread sets `cache` to `null` between
line 49 and 57, this method could return `null`.
To ensure correctness and adhere to the standard DCL idiom, you should read
the volatile field into a local variable once and then work with that local
variable.
```java
ConcurrentHashMap<TypeDescriptor<?>, CreatedT> result = cache;
if (result == null) {
synchronized (this) {
result = cache;
if (result == null) {
result = new ConcurrentHashMap<>();
cache = result;
}
}
}
return result;
```
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/FromRowUsingCreator.java:
##########
@@ -102,11 +102,17 @@ public T apply(Row row) {
return (T) creator.create(params);
}
- private synchronized void initFieldConverters(Schema schema) {
+ private synchronized void initFieldConverters(Schema schema) {
+ Function[] value = fieldConverters;
if (fieldConverters == null) {
- CachingFactory<List<FieldValueTypeInformation>> typeFactory =
- new CachingFactory<>(schemaProvider::fieldValueTypeInformations);
- fieldConverters = fieldConverters(typeDescriptor, schema, typeFactory);
+ synchronized (this) {
+ value = fieldConverters;
+ if (fieldConverters == null) {
+ CachingFactory<List<FieldValueTypeInformation>> typeFactory =
+ new CachingFactory<>(schemaProvider::fieldValueTypeInformations);
+ fieldConverters = value = fieldConverters(typeDescriptor, schema,
typeFactory);
+ }
+ }
}
}
Review Comment:

The `initFieldConverters` method is declared as `synchronized`, which makes
the inner `synchronized (this)` block redundant. This defeats the purpose of
using double-checked locking, which is to avoid the synchronization overhead
once the field is initialized.
If the goal is to optimize this with DCL, the `synchronized` keyword should
be removed from the method signature.
Additionally, the DCL implementation itself is not idiomatic as it
repeatedly checks the volatile `fieldConverters` field instead of using a local
variable.
Here is a corrected version that removes the method-level synchronization
and uses the standard DCL pattern:
```suggestion
private void initFieldConverters(Schema schema) {
Function[] result = fieldConverters;
if (result == null) {
synchronized (this) {
result = fieldConverters;
if (result == null) {
CachingFactory<List<FieldValueTypeInformation>> typeFactory =
new
CachingFactory<>(schemaProvider::fieldValueTypeInformations);
result = fieldConverters(typeDescriptor, schema, typeFactory);
fieldConverters = result;
}
}
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@beam.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org