voonhous commented on code in PR #18967:
URL: https://github.com/apache/hudi/pull/18967#discussion_r3389732979
##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -1152,10 +1152,15 @@ public List<HoodieSchemaField> getFields() {
if (!hasFields()) {
throw new IllegalStateException("Cannot get fields from schema type: " +
type);
}
- if (fields == null) {
- fields =
Collections.unmodifiableList(avroSchema.getFields().stream().map(HoodieSchemaField::new).collect(Collectors.toList()));
+ // interned instances are shared across threads, so publish through an
immutable wrapper
Review Comment:
Strictly speaking a lock is not necessary for correctness here: the caches
are published as immutable wrappers
(`Collections.unmodifiableList`/`unmodifiableMap`), whose backing field is
final, so under the JMM's final-field semantics a racing reader either sees
null and builds its own copy, or sees a fully constructed collection. The worst
case is a duplicate build that gets thrown away, never a torn read.
That said, I agree it is worth guarding: the correctness of the lock-free
version silently depends on the wrapper staying immutable, which nothing
enforces against future edits, and a lock also removes the duplicate-build
window entirely.
I went with double-checked locking rather than synchronizing the getters:
the cache fields are `volatile` and the monitor is taken only on the
initialization path, so initialization is fully lock-guarded while the
steady-state read path stays lock-free (a single volatile read, an ordinary
load on x86 and an acquire-load on ARM). Same pattern as `Lazy#get`. A
`synchronized` accessor would instead acquire a monitor per record per accessed
field on a JVM-shared instance, serializing all task threads on the hottest
read path, which is the cost this PR is removing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]