voonhous commented on code in PR #14340:
URL: https://github.com/apache/hudi/pull/14340#discussion_r2599292947
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########
@@ -410,7 +410,10 @@ public Option<IndexedRecord> getInsertValue(Schema schema,
Properties properties
return Option.empty();
}
- if (schema == null || HOODIE_METADATA_SCHEMA == schema) {
+ // TODO: feature(schema): HoodieSchema change, we removed caching in a few
areas, during the migration of Avro.Schema -> HoodieSchema.
Review Comment:
Sorry, what I meant in the TODO is that caching was implicitly removed when
introducing `HoodieSchemaCache`.
**Before the migration:**
- When schemas were read from log file headers, they went through
AvroSchemaCache.intern(Schema)
- AvroSchemaCache is keyed by Schema (using Schema.equals()), so even if a
new Schema object was parsed, intern() would return the same cached reference
as HOODIE_METADATA_SCHEMA
**After the migration:**
- Schemas are parsed into HoodieSchema via HoodieSchema.parse()
- HoodieSchemaCache.intern(HoodieSchema) caches the HoodieSchema wrapper
- When `.toAvroSchema()` is called, it returns the underlying Avro Schema
which was created by avroParser.parse()
- this is a **NEW** object, not the same reference as
HOODIE_METADATA_SCHEMA
During this transitionary PRs, we have the following mismatch:
- AvroSchemaCache ensured Avro Schema reference reuse
- HoodieSchemaCache ensures HoodieSchema wrapper reuse, but the wrapped
Avro Schema is still a different object
The `.equals()` is a temporary bridge during the migration.
To clarify this, that once HoodieRecordPayload.getInsertValue() accepts
HoodieSchema instead of Schema, the == comparison can be restored, do you want
me to add that into the comment?
Addressing this together with the PR will increase the SLOC further.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]