wzx140 commented on code in PR #7021:
URL: https://github.com/apache/hudi/pull/7021#discussion_r1003952529
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java:
##########
@@ -189,14 +193,37 @@ public Option<Map<String, String>> getMetadata() {
@Override
public Option<HoodieAvroIndexedRecord> toIndexedRecord(Schema recordSchema, Properties props) throws IOException {
- Option<IndexedRecord> avroData = getData().getInsertValue(recordSchema, props);
+ Option<IndexedRecord> avroData = getCachedDeserializedRecord(recordSchema, props);
if (avroData.isPresent()) {
return Option.of(new HoodieAvroIndexedRecord(avroData.get()));
} else {
return Option.empty();
}
}
+ private Option<IndexedRecord> getCachedDeserializedRecord(Schema recordSchema, Properties props) throws IOException {
+ // Check schema identical
+ if (this.cachedDeserializedRecord != null && this.cachedDeserializedRecord.isPresent()
+ && !compareSchema(cachedDeserializedRecord.get().getSchema(), recordSchema)) {
+ this.cachedDeserializedRecord = null;
+ }
+ if (this.cachedDeserializedRecord == null) {
+ this.cachedDeserializedRecord = this.data.getInsertValue(recordSchema, props);
+ }
+ return this.cachedDeserializedRecord;
+ }
+
+ private static Boolean compareSchema(Schema left, Schema right) {
+ if (left == null || right == null) {
+ return false;
+ }
+ Pair<Schema, Schema> schemaPair = Pair.of(left, right);
+ if (!SCHEMA_COMPARE_MAP.containsKey(schemaPair)) {
Review Comment:
1. SCHEMA_COMPARE_MAP is a HashMap, so its get/containsKey calls are O(1).
We only compare each Pair once.
2. As you said before, comparing schemas on every call would kill the perf
gain, so we need to cache the comparison result.
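
To illustrate the caching idea in point 2, here is a minimal sketch, not the
actual PR code. ComparisonCache, same(), and computations() are hypothetical
names, plain strings stand in for Avro Schema objects, and a ConcurrentHashMap
is assumed in place of whatever map type backs SCHEMA_COMPARE_MAP.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiPredicate;

/** Hypothetical sketch: memoizes the result of an expensive pairwise comparison. */
final class ComparisonCache<S> {
  // Key is the (left, right) pair; value is the cached comparison result.
  private final Map<Map.Entry<S, S>, Boolean> results = new ConcurrentHashMap<>();
  private final BiPredicate<S, S> expensiveCompare;
  // Counts how often the expensive comparison actually ran (for illustration only).
  private final AtomicInteger computations = new AtomicInteger();

  ComparisonCache(BiPredicate<S, S> expensiveCompare) {
    this.expensiveCompare = expensiveCompare;
  }

  boolean same(S left, S right) {
    if (left == null || right == null) {
      return false; // mirrors the null handling in the diff above
    }
    Map.Entry<S, S> key = new SimpleImmutableEntry<>(left, right);
    // Runs the expensive comparison only on the first lookup of this pair.
    return results.computeIfAbsent(key, k -> {
      computations.incrementAndGet();
      return expensiveCompare.test(k.getKey(), k.getValue());
    });
  }

  int computations() {
    return computations.get();
  }
}

final class ComparisonCacheDemo {
  public static void main(String[] args) {
    // Strings stand in for schemas; String::equals stands in for the costly compare.
    ComparisonCache<String> cache = new ComparisonCache<>(String::equals);
    cache.same("schema-v1", "schema-v1"); // computed
    cache.same("schema-v1", "schema-v1"); // served from the cache
    cache.same("schema-v1", "schema-v2"); // computed
    System.out.println(cache.computations()); // prints 2
  }
}
```

One caveat on point 1: a hash map lookup is O(1) in the number of entries, but
a hit still calls hashCode() and equals() on the key, so the lookup is only
cheap if those two methods are cheap for the key type.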