alexeykudinkin commented on code in PR #7021:
URL: https://github.com/apache/hudi/pull/7021#discussion_r1004805080


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java:
##########
@@ -189,14 +193,37 @@ public Option<Map<String, String>> getMetadata() {
 
   @Override
   public Option<HoodieAvroIndexedRecord> toIndexedRecord(Schema recordSchema, Properties props) throws IOException {
-    Option<IndexedRecord> avroData = getData().getInsertValue(recordSchema, props);
+    Option<IndexedRecord> avroData = getCachedDeserializedRecord(recordSchema, props);
     if (avroData.isPresent()) {
       return Option.of(new HoodieAvroIndexedRecord(avroData.get()));
     } else {
       return Option.empty();
     }
   }
 
+  private Option<IndexedRecord> getCachedDeserializedRecord(Schema recordSchema, Properties props) throws IOException {
+    // Check schema identical
+    if (this.cachedDeserializedRecord != null && this.cachedDeserializedRecord.isPresent()
+        && !compareSchema(cachedDeserializedRecord.get().getSchema(), recordSchema)) {
+      this.cachedDeserializedRecord = null;
+    }
+    if (this.cachedDeserializedRecord == null) {
+      this.cachedDeserializedRecord = this.data.getInsertValue(recordSchema, props);
+    }
+    return this.cachedDeserializedRecord;
+  }
+
+  private static Boolean compareSchema(Schema left, Schema right) {
+    if (left == null || right == null) {
+      return false;
+    }
+    Pair<Schema, Schema> schemaPair = Pair.of(left, right);
+    if (!SCHEMA_COMPARE_MAP.containsKey(schemaPair)) {

Review Comment:
   You're right that it's O(1), but I'm not talking about asymptotic complexity; I'm talking about the actual cost of a HashMap lookup. You use `Pair<Schema, Schema>` as the key, which means the lookup has to run an equality check against the matching entry's key. That check compares the two schemas just to retrieve the cached result of comparing those same two schemas, which defeats the purpose of the cache.
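A minimal sketch of the point above, assuming a hypothetical `FakeSchema` stand-in for Avro's `Schema` (so it runs without Avro on the classpath) and a hand-written `SchemaPair` in place of Hudi's `Pair`. It counts how many schema `equals` calls a single `HashMap.containsKey` triggers when the probe key holds equal but distinct schema instances, which is the typical case when records carry their own `Schema` objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class PairKeyLookupCost {

  // Hypothetical stand-in for org.apache.avro.Schema: equals() counts its invocations
  // so we can see how often the "expensive" comparison actually runs.
  static final class FakeSchema {
    static int equalsCalls = 0;
    private final String definition;

    FakeSchema(String definition) {
      this.definition = definition;
    }

    @Override
    public boolean equals(Object o) {
      equalsCalls++;
      return o instanceof FakeSchema && ((FakeSchema) o).definition.equals(definition);
    }

    @Override
    public int hashCode() {
      return definition.hashCode();
    }
  }

  // Hypothetical pair with value semantics, assumed to behave like Hudi's Pair:
  // equality delegates to equals() on both elements.
  static final class SchemaPair {
    final FakeSchema left;
    final FakeSchema right;

    SchemaPair(FakeSchema left, FakeSchema right) {
      this.left = left;
      this.right = right;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof SchemaPair)) {
        return false;
      }
      SchemaPair p = (SchemaPair) o;
      return left.equals(p.left) && right.equals(p.right);
    }

    @Override
    public int hashCode() {
      return Objects.hash(left, right);
    }
  }

  // Caches one comparison result, then looks it up with an equal-but-distinct key
  // and returns the number of schema equals() calls the lookup performed.
  static int lookupCost() {
    Map<SchemaPair, Boolean> cache = new HashMap<>();
    cache.put(new SchemaPair(new FakeSchema("A"), new FakeSchema("B")), false);

    SchemaPair probe = new SchemaPair(new FakeSchema("A"), new FakeSchema("B"));
    FakeSchema.equalsCalls = 0;
    cache.containsKey(probe);
    return FakeSchema.equalsCalls;
  }

  public static void main(String[] args) {
    System.out.println("Schema equals() calls during one lookup: " + lookupCost());
  }
}
```

Running `main` reports 2 calls: the hash match sends `HashMap` into `SchemaPair.equals`, which compares the left schemas and then the right schemas. That is the same schema-equality work the cache was meant to skip, so the lookup cannot be cheaper than comparing the schemas directly.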



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.