cshuo commented on code in PR #13896:
URL: https://github.com/apache/hudi/pull/13896#discussion_r2357644952


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroRecordSizeEstimator.java:
##########
@@ -45,6 +49,8 @@ public long sizeEstimate(BufferedRecord<IndexedRecord> record) {
       return sizeOfRecord;
     }
     // do not contain size of Avro schema as the schema is reused among records
+    Schema recordSchema = record.getRecord().getSchema();
+    long sizeOfSchema = sizeOfSchemaMap.computeIfAbsent(recordSchema.getFields().size(), arity -> ObjectSizeCalculator.getObjectSize(recordSchema));

Review Comment:
   Yes, on the COW write path there can be two schemas, because incoming
records do not contain schema fields. As for performance, I don't think it will
cause much degradation, since size estimation is performed through sampling.
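
To make the caching behaviour in the hunk above concrete, here is a minimal sketch of a size cache keyed by field count, using only the JDK. The class `AritySizeCache`, the field lists standing in for Avro schemas, and the fake sizer (16 bytes per field) are all hypothetical; only the `computeIfAbsent`-by-arity pattern comes from the diff.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;
import java.util.function.ToLongFunction;

/** Hypothetical stand-in for a schema-size cache keyed by field count (arity). */
class AritySizeCache<S> {
  private final Map<Integer, Long> sizeByArity = new ConcurrentHashMap<>();
  private final ToIntFunction<S> arityOf;
  private final ToLongFunction<S> expensiveSizer;

  AritySizeCache(ToIntFunction<S> arityOf, ToLongFunction<S> expensiveSizer) {
    this.arityOf = arityOf;
    this.expensiveSizer = expensiveSizer;
  }

  /** Runs the expensive sizer at most once per distinct arity. */
  long sizeOf(S schema) {
    return sizeByArity.computeIfAbsent(
        arityOf.applyAsInt(schema), arity -> expensiveSizer.applyAsLong(schema));
  }

  int cachedEntries() {
    return sizeByArity.size();
  }

  public static void main(String[] args) {
    AtomicInteger sizerCalls = new AtomicInteger();
    // Assumption: a schema is modelled as a list of field names, and the
    // "expensive" size calculation is faked as 16 bytes per field.
    AritySizeCache<List<String>> cache = new AritySizeCache<List<String>>(
        List::size,
        fields -> {
          sizerCalls.incrementAndGet();
          return 16L * fields.size();
        });

    List<String> narrowSchema = List.of("id", "name");
    List<String> wideSchema = List.of("extra_a", "extra_b", "id", "name");

    // Alternate between the two schemas, as a write path with two schemas might.
    for (int i = 0; i < 1000; i++) {
      cache.sizeOf(i % 2 == 0 ? narrowSchema : wideSchema);
    }

    System.out.println("sizer calls: " + sizerCalls.get());
    System.out.println("cached entries: " + cache.cachedEntries());
  }
}
```

With two schemas of different arity, the sizer runs twice no matter how many records are estimated. One consequence of keying by arity is that two different schemas with the same number of fields would share a single cached size.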



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

