danny0405 commented on code in PR #18843:
URL: https://github.com/apache/hudi/pull/18843#discussion_r3308110903


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -247,7 +248,11 @@ private void init(HoodieRecord record) {
     deltaWriteStat.setPartitionPath(partitionPath);
     deltaWriteStat.setFileId(fileId);
     Option<FileSlice> fileSliceOpt = populateWriteStatAndFetchFileSlice(record, deltaWriteStat);
-    averageRecordSize = sizeEstimator.sizeEstimate(record);
+    // averageRecordSize is seeded lazily in flushToDiskIfRequired on the first buffered
+    // (post-prepareRecord) record. Sizing the incoming record here under-counts heap
+    // because recordList retains the post-prepareRecord clone: a fully-materialized Avro
+    // IndexedRecord with prepended meta-fields, whereas the incoming record's payload

Review Comment:
   We could serialize the IndexedRecord into Avro bytes inside the HoodieRecord
after the meta-fields are prepended. That keeps the in-memory records compact
and narrows the gap between the size of the in-memory records and the size of
the actual serialized log block.
   
   We did something similar in the spillable map to reduce spills, and it fits
this buffering too: it should reduce the number of small log blocks and improve
performance.
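
   A minimal sketch of the idea, not Hudi code: the buffer holds already-serialized
bytes, so the size it tracks is the same quantity that ends up in the flushed
block. `CompactRecordBuffer`, `serialize`, and the 25-byte threshold are
hypothetical names and values chosen for illustration, and
`DataOutputStream.writeUTF` stands in for Avro binary encoding so the example
needs only the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the append handle's record buffer. */
final class CompactRecordBuffer {
  private final long maxBlockBytes;
  private final List<byte[]> buffered = new ArrayList<>();
  private long bufferedBytes = 0L;
  private int flushedBlocks = 0;

  CompactRecordBuffer(long maxBlockBytes) {
    this.maxBlockBytes = maxBlockBytes;
  }

  /**
   * Serializes the meta-fields and payload once, up front. A real
   * implementation would encode the Avro record instead (assumption).
   */
  static byte[] serialize(String recordKey, String partitionPath, String payload) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(recordKey);
      out.writeUTF(partitionPath);
      out.writeUTF(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Buffers a serialized record, flushing first if it would overflow the block. */
  void add(byte[] serializedRecord) {
    if (!buffered.isEmpty() && bufferedBytes + serializedRecord.length > maxBlockBytes) {
      flush();
    }
    buffered.add(serializedRecord);
    bufferedBytes += serializedRecord.length;
  }

  /** Pretends to write one log block and resets the buffer. */
  void flush() {
    if (buffered.isEmpty()) {
      return;
    }
    flushedBlocks++;
    buffered.clear();
    bufferedBytes = 0L;
  }

  long bufferedBytes() {
    return bufferedBytes;
  }

  int bufferedCount() {
    return buffered.size();
  }

  int flushedBlocks() {
    return flushedBlocks;
  }
}
```

   Because the flush decision uses exact byte lengths rather than an estimate of
object-graph heap size, the buffer fills closer to the target block size before
flushing. The trade-off is the extra serialization step when each record is
buffered.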


