Yuwei Xiao created HUDI-4753:
--------------------------------

             Summary: More accurate evaluation of log record during log writing or compaction
                 Key: HUDI-4753
                 URL: https://issues.apache.org/jira/browse/HUDI-4753
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Yuwei Xiao


In the current log writing path, avgRecordSize is estimated from the first incoming log record only. That estimate may be inaccurate, especially for the metadata table.

When writing to the metadata table, the first log record is always `__all_partition__`, which may be much larger than a regular partition record. As a result, avgRecordSize is heavily overestimated.
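As an illustration only, the sketch below contrasts the first-record estimate described above with a running mean over every record seen so far. The record sizes are made up, and `estimate_from_first` and `RunningMean` are hypothetical names, not Hudi APIs. A real fix might sample records rather than size every one, since measuring each record has its own cost.

```python
from typing import Sequence


def estimate_from_first(sizes: Sequence[int]) -> float:
    """Estimate described in the issue: the size of the first incoming record only."""
    return float(sizes[0])


class RunningMean:
    """Hypothetical alternative: mean size of all records observed so far."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0

    def update(self, size: int) -> float:
        """Record one more size (in bytes) and return the updated mean."""
        self.count += 1
        self.total += size
        return self.total / self.count


# Made-up metadata-table stream: one large `__all_partition__`-style record
# first, followed by 999 ordinary partition records (sizes in bytes).
sizes = [50_000] + [200] * 999

first = estimate_from_first(sizes)
estimator = RunningMean()
for size in sizes:
    mean = estimator.update(size)

print(first, mean)  # 50000.0 249.8
```

In this made-up stream the first-record estimate is roughly 200 times the running mean, even though only one record out of 1000 is large.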

This causes performance problems in both log writing and compaction: with an overestimated record size, we write too many log blocks and spill records to disk unnecessarily.
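To make the impact concrete, here is a simplified model, not Hudi's actual writer or compaction code. It assumes that a log block is flushed once the estimated buffered size reaches a block-size limit, and that the spillable map used while merging during compaction spills to disk once its estimated footprint reaches a memory budget. The function names, limits, and record sizes are all hypothetical. The stream matches the metadata-table case above in shape: one 50,000-byte record followed by 999 records of 200 bytes, 249,800 bytes in total.

```python
import math


def count_log_blocks(num_records: int, est_record_size: float, max_block_bytes: int) -> int:
    """Simplified model: each block holds as many records as fit under
    max_block_bytes according to the *estimated* record size."""
    records_per_block = max(1, int(max_block_bytes // est_record_size))
    return math.ceil(num_records / records_per_block)


def count_spilled(num_records: int, est_record_size: float, max_memory_bytes: int) -> int:
    """Simplified model: keep records in memory until the estimated footprint
    reaches max_memory_bytes, then spill the remaining records to disk."""
    in_memory = int(max_memory_bytes // est_record_size)
    return max(0, num_records - in_memory)


NUM_RECORDS = 1000
MAX_BLOCK_BYTES = 256 * 1024    # hypothetical block-size limit
MAX_MEMORY_BYTES = 1024 * 1024  # hypothetical merge-map memory budget

# Compare the first-record estimate with the mean over all records.
for label, est in [("first record", 50_000.0), ("running mean", 249.8)]:
    blocks = count_log_blocks(NUM_RECORDS, est, MAX_BLOCK_BYTES)
    spilled = count_spilled(NUM_RECORDS, est, MAX_MEMORY_BYTES)
    print(f"{label}: {blocks} blocks, {spilled} records spilled")
# first record: 200 blocks, 980 records spilled
# running mean: 1 blocks, 0 records spilled
```

Because the true total of 249,800 bytes fits in one 256 KiB block and well within a 1 MiB budget, the 200 blocks and 980 spilled records in the first-record case are pure overhead under this model.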

--
This message was sent by Atlassian Jira
(v8.20.10#820010)