Yuwei Xiao created HUDI-4753:
--------------------------------
Summary: More accurate evaluation of log record during log writing
or compaction
Key: HUDI-4753
URL: https://issues.apache.org/jira/browse/HUDI-4753
Project: Apache Hudi
Issue Type: Bug
Reporter: Yuwei Xiao
In the current log writing path, avgRecordSize is taken from the first incoming
log record, which may not be representative, especially for the metadata table.
When writing to the metadata table, the first log record is always
`__all_partition__`, which may be much larger than a normal partition record.
This causes performance problems in log writing and compaction: with an
overestimated record size, we write too many log blocks and spill records to
disk unnecessarily.
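
One possible direction, sketched here only as an illustration, is to keep a
running average over all records seen so far rather than fixing the estimate at
the first record. The class and method names below (RunningRecordSizeEstimator,
observe, averageRecordSize, recordsPerBlock) are hypothetical and are not part
of Hudi's API; the byte sizes are made-up numbers chosen to mimic one large
first record followed by small ones.

```java
// Hypothetical sketch, not Hudi code: a cumulative-mean record size estimator.
final class RunningRecordSizeEstimator {
    private long totalBytes = 0;   // sum of all observed record sizes
    private long recordCount = 0;  // number of records observed

    // Fold one more record size into the running average.
    void observe(long recordSizeBytes) {
        if (recordSizeBytes < 0) {
            throw new IllegalArgumentException("record size must be non-negative");
        }
        totalBytes += recordSizeBytes;
        recordCount++;
    }

    // Average size over every record seen so far (0 if none were seen).
    long averageRecordSize() {
        return recordCount == 0 ? 0 : totalBytes / recordCount;
    }

    // How many records of the given average size fit in one block (at least 1).
    static long recordsPerBlock(long maxBlockSizeBytes, long avgRecordSize) {
        if (avgRecordSize <= 0) {
            return Long.MAX_VALUE; // no usable estimate yet
        }
        return Math.max(1, maxBlockSizeBytes / avgRecordSize);
    }
}
```

With an assumed first record of 10,000 bytes followed by nine records of 100
bytes, the first-record estimate stays at 10,000 bytes, while the running
average settles at 1,090 bytes. For an assumed 1,000,000-byte block limit, that
is 100 records per block versus 917, which shows how a single oversized first
record can inflate the number of log blocks.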