Hi Team,

Record level index uses a metadata table which is a MOR table type.

Each delta commit to the metadata table creates multiple HFile log blocks,
so reading them requires opening multiple file handles, which can hurt read
performance. To improve read performance, compaction can be run frequently,
which merges all the log blocks into the base file and creates a new
version of the base file. If this is done frequently, however, it causes
write amplification.

Instead of merging all the log blocks into the base file via a full
compaction, a minor compaction could merge only the log blocks and produce
a single new log block.
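To make the idea concrete, here is a minimal, hypothetical sketch of that merge step (not actual Hudi code): log blocks are modeled as simple key-to-record maps, folded in commit order into one consolidated block, with later records winning on key conflicts. All names here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of minor ("log") compaction: several small log
// blocks (oldest first) are folded into a single new log block. The base
// file is untouched, so there is no full-compaction write amplification.
public class LogCompactionSketch {

    // Merge blocks in commit order; a later block's record for the same
    // key overwrites the earlier one, mimicking latest-wins semantics.
    static Map<String, String> logCompact(List<Map<String, String>> logBlocks) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> block : logBlocks) {
            merged.putAll(block);
        }
        return merged; // one consolidated log block
    }

    public static void main(String[] args) {
        List<Map<String, String>> blocks = new ArrayList<>();
        blocks.add(new LinkedHashMap<>(Map.of("k1", "v1", "k2", "v2")));
        blocks.add(new LinkedHashMap<>(Map.of("k2", "v2b", "k3", "v3")));

        Map<String, String> merged = logCompact(blocks);
        // Readers now open one log block instead of two.
        System.out.println(merged.size());    // prints 3
        System.out.println(merged.get("k2")); // prints v2b
    }
}
```

After the merge, readers only need a single file handle for the new block, while the expensive base-file rewrite is deferred to a less frequent full compaction.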

This can be achieved by adding a new action to Hudi called LogCompaction,
which would require an RFC. Please let me know what you think.


Thanks,

Surya
