Just to clarify: the read paths described here apply only to RT views, not to RO.
On July 22, 2023 8:14:09 PM UTC, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> I have been playing with the StarRocks MOR Hudi reader recently, and it does amazing work. It has two read paths:
>
> 1. For partitions with log files, use the merging logic.
> 2. For partitions with only parquet files, use the COW read logic.
>
> As you know, the first path is slow because it has merging overhead and can't provide any of the parquet benefits (pushdown, bloom filters...). In contrast, the second path is blazing fast.
>
> MOR comes with tons of compaction rules, and this behavior makes hot/cold partition management possible.
>
> One particular case is GDPR, where old records are usually deleted/masked in a random distribution, while new partitions are free of changes.
>
> So far, Spark does not distinguish between partitions with log files and log-free partitions, and I suspect adding such an improvement would make MOR tables more performant.
>
> I would be glad to work on such a feature, so please give early feedback if there is any blocker.
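The two-path dispatch described in the quoted message can be sketched as a small planning step for an RT (snapshot) read. This is only an illustration under stated assumptions, not StarRocks' or Hudi's actual code: `FileSlice` and `plan_read_paths` are hypothetical names, and a partition is modeled simply as a list of file slices, each holding one base parquet file plus zero or more log files.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileSlice:
    # Hypothetical model of a Hudi file slice: one base parquet file plus
    # any log files written by later updates/deletes (not yet compacted).
    base_file: str
    log_files: List[str] = field(default_factory=list)


def plan_read_paths(partitions: Dict[str, List[FileSlice]]) -> Dict[str, str]:
    """Hypothetical planner: pick a read path for each partition.

    "merge": at least one slice has log files, so base + logs must be merged
             (slow; parquet pushdown and bloom filters can't be fully used).
    "cow":   every slice is parquet-only, so it can be read like a COW table.
    """
    plan: Dict[str, str] = {}
    for partition, slices in partitions.items():
        if any(s.log_files for s in slices):
            plan[partition] = "merge"
        else:
            plan[partition] = "cow"
    return plan


if __name__ == "__main__":
    partitions = {
        # Old partition touched by a GDPR-style delete: has a log file.
        "dt=2023-07-01": [FileSlice("a.parquet", [".a.log.1"])],
        # Fresh or fully compacted partition: parquet only.
        "dt=2023-07-22": [FileSlice("b.parquet"), FileSlice("c.parquet")],
    }
    print(plan_read_paths(partitions))
```

Running it prints `{'dt=2023-07-01': 'merge', 'dt=2023-07-22': 'cow'}`. The partition-level granularity follows the message; the same `log_files` check could in principle be applied per file slice instead, which would let a partition with only a few updated slices still read most of its files on the fast path.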