Re: Improved MOR spark reader

Nicolas Paris Sat, 22 Jul 2023 13:23:57 -0700

Just to clarify: the read path described here concerns only RT (real-time) 
views; it is not related to RO (read-optimized) views.


On July 22, 2023 8:14:09 PM UTC, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
>I have been playing with the StarRocks MOR Hudi reader recently and it does an 
>amazing job. It has two read paths:
>
>1. For partitions with log files, use the merging logic
>2. For partitions with only parquet files, use the cow read logic
>
>As you know, the first path is slow because it has merging overhead and can't 
>provide any Parquet benefits (pushdown, bloom filters...). In contrast, the 
>second path is blazing fast.
>
>MOR comes with tons of compaction rules, and having such behavior makes 
>hot/cold partition management possible.
>
>One particular case is GDPR, where old records are usually deleted/masked 
>following a random distribution, while new partitions are free of changes.
>
>So far Spark does not distinguish between partitions with log files and 
>log-free partitions, and I suspect adding such an improvement would make MOR 
>tables more performant.
>
>I would be glad to work on such a feature, so please give early feedback if 
>there are any blockers.
