I'm playing with TPC-H and comparing performance between cached tables 
(serialized in memory) and uncached ones (DataFrames read from Parquet). 
Various SQL queries take much longer to run against the cached tables.


I see that some physical plans have an extra layer of shuffle/sort/merge in 
the cached scenario.


I could do some filtering by key to optimize, but I'm just curious why the 
out-of-the-box planning is more complex and slower when the tables are cached 
in memory.


Thanks!
