darmie commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3941326780
> One other direction I am exploring is to see if morsel-driven execution can help here. > > One hypothesis is that filter pushdown pushes more CPU work (especially in the case of dynamic queries) and serial IO (i.e. each individual RowFilter) + some additional overhead so slow / skewed partitions will become even more slow. > > With morsel-driven execution we might be able to mitigate this effect, as we can distribute the work better by planning the work using a queue (and so any overhead or file IO latencies will be spread out more). > > PoC is here [#20477](https://github.com/apache/datafusion/pull/20477) - it seems it gives quite a bit of speedups on Clickbench(!) (without filter pushdown) though I see some large slowdowns on TPCH SF10 as well, probably as it doesn't benefit much (as far as I remember data / filters are perfectly distributed and files seem to contain many row groups) and probably hurts locality as implemented. Complementary to solving the scheduling problem using Morsel driven approach is to use JIT native code to solve per-morsel execution cost. Cache locality loss from morsel reassignment can be resolved via JIT code cache. For example, if the compiled decoder for `(DICT, Int32, bit_width=12, filter=neq_empty)` is cached and reused across morsels regardless of which thread runs them, the code stays hot even when the data moves between threads. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
