jtuglu1 commented on issue #19456:
URL: https://github.com/apache/druid/issues/19456#issuecomment-4485878587

   > The leaf paths— the ones that read Segment— may require some additional 
care to avoid materializing things too early.
   
   This is a well-known issue in DataFusion: you have to implement a lot of 
custom pushdown yourself (e.g. physical operators that read and filter on 
segment bitmaps/dictionaries) to avoid expensive re-materialization between 
operators. This will be especially painful if we need to cross the JVM/Rust 
boundary frequently.
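
   To make the pushdown point concrete, here is a minimal sketch in plain Java. It is not Druid's or DataFusion's actual API: `DictionaryColumnSketch` and its methods are hypothetical names, and `java.util.BitSet` stands in for a real compressed bitmap index. It contrasts answering an equality filter directly from the per-value bitmap with decoding every row first and filtering afterwards.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for a dictionary-encoded column with bitmap indexes. */
final class DictionaryColumnSketch {
    private final List<String> dictionary = new ArrayList<>(); // dictionary id -> value
    private final Map<String, Integer> ids = new HashMap<>();  // value -> dictionary id
    private final List<BitSet> bitmaps = new ArrayList<>();    // dictionary id -> rows with that value
    private final List<Integer> rowIds = new ArrayList<>();    // row -> dictionary id

    /** Appends one row, updating the dictionary and the bitmap of the row's value. */
    void add(String value) {
        int row = rowIds.size();
        Integer id = ids.get(value);
        if (id == null) {
            id = dictionary.size();
            ids.put(value, id);
            dictionary.add(value);
            bitmaps.add(new BitSet());
        }
        bitmaps.get(id).set(row);
        rowIds.add(id);
    }

    /** Pushed-down filter: one dictionary lookup, then the matching bitmap. No row is decoded. */
    BitSet filterEquals(String value) {
        Integer id = ids.get(value);
        return id == null ? new BitSet() : (BitSet) bitmaps.get(id).clone();
    }

    /** Naive path: decode every row back to a String, then compare. */
    BitSet filterEqualsMaterialized(String value) {
        BitSet out = new BitSet();
        for (int row = 0; row < rowIds.size(); row++) {
            String decoded = dictionary.get(rowIds.get(row));
            if (decoded.equals(value)) {
                out.set(row);
            }
        }
        return out;
    }
}
```

   Both methods return the same row set, but only the second one touches every row. A generic operator that does not know about the bitmaps ends up on the second path, which is the cost the custom pushdown is meant to avoid.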
   
   Another thing I want to call out here is that we should start moving 
towards deprecating and unifying the available engines in Druid. Currently I 
see a lot of work being done on various engines (Dart, MSQ, native, etc.), and 
I think we should aim to build the native processing core on whatever 
"next-gen" engine we decide on (and its operator interfaces). While I think 
the native segment readers are a bit more generic and can be shared, this kind 
of consolidation will help motivate deprecating the older execution paths.
   
   @gianm I think there's a lot of value in considering adoption of 
DataFusion's planner/optimizer/physical operator ecosystem in the long term. 
While there will need to be many Druid-specific operator overrides, it seems 
like a net positive to me if we're able to get statistics 
[support](https://docs.rs/datafusion/latest/datafusion/common/struct.ColumnStatistics.html),
 advanced planning (better join algorithms, etc.), and optimized physical 
operators (grouping with spilling, etc.) out of the box without having to 
implement these ourselves. That being said, I think distributed DataFusion is 
still in a nascent phase (Ballista, etc.), so this is definitely a longer-term 
thing.
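
   As a rough illustration of what column statistics buy a planner, here is a small sketch in plain Java. The `SegmentStats` class and `segmentsToScan` method are hypothetical and only loosely mirror the idea of per-column min/max values; they are not DataFusion's `ColumnStatistics` API or anything that exists in Druid today. The sketch skips any segment whose value range cannot overlap a range predicate.

```java
import java.util.ArrayList;
import java.util.List;

final class StatsPruningSketch {
    /** Hypothetical per-segment statistics for a single long-typed column. */
    static final class SegmentStats {
        final String segmentId;
        final long min;
        final long max;

        SegmentStats(String segmentId, long min, long max) {
            this.segmentId = segmentId;
            this.min = min;
            this.max = max;
        }
    }

    /** Returns ids of segments whose [min, max] range overlaps the closed predicate range [lo, hi]. */
    static List<String> segmentsToScan(List<SegmentStats> stats, long lo, long hi) {
        List<String> keep = new ArrayList<>();
        for (SegmentStats s : stats) {
            // Two closed ranges overlap iff each one starts before the other ends.
            if (s.max >= lo && s.min <= hi) {
                keep.add(s.segmentId);
            }
        }
        return keep;
    }
}
```

   With this kind of metadata available at planning time, segments that cannot match are dropped before any operator runs.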


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

