alamb commented on issue #14373:
URL: https://github.com/apache/datafusion/issues/14373#issuecomment-2625046162

   Here is the email I sent to @lmwnshn about potential projects:
   
   All of these projects would be written in Rust, on a production-grade open source query engine (DataFusion).
   
   Reference: https://dl.acm.org/doi/10.1145/3626246.3653368
   
   There is significant community interest in these features too, so if the work is done well I think it is likely that the community would engage with it and the code would be accepted.
   
   ## Implement Sideways Information Passing / Dynamic Filter Pushdown in DataFusion
   
   Ticket: https://github.com/apache/datafusion/issues/7955
   
   This project is well documented, but only partly optimizer-related.
   
   Students would learn:
   * Expression representation
   * Extending database optimizer rules (pushing predicates + join restrictions)
   * Benchmarking
   * Extending physical plans / join code
   * Working with an open source community (I think there are several people who are interested in helping this along)
   * The classic "Database lifestyle" rush of making TPC-H queries faster (and wondering if the optimizations apply to other workloads)
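
To make the idea concrete, here is a minimal sketch of sideways information passing in plain Rust, not DataFusion code. It assumes an in-memory inner hash join on an `i64` key. The `KeyRange` type and `join_with_dynamic_filter` function are hypothetical names invented for this illustration. The build side is hashed first, the min/max of its keys becomes a dynamic filter, and that filter is applied to the probe side before probing, standing in for a predicate pushed down into the probe-side scan.

```rust
use std::collections::HashMap;

/// Hypothetical min/max summary of the build-side join keys.
#[derive(Debug, Clone, Copy, PartialEq)]
struct KeyRange {
    min: i64,
    max: i64,
}

impl KeyRange {
    /// Returns `None` when the build side is empty.
    fn from_keys(keys: impl IntoIterator<Item = i64>) -> Option<Self> {
        let mut iter = keys.into_iter();
        let first = iter.next()?;
        Some(iter.fold(KeyRange { min: first, max: first }, |r, k| KeyRange {
            min: r.min.min(k),
            max: r.max.max(k),
        }))
    }

    fn contains(&self, key: i64) -> bool {
        self.min <= key && key <= self.max
    }
}

/// Inner hash join on the first tuple field.
/// Returns the joined rows plus the number of probe rows that survived
/// the dynamic filter (i.e. the rows the "scan" actually produced).
fn join_with_dynamic_filter<'a>(
    build: &[(i64, &'a str)],
    probe: &[(i64, &'a str)],
) -> (Vec<(i64, &'a str, &'a str)>, usize) {
    // 1. Build the hash table from the (ideally smaller) build side.
    let mut table: HashMap<i64, Vec<&'a str>> = HashMap::new();
    for &(key, value) in build {
        table.entry(key).or_default().push(value);
    }

    // 2. Derive the dynamic filter from the keys that were actually seen.
    let range = KeyRange::from_keys(build.iter().map(|&(key, _)| key));

    // 3. Apply the filter where the probe side is "scanned".
    //    An empty build side filters out every probe row.
    let scanned: Vec<(i64, &'a str)> = probe
        .iter()
        .copied()
        .filter(|&(key, _)| range.map_or(false, |r| r.contains(key)))
        .collect();

    // 4. Probe the hash table with the surviving rows only.
    let mut out = Vec::new();
    for &(key, probe_value) in &scanned {
        if let Some(build_values) = table.get(&key) {
            for &build_value in build_values {
                out.push((key, build_value, probe_value));
            }
        }
    }
    (out, scanned.len())
}
```

With build keys `10` and `12`, only probe rows whose key falls in `[10, 12]` reach the probe step. A real implementation would push the filter into the scan operator itself and could use other summaries (for example a Bloom filter or an IN list) instead of a min/max range.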
   
   
   ## Implement LATERAL JOINs in DataFusion
   
   Ticket: https://github.com/apache/datafusion/issues/10048
   
   This one is less well specified, but if a group wants to work on this I can 
find time to help specify it more.
   
   Students would learn:
   * The wonders of subqueries, and a visceral understanding of their relation to joins
   * Subquery decorrelation / rewrites
   * Extending optimizer rules

   The project would need some additional subquery decorrelation optimizer code (and possibly some physical operator support).
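
As a rough illustration of the semantics involved, the sketch below evaluates a LATERAL join the naive way: the right-hand subquery is re-run once per left row and may read that row's columns. This is plain Rust over in-memory slices, not DataFusion code. `lateral_join` and `top_orders` are hypothetical helpers made up for this example, and the SQL in the comment is only an assumed query shape.

```rust
/// Naive LATERAL join: evaluate `subquery` once for every left row and
/// pair each produced right row with the left row that parameterized it.
/// Left rows whose subquery returns nothing are dropped (inner semantics).
fn lateral_join<L: Clone, R>(
    left: &[L],
    subquery: impl Fn(&L) -> Vec<R>,
) -> Vec<(L, R)> {
    left.iter()
        .flat_map(|l| subquery(l).into_iter().map(move |r| (l.clone(), r)))
        .collect()
}

/// Stand-in for the correlated subquery
///   SELECT amount FROM orders
///   WHERE orders.customer_id = c.id
///   ORDER BY amount DESC LIMIT <limit>
/// where `orders` holds (customer_id, amount) pairs.
fn top_orders(orders: &[(u32, i64)], customer_id: u32, limit: usize) -> Vec<i64> {
    let mut amounts: Vec<i64> = orders
        .iter()
        .filter(|&&(cid, _)| cid == customer_id)
        .map(|&(_, amount)| amount)
        .collect();
    // Largest amounts first, then keep only the top `limit`.
    amounts.sort_unstable_by(|a, b| b.cmp(a));
    amounts.truncate(limit);
    amounts
}
```

Calling `lateral_join(&customers, |c| top_orders(&orders, c.0, 2))` yields up to two orders per customer. The per-row re-evaluation shown here is exactly the cost that subquery decorrelation tries to remove by rewriting the correlated subquery into ordinary join operators.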
   
   
   ## Implement Range Joins / ASOF joins
   
   Ticket: https://github.com/apache/datafusion/issues/318
   
   This one has had some work and even an initial prototype implementation. However, it needs help to design / explain / evaluate the existing approach.
   
   Students would learn:
   * What a Range Join is, how it works, and how it could be implemented
   * How to specify and describe a new feature
   * How to work with existing code to push the feature through
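
For readers new to ASOF joins, the following is a minimal sketch of one common variant, assuming "backward" matching: each left timestamp is paired with the latest right row whose timestamp is less than or equal to it. It is plain Rust over in-memory slices and is not the prototype referenced in the ticket. The `asof_join` function is a hypothetical name, and the right side is assumed to be already sorted by timestamp.

```rust
/// Left ASOF join with backward matching.
/// `right` must be sorted ascending by its timestamp (first tuple field).
/// Left rows with no earlier-or-equal right row get `None`.
fn asof_join<'a, V>(left: &[i64], right: &'a [(i64, V)]) -> Vec<(i64, Option<&'a V>)> {
    left.iter()
        .map(|&t| {
            // Index of the first right row with a timestamp strictly greater than `t`.
            let idx = right.partition_point(|(rt, _)| *rt <= t);
            // The row just before that index, if any, is the ASOF match.
            let matched = idx.checked_sub(1).map(|i| &right[i].1);
            (t, matched)
        })
        .collect()
}
```

The binary search via `partition_point` costs O(log n) per left row. If the left side is sorted as well, a single merge-style pass over both inputs would avoid the repeated searches, which is one of the design trade-offs such a project would have to evaluate.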


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

