alamb commented on issue #14373: URL: https://github.com/apache/datafusion/issues/14373#issuecomment-2625046162
Here is the email I sent to @lmwnshn about potential projects.

All of these projects would be written in Rust, on a production grade open source query engine (DataFusion). Reference: https://dl.acm.org/doi/10.1145/3626246.3653368

There is significant community interest in the features too, so if done well I think it is likely there would be community interaction and the code would be accepted.

## Implement Sideways Information Passing / Dynamic Filter Pushdown in DataFusion

Ticket: https://github.com/apache/datafusion/issues/7955

This project is well documented, but only partly optimizer related.

Students would learn:
* Expression representation
* Extending database optimizer rules (pushing predicates + join restrictions)
* Benchmarking
* Extending physical plans / join code
* Working with an open source community (I think there are several people who are interested in helping this along)
* The classic "Database lifestyle" rush of making TPC-H queries faster (and wondering if the optimizations apply to other workloads)

## Implement LATERAL JOINs in DataFusion

Ticket: https://github.com/apache/datafusion/issues/10048

This one is less well specified, but if a group wants to work on this I can find time to help specify it more.

Students would learn:
* The wonders of subqueries, and a visceral understanding of their relation to joins
* Subquery decorrelation / rewrites
* Extending optimizer rules
* Would need: some additional subquery decorrelation optimizer code (and possibly some physical operator support)

## Implement Range Joins / ASOF joins

Ticket: https://github.com/apache/datafusion/issues/318

This one has had some work and even a prototype initial implementation. However, it needs help to design / explain / evaluate the existing approach.

Students would learn:
* What a Range Join is, how it works, and how it could be implemented
* How to specify and describe a new feature
* How to work with existing code to push the feature through
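
To make the ASOF join idea concrete, here is a rough sketch of a "backward" ASOF match over two timestamp columns. This is not DataFusion code and not the prototype from the ticket: the function name `asof_join_backward`, the use of plain `i64` timestamps, and the assumption that both inputs are already sorted ascending are all mine, chosen only for illustration.

```rust
/// For each left timestamp, return the index of the most recent right row
/// whose timestamp is <= the left timestamp, or None if no such row exists.
///
/// Assumption (not checked here): both slices are sorted ascending.
fn asof_join_backward(left: &[i64], right: &[i64]) -> Vec<Option<usize>> {
    let mut out = Vec::with_capacity(left.len());
    // `next` is the index of the first right row that is NOT yet known to be
    // <= the current left timestamp. It only moves forward, so the whole
    // join is a single merge-style pass over both inputs.
    let mut next = 0usize;
    for &l in left {
        while next < right.len() && right[next] <= l {
            next += 1;
        }
        // Everything before `next` is <= l; the closest one is `next - 1`.
        out.push(if next == 0 { None } else { Some(next - 1) });
    }
    out
}
```

For example, with left timestamps `[1, 5, 10]` and right timestamps `[2, 4, 9, 12]`, the first left row has no match and the other two match right rows 1 and 2. A real operator would also need to handle equality keys (e.g. per-symbol matching), unsorted input, and batches, which is where the design work in this project would go.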

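
For the Sideways Information Passing / Dynamic Filter Pushdown project above, the core idea can be sketched in a few lines: once the build side of a hash join has been read, derive a cheap filter from its join keys and apply it to the probe side before probing. The sketch below is only an illustration under my own assumptions (single `i64` join key, a min/max range as the filter, hypothetical names `key_bounds` and `hash_join_with_dynamic_filter`). It does not use or reflect DataFusion's actual APIs.

```rust
use std::collections::HashMap;

/// Min/max bounds over the build-side join keys (None if the build side is empty).
fn key_bounds(build_keys: &[i64]) -> Option<(i64, i64)> {
    let min = *build_keys.iter().min()?;
    let max = *build_keys.iter().max()?;
    Some((min, max))
}

/// Inner hash join on i64 keys. Before probing, a range filter derived from
/// the build side is applied to each probe row.
///
/// Returns the (build_index, probe_index) matches and the number of probe
/// rows that the dynamic filter rejected without touching the hash table.
fn hash_join_with_dynamic_filter(
    build: &[i64],
    probe: &[i64],
) -> (Vec<(usize, usize)>, usize) {
    // Empty build side: an inner join produces nothing, so every probe row
    // can be skipped.
    let (lo, hi) = match key_bounds(build) {
        Some(bounds) => bounds,
        None => return (Vec::new(), probe.len()),
    };

    // Build phase: key -> indices of build rows with that key.
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, &k) in build.iter().enumerate() {
        table.entry(k).or_default().push(i);
    }

    // Probe phase, with the dynamic filter applied first.
    let mut matches = Vec::new();
    let mut pruned = 0;
    for (j, &k) in probe.iter().enumerate() {
        if k < lo || k > hi {
            pruned += 1;
            continue;
        }
        if let Some(rows) = table.get(&k) {
            for &i in rows {
                matches.push((i, j));
            }
        }
    }
    (matches, pruned)
}
```

In this toy version the filter only saves hash lookups. The payoff in a real engine comes from pushing the same predicate down into the probe-side scan so that rows (or whole files / row groups) are never read at all, which is the part that touches optimizer rules and physical plans.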