NGA-TRAN opened a new issue, #18259: URL: https://github.com/apache/datafusion/issues/18259
This task is part of epic #18249

## Cost Model

Do we want to pursue a traditional, complex cost model that estimates all of the work in a plan, or take a simpler approach? In practice, even the most detailed cost models often prove inaccurate despite significant effort. An alternative is a more intuitive approach, similar to the join ranking strategy. Consider the following examples:

- Is a plan with two merge joins better than one with a merge join and a hash join? How should we assign weights and make the comparison?
- Is a merge join on a single stream consistently faster than a partitioned hash join across multiple streams? How do we evaluate and rank these scenarios?
- Instead of using exact byte sizes, could we categorize input sizes as small, medium, or large and assign weights accordingly?

This could serve as a quick and practical research project: define the relevant properties, assign weights and criteria, and run simple experiments comparing estimated costs against actual runtimes.
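To make the "intuitive weights" idea concrete, here is a minimal sketch of what such a model could look like. This is purely hypothetical illustration, not DataFusion's actual API: the size buckets, the operator base costs, and the weight values are all made-up assumptions that the research project would need to calibrate against real runtimes.

```rust
// Hypothetical weight-based cost model sketch; all constants are assumptions.

#[derive(Clone, Copy, Debug, PartialEq)]
enum SizeClass {
    Small,  // fits comfortably in memory
    Medium,
    Large,  // likely needs partitioning or spilling
}

impl SizeClass {
    /// Bucket a byte estimate into a coarse category instead of using exact sizes.
    /// The thresholds here are arbitrary placeholders.
    fn from_bytes(bytes: u64) -> Self {
        match bytes {
            0..=1_000_000 => SizeClass::Small,              // up to ~1 MB
            1_000_001..=1_000_000_000 => SizeClass::Medium, // up to ~1 GB
            _ => SizeClass::Large,
        }
    }

    fn weight(self) -> u32 {
        match self {
            SizeClass::Small => 1,
            SizeClass::Medium => 4,
            SizeClass::Large => 16,
        }
    }
}

#[derive(Clone, Copy, Debug)]
enum JoinKind {
    MergeJoin, // assumed cheaper when inputs are already sorted
    HashJoin,  // assumed to pay an extra build cost
}

/// Intuitive per-join cost: an operator weight scaled by input size categories.
fn join_cost(kind: JoinKind, left: SizeClass, right: SizeClass) -> u32 {
    let base = match kind {
        JoinKind::MergeJoin => 2,
        JoinKind::HashJoin => 3,
    };
    base * (left.weight() + right.weight())
}

/// Rank a whole plan by summing its join costs; lower is better.
fn plan_cost(joins: &[(JoinKind, SizeClass, SizeClass)]) -> u32 {
    joins.iter().map(|&(k, l, r)| join_cost(k, l, r)).sum()
}

fn main() {
    // Example: two merge joins vs. one merge join plus one hash join
    // over the same input size categories.
    let plan_a = [
        (JoinKind::MergeJoin, SizeClass::Small, SizeClass::Large),
        (JoinKind::MergeJoin, SizeClass::Medium, SizeClass::Medium),
    ];
    let plan_b = [
        (JoinKind::MergeJoin, SizeClass::Small, SizeClass::Large),
        (JoinKind::HashJoin, SizeClass::Medium, SizeClass::Medium),
    ];
    println!("plan A cost = {}", plan_cost(&plan_a));
    println!("plan B cost = {}", plan_cost(&plan_b));
}
```

The appeal of this shape is that the experiments become cheap to run: adjust the weight tables, re-rank a set of benchmark plans, and check how often the ranking agrees with measured runtimes, without ever needing byte-accurate cardinality estimates.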
