2010YOUY01 commented on issue #21120: URL: https://github.com/apache/datafusion/issues/21120#issuecomment-4114949587
This work looks really exciting, thank you for making it happen! One future challenge I see is that the planned work follows a bottom-up approach: we first build the infrastructure, then evolve the optimizer. The issue is that some low-level design decisions (e.g., using algorithm X to estimate NDV for expression Y) can be difficult to reason about, unless the reviewer is a CBO expert. Perhaps we could also provide a top-down write-up, starting from the end goal (making certain workloads faster) and working step by step down to the local algorithm choices. A TLDR with references would likely help.
