zeodtr commented on issue #7698: URL: https://github.com/apache/arrow-datafusion/issues/7698#issuecomment-1818073495
In my (humble, possibly wrong) opinion, the DataFusion planning code may have the following problems:

1. `LogicalPlan` (and maybe other modules) repeats the same operations over and over within a single planning session, without any precomputing or caching. Because `LogicalPlan` is an `enum`, it cannot cache anything and has to `match` each time it is called. In my opinion, it would be better to make it a `trait` that each concrete plan node implements.
2. It uses `format!` as if it were free, which it is not when it is called tens of thousands of times.
3. It uses iterators as if they were free, which they are not when the number of elements is not small and the operations inside the iterator are not cheap.
4. It executes potentially heavy operations before determining that they are necessary (the functional dependency case in my report).
5. It assumes the column list is short, which is sometimes not the case.

It's just my humble opinion.
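
To make points 1, 2 and 4 a bit more concrete, here is a minimal sketch of the kind of per-node caching I have in mind. Everything in it is hypothetical: `PlanNode`, `ProjectionNode`, and `qualified_names` are illustration-only names and are not DataFusion's actual API. It only assumes the standard library's `OnceLock`. The derived value is built with `format!` lazily, on first request, and then reused on every later call.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical trait (not part of DataFusion): each concrete plan node
// implements it and is free to keep its own cached state.
trait PlanNode {
    fn qualified_names(&self) -> &[String];
}

// Hypothetical stand-in for a concrete plan node.
struct ProjectionNode {
    columns: Vec<String>,
    // Cache for a derived value that is costly to rebuild on every call.
    qualified_names: OnceLock<Vec<String>>,
    // Counts how often the expensive path runs (demonstration only).
    computations: AtomicUsize,
}

impl ProjectionNode {
    fn new(columns: Vec<String>) -> Self {
        Self {
            columns,
            qualified_names: OnceLock::new(),
            computations: AtomicUsize::new(0),
        }
    }
}

impl PlanNode for ProjectionNode {
    fn qualified_names(&self) -> &[String] {
        // Nothing is computed until the first caller asks for it, and the
        // `format!` calls run only once per node instead of once per call.
        self.qualified_names.get_or_init(|| {
            self.computations.fetch_add(1, Ordering::Relaxed);
            self.columns.iter().map(|c| format!("t.{c}")).collect()
        })
    }
}

fn main() {
    let node = ProjectionNode::new(vec!["a".to_string(), "b".to_string()]);
    // Simulate a planning session that asks for the same value many times.
    for _ in 0..10_000 {
        assert_eq!(node.qualified_names().len(), 2);
    }
    println!(
        "computed {} time(s)",
        node.computations.load(Ordering::Relaxed)
    );
}
```

In this sketch the loop asks for the names 10,000 times but the closure passed to `get_or_init` runs once. Whether this pattern fits the real `LogicalPlan` is a separate question; the sketch only illustrates the idea.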
