Re: [I] Bad performance on wide tables (1000+ columns) [arrow-datafusion]

via GitHub Sun, 19 Nov 2023 17:31:07 -0800


zeodtr commented on issue #7698:
URL: https://github.com/apache/arrow-datafusion/issues/7698#issuecomment-1818073495


   In my humble (and possibly mistaken) opinion, DataFusion's planning code may have the following problems:
   
   1. `LogicalPlan` (and perhaps other modules) performs the same operations over and over again within a single planning session, without any precomputation or caching. Also, since `LogicalPlan` is an `enum`, it cannot cache anything and has to `match` on the variant every time one of its methods is called. In my opinion, it would be better to make it a `trait` and have each concrete plan node implement that `trait`.
   2. It uses `format!` as if it were free (which it is not when it is called tens of thousands of times).
   3. It uses iterators as if they were free (which they are not when the number of elements is large and the operations inside the iterator are not cheap).
   4. It executes potentially heavy operations before it has been determined that they are necessary (for example, the functional dependency case in my report).
   5. It assumes the column list is short (which is sometimes not the case).
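Points 1, 2 and 5 can be illustrated with a small, self-contained Rust sketch. It does not use DataFusion's actual types: `Field`, `IndexedSchema`, `index_of_naive` and `index_of` are hypothetical names chosen only for illustration, and the sketch assumes qualified column names are unique. It contrasts a column lookup that rescans the whole list and calls `format!` for every field on every call with one that precomputes a name-to-index map once per schema.

```rust
use std::collections::HashMap;

/// Hypothetical field type (NOT DataFusion's real field type):
/// an optional table qualifier plus a column name.
struct Field {
    qualifier: Option<String>,
    name: String,
}

impl Field {
    /// Builds "qualifier.name" (or just "name"); allocates a new String on every call.
    fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.name),
            None => self.name.clone(),
        }
    }
}

/// Naive lookup: an O(n) scan that also allocates one String per field visited.
/// Resolving all n columns this way costs O(n^2) work plus O(n^2) allocations.
fn index_of_naive(fields: &[Field], qualified: &str) -> Option<usize> {
    fields.iter().position(|f| f.qualified_name() == qualified)
}

/// Hypothetical schema wrapper that precomputes a name -> index map once.
/// Assumes qualified names are unique (on duplicates, the last index wins).
struct IndexedSchema {
    fields: Vec<Field>,
    index: HashMap<String, usize>,
}

impl IndexedSchema {
    fn new(fields: Vec<Field>) -> Self {
        // One `format!` per field, paid once when the schema is built.
        let index = fields
            .iter()
            .enumerate()
            .map(|(i, f)| (f.qualified_name(), i))
            .collect();
        IndexedSchema { fields, index }
    }

    /// Average O(1) lookup that allocates nothing.
    fn index_of(&self, qualified: &str) -> Option<usize> {
        self.index.get(qualified).copied()
    }
}

fn main() {
    // A "wide" table: 2000 columns named t.c0 .. t.c1999.
    let make = || {
        (0..2000)
            .map(|i| Field { qualifier: Some("t".to_string()), name: format!("c{}", i) })
            .collect::<Vec<_>>()
    };

    let fields = make();
    let schema = IndexedSchema::new(make());
    assert_eq!(schema.fields.len(), fields.len());

    // Both strategies agree; only their cost differs.
    for i in 0..2000 {
        let name = format!("t.c{}", i);
        assert_eq!(index_of_naive(&fields, &name), schema.index_of(&name));
    }
    println!("{:?}", schema.index_of("t.c1999")); // prints "Some(1999)"
}
```

The trade-off is extra memory for the map and a one-time construction cost, which only pays off when the same schema is queried many times during planning, as in the wide-table case.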
   
   It's just my humble opinion.


