saadtajwar commented on issue #23263: URL: https://github.com/apache/datafusion/issues/23263#issuecomment-4860899433
Hmm okay, @RatulDawar looking at this now - going to have to ask a few newbie questions haha:

- Am I correct in my understanding that the problem is, for this query, DataFusion at the very start of physical execution (in the `DataSourceExec` node) immediately does projection of all 105 columns in the table? However, projection can be deferred until much later, because we only really need the `URL` column for the filter and `EventTime` for the sort & limit, and we want DataFusion to perform the projection of all columns _after_ all of those expensive computations?
- If I'm understanding the above correctly, your proposal is to start by just deferring the projection until after the limit (so moving `ProjectionExec` to above `SortExec`)?

Please let me know if there's anything I'm missing - happy to start helping you out with implementation/continue digging if needed!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
