matthewmturner commented on issue #5309: URL: https://github.com/apache/arrow-datafusion/issues/5309#issuecomment-1872623993
I tried reproducing your results with Instruments but wasn't able to reach the granularity you had, which showed `DFSchema` as being heavy. However, I put together a flamegraph and came to a similar conclusion. In the image below, the purple blocks are the matches for my search of `DFSchema`. Of those, a lot were `merge` and `field_with_qualified_name` (which is often called by `merge`), which appears to be consistent with your profiling. It also looks like all uses of `DFSchema` occur during the optimization pass, which is consistent with your observation.

Based on this, and on how `field_with_name` / `field_with_qualified_name` are used within `merge`, I think I may be able to simply replace them with `has_column_with_unqualified_name` / `has_column_with_qualified_name`, which return booleans. I'm hoping, time permitting, to also do some memory / allocation profiling to make sure these types of changes have the desired effect.

<img width="1728" alt="image" src="https://github.com/apache/arrow-datafusion/assets/22136083/c9fda12a-5df7-4b12-94de-3aa09f720535">
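To make the proposed swap concrete, here is a rough sketch on a toy schema. `ToySchema` and `ToyField` are hypothetical stand-ins, not DataFusion's actual `DFSchema`, and the method bodies are assumptions for illustration; only the method names come from the comment above. The point it tries to show is that a lookup returning a `Result` has to construct an error value on every miss (here an allocated `String`), whereas a boolean check does not, which may matter when `merge` only needs to know whether a field already exists.

```rust
// Toy model only: hypothetical stand-ins, not DataFusion's real DFSchema.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyField {
    pub qualifier: Option<String>,
    pub name: String,
}

#[derive(Debug, Default)]
pub struct ToySchema {
    pub fields: Vec<ToyField>,
}

impl ToySchema {
    /// Result-returning lookup: every miss allocates an error message.
    pub fn field_with_qualified_name(
        &self,
        qualifier: &str,
        name: &str,
    ) -> Result<&ToyField, String> {
        self.fields
            .iter()
            .find(|f| f.qualifier.as_deref() == Some(qualifier) && f.name == name)
            .ok_or_else(|| format!("no field named {qualifier}.{name}"))
    }

    /// Boolean check: no error value is built on a miss.
    pub fn has_column_with_qualified_name(&self, qualifier: &str, name: &str) -> bool {
        self.fields
            .iter()
            .any(|f| f.qualifier.as_deref() == Some(qualifier) && f.name == name)
    }

    /// Boolean check by name only, ignoring the qualifier (a simplification).
    pub fn has_column_with_unqualified_name(&self, name: &str) -> bool {
        self.fields.iter().any(|f| f.name == name)
    }

    /// Append the fields of `other` that are not already present,
    /// using the boolean checks instead of `field_with_*(..).is_ok()`.
    pub fn merge(&mut self, other: &ToySchema) {
        for field in &other.fields {
            let duplicate = match &field.qualifier {
                Some(q) => self.has_column_with_qualified_name(q, &field.name),
                None => self.has_column_with_unqualified_name(&field.name),
            };
            if !duplicate {
                self.fields.push(field.clone());
            }
        }
    }
}
```

In this sketch `merge` never touches the `Result`-returning path, so a schema with many non-duplicate fields avoids one error allocation per field. Whether the real `DFSchema` sees a comparable gain would still need to be confirmed by the allocation profiling mentioned above.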
