Re: [I] Make a faster way to check column existence in optimizer (not `is_err()`) [arrow-datafusion]

via GitHub Sat, 30 Dec 2023 15:14:40 -0800


matthewmturner commented on issue #5309:
URL: 
https://github.com/apache/arrow-datafusion/issues/5309#issuecomment-1872623993


   I tried reproducing your results with Instruments but wasn't able to reach 
the level of granularity you had, which showed `DFSchema` as being heavy.
   
   However, I put together a flamegraph and came to a similar conclusion.  In 
the image below, the blocks in purple match my search for `DFSchema`.  Of 
those, many are `merge` and `field_with_qualified_name` (which is often 
called by `merge`), which appears to be consistent with your profiling.  It 
also looks like all uses of `DFSchema` occur during the optimization pass, 
which is consistent with your observation.
   
   Based on this, and on how `field_with_name` / `field_with_qualified_name` 
are used within `merge`, I think I may be able to simply replace them with 
`has_column_with_unqualified_name` / `has_column_with_qualified_name`, which 
return booleans.
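
To illustrate the idea, here is a rough sketch using simplified, hypothetical `Schema` and `Field` types (not the real `DFSchema`; the method bodies and the `merge` logic are assumptions made for illustration). It assumes the cost of the `Result`-returning lookup comes from constructing an error value on every miss, which a boolean check avoids:

```rust
// Hypothetical, simplified stand-ins; these are NOT DataFusion's real types.
#[allow(dead_code)]
#[derive(Debug)]
struct Field {
    qualifier: Option<String>,
    name: String,
}

#[allow(dead_code)]
#[derive(Debug)]
struct SchemaError(String);

#[derive(Debug)]
struct Schema {
    fields: Vec<Field>,
}

#[allow(dead_code)]
impl Schema {
    /// Result-returning lookup: every miss formats an error message,
    /// which allocates a String even if the caller only calls `is_err()`.
    fn field_with_unqualified_name(&self, name: &str) -> Result<&Field, SchemaError> {
        self.fields
            .iter()
            .find(|f| f.name == name)
            .ok_or_else(|| SchemaError(format!("No field named '{name}'")))
    }

    /// Boolean check: the same linear scan, but nothing is built on a miss.
    fn has_column_with_unqualified_name(&self, name: &str) -> bool {
        self.fields.iter().any(|f| f.name == name)
    }

    /// Boolean check that also requires the qualifier to match.
    fn has_column_with_qualified_name(&self, qualifier: &str, name: &str) -> bool {
        self.fields
            .iter()
            .any(|f| f.qualifier.as_deref() == Some(qualifier) && f.name == name)
    }

    /// Assumed merge behavior: append fields from `other` that are not
    /// already present, using only the boolean checks to detect duplicates.
    fn merge(&mut self, other: Schema) {
        for field in other.fields {
            let duplicate = match &field.qualifier {
                Some(q) => self.has_column_with_qualified_name(q, &field.name),
                None => self.has_column_with_unqualified_name(&field.name),
            };
            if !duplicate {
                self.fields.push(field);
            }
        }
    }
}
```

In this sketch `merge` never touches the error path, so a miss costs only the scan over `fields`.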
   
   I'm hoping, time permitting, to also do some memory / allocation profiling 
to make sure these kinds of changes have the desired effect.
   
   <img width="1728" alt="image" 
src="https://github.com/apache/arrow-datafusion/assets/22136083/c9fda12a-5df7-4b12-94de-3aa09f720535">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

