Brijesh-Thakkar commented on PR #22640:
URL: https://github.com/apache/datafusion/pull/22640#issuecomment-4599035139

   @adriangb There are two reasons why comparing schemas structurally is 
better than checking whether two schemas are exactly equal.
   
   1. DFSchema equality includes metadata.
   
   Since it compares every part of the schema, including metadata on the 
schema itself and on the fields, it reports two schemas as different if 
anything differs, even if the difference is not important.
   
   If we just checked whether the schema was equal to the input schema, and 
the input had any metadata differences, we would incorrectly go down the 
slow path.
   
   This slow path would do a lot of work, such as trying to create a new Union 
and merging all the metadata, on every optimizer pass.
   
   The structural comparison is deliberate: if the structure is the same and 
only the metadata differs, we stay on the fast path.
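   
   As a rough illustration of the difference, here is a minimal sketch using 
toy types. `ToyField`, `ToySchema`, `structurally_equal` and `toy_schema` are 
hypothetical names made up for this example, not the real DFSchema or Arrow 
API, and the exact set of properties the real structural check compares may 
differ.

```rust
use std::collections::HashMap;

// Toy stand-ins for illustration only; NOT the real Arrow/DataFusion types.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

#[derive(Debug, Clone, PartialEq)]
struct ToySchema {
    fields: Vec<ToyField>,
    metadata: HashMap<String, String>,
}

/// Hypothetical structural check: same number of fields, and each pair of
/// fields agrees on name, type and nullability. Metadata is ignored on purpose.
fn structurally_equal(a: &ToySchema, b: &ToySchema) -> bool {
    a.fields.len() == b.fields.len()
        && a.fields.iter().zip(&b.fields).all(|(x, y)| {
            x.name == y.name && x.data_type == y.data_type && x.nullable == y.nullable
        })
}

/// Builds a one-column schema, optionally with a schema-level metadata entry.
fn toy_schema(note: Option<&str>) -> ToySchema {
    let mut metadata = HashMap::new();
    if let Some(n) = note {
        metadata.insert("note".to_string(), n.to_string());
    }
    ToySchema {
        fields: vec![ToyField {
            name: "id".to_string(),
            data_type: "Int64".to_string(),
            nullable: false,
            metadata: HashMap::new(),
        }],
        metadata,
    }
}
```

   With these toy types, `toy_schema(None)` and `toy_schema(Some("x"))` are 
not equal under the derived `PartialEq`, but `structurally_equal` treats them 
as the same.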
   
   2. The fast path is there to avoid doing work when it is not needed.
   
   The function that recomputes the schema is called for every node of the 
plan tree, for example when we change the type of something or remove parts.
   
   Most of the time the inputs to the Union nodes will not have changed.
   
   The structural comparison is simple and does not use any extra memory.
   
   However, trying to create a Union always uses extra memory, even if 
nothing has changed.
   
   That extra memory use, repeated for every node of the plan tree, is what 
the cache is trying to prevent.
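   
   The fast path described above can be sketched roughly as follows. This is 
a toy model under stated assumptions, not the code in this PR: `ToyUnion`, 
`derive_schema` and this `recompute_schema` are hypothetical, and the real 
derivation of a Union schema does more than copy the first input.

```rust
use std::rc::Rc;

// Toy types for illustration only; not the real DataFusion API.
#[derive(Debug, Clone)]
struct ToyField {
    name: String,
    data_type: String,
}

type ToySchema = Vec<ToyField>;

struct ToyUnion {
    inputs: Vec<Rc<ToySchema>>,
    schema: Rc<ToySchema>, // cached output schema
}

/// Hypothetical slow path: builds a fresh schema, which always allocates.
/// Assumes at least one input; here it just copies the first input's fields.
fn derive_schema(inputs: &[Rc<ToySchema>]) -> Rc<ToySchema> {
    Rc::new(inputs[0].as_ref().clone())
}

impl ToyUnion {
    fn new(inputs: Vec<Rc<ToySchema>>) -> Self {
        let schema = derive_schema(&inputs);
        ToyUnion { inputs, schema }
    }

    /// Returns true if the cached schema was kept (fast path).
    fn recompute_schema(&mut self) -> bool {
        // Structural comparison: only reads existing data, allocates nothing.
        let unchanged = self.inputs.iter().all(|input| {
            input.len() == self.schema.len()
                && input
                    .iter()
                    .zip(self.schema.iter())
                    .all(|(a, b)| a.name == b.name && a.data_type == b.data_type)
        });
        if unchanged {
            return true; // fast path: keep the cached schema
        }
        self.schema = derive_schema(&self.inputs); // slow path: rebuild
        false
    }
}
```

   In this sketch, calling `recompute_schema` on an untouched `ToyUnion` 
leaves the cached `Rc` in place, and only a changed input triggers a rebuild.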
   
   So to answer your question directly: what we are saving is the extra 
memory use on every schema recomputation for any Union node that has not 
changed, which is the common case, on every optimization pass that does not 
touch that specific node.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
