Brijesh-Thakkar commented on PR #22640: URL: https://github.com/apache/datafusion/pull/22640#issuecomment-4599035139
@adriangb There are two reasons why a structural comparison is better than checking whether the two schemas are exactly equal.

1. DFSchema equality includes metadata. Because it compares every part of the schema, including schema-level and field-level metadata, it reports two schemas as different whenever anything differs, even when the difference does not matter here. If we only checked whether the schema was equal to the input schema, any input whose metadata differed would incorrectly send us down the slow path. That slow path does a lot of work: it builds a new Union and merges all the metadata, again and again throughout the optimization process. The structural comparison is deliberate: when the structure is the same but the metadata differs, we stay on the fast path.

2. The fast path exists to avoid unnecessary work. The function that recomputes the schema is called for every node in the plan tree, for example when a rewrite changes a type or removes parts of the plan. Most of the time the inputs to a Union node have not changed. The structural comparison is cheap and does not allocate, whereas constructing a new Union always allocates, even when nothing has changed. Avoiding that allocation at every node of the plan tree is exactly what the cache is for.

So, to answer your question directly: we save an allocation on every schema recomputation of a Union node whose inputs have not changed (the common case), in every optimization pass that does not touch that specific node.
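The two points above can be illustrated with a minimal Rust sketch. It assumes simplified, hypothetical `Field` and `Schema` types rather than DataFusion's real `DFSchema`, and `structurally_equal`, `merge_union_schema`, and `recompute_union_schema` are illustrative names, not DataFusion APIs. "Structure" is assumed here to mean field names, data types, and nullability.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified stand-ins for Arrow's Field and DataFusion's
// DFSchema; the real types carry more state.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String, // a type name such as "Int64", simplified to a string
    nullable: bool,
    metadata: HashMap<String, String>,
}

// The derived PartialEq compares everything, metadata included, which
// mirrors the "exact equality" check the comment argues against.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: HashMap<String, String>,
}

// Hypothetical helper: builds a non-nullable field with the given metadata.
fn field(name: &str, data_type: &str, metadata: &[(&str, &str)]) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable: false,
        metadata: metadata.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
    }
}

// Structural comparison (assumed: names, types, nullability).
// Metadata is ignored and nothing is allocated.
fn structurally_equal(a: &Schema, b: &Schema) -> bool {
    a.fields.len() == b.fields.len()
        && a.fields.iter().zip(&b.fields).all(|(x, y)| {
            x.name == y.name && x.data_type == y.data_type && x.nullable == y.nullable
        })
}

// Slow path (hypothetical): build a brand-new union schema, merging the
// metadata of every input. Assumes the union has at least one input.
fn merge_union_schema(inputs: &[Arc<Schema>]) -> Schema {
    let first = inputs.first().expect("a union has at least one input");
    let mut fields = first.fields.clone();
    let mut metadata = HashMap::new();
    for input in inputs {
        metadata.extend(input.metadata.clone());
        for (merged, other) in fields.iter_mut().zip(&input.fields) {
            merged.nullable |= other.nullable;
            merged.metadata.extend(other.metadata.clone());
        }
    }
    Schema { fields, metadata }
}

// Recompute a union's schema after its inputs may have been rewritten.
fn recompute_union_schema(cached: &Arc<Schema>, inputs: &[Arc<Schema>]) -> Arc<Schema> {
    if inputs.iter().all(|input| structurally_equal(cached, input)) {
        // Fast path: structure unchanged, so reuse the cached schema.
        // Arc::clone only bumps a reference count; no schema is built.
        Arc::clone(cached)
    } else {
        // Slow path: allocate a new schema and merge all metadata.
        Arc::new(merge_union_schema(inputs))
    }
}
```

With two inputs whose `id` fields differ only in metadata, the derived `==` reports them as unequal while `structurally_equal` does not, so recomputing the schema of an unchanged union returns the cached `Arc` (`Arc::ptr_eq` holds) instead of allocating. The check in this sketch is conservative: a structural mismatch, such as a nullability difference the merge would have absorbed anyway, only sends the node down the slow path. As the comment describes, the fast path keeps the previously merged metadata even when an input's metadata has changed.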
