notfilippo commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2373964785
> I'm a bit confused as to what the goal is of this work is if we still need to track the physical type during planning?

I would like to stress that the intent of this proposal remains to decouple logical types from physical types in order to achieve the following goal:

> Logical operators during logical planning should unquestionably not have access to the physical type information, which should exclusively be reserved to the physical planning and physical execution.
>
> LogicalPlans will use LogicalType while PhysicalPlans will use DataType.

While the goal seems to have achieved wide consensus, the path to reach it has not been finalised. Through some experiments (#11160 -> #11978 -> #12536) we've been trying to narrow down a possible approach to commit to in order to make progress. As this proposal aims at _changing the tires on a moving car_, there is and there will be a lot of discussion in order to complete the migration safely and without breaking too much functionality for end users. This will certainly result in an intermediate state where the existing behaviour is supported by temporarily tracking `DataType` alongside some objects which will only have a logical type until the type can be extracted by the context itself.

---

Re: @findepi's proposal,

> Summing up, I propose that
>
> * we introduce the concept of "data fusion type". This is the "logical type" @notfilippo proposed.
> * we use this "data fusion type" for logical plans
> * we use this "data fusion type" for physical plans as well
> * this leaves existing "physical plans" to be a runtime concept
> * we use this "data fusion type" for function authoring, scalar/constant/literal values in the plan

This proposal is compatible with (and actually depends on) decoupling logical from physical types, but I think it's a further step ahead to consider once we at least clear the initial steps to take in order to make LogicalTypes happen.
Additionally, I think it should be filed as a separate, but related, ticket. I understand that it heavily depends on and influences the choices of this proposal, but judging by the comments above I think there needs to be a separate discussion in order to validate the idea on its own.

---

> I think another benefit of the current type system is that the implementations of functions (and operators, etc) declare what types of arrays (physical encodings) they have specializations for and then the optimizers and analyzers ensure that the types lineup and try to minimize conversions at runtime

Not sure where we discussed this already, but I would love to support both logical types and physical types when declaring function signatures in order to let the user have full control over the arguments, from as little as a LogicalType + cast to as much as a precise function for specific DataTypes. Instead I was planning on keeping `return_type` and `invoke` as is, potentially adding a `return_logical_type` helper if needed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org