notfilippo commented on issue #11513:
URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2373964785

   > I'm a bit confused as to what the goal is of this work is if we still need 
to track the physical type during planning?
   
   I would like to stress that the intent of this proposal remains to decouple 
logical types from physical types in order to achieve the following goal:
   
   > Logical operators during logical planning should unquestionably not have 
access to the physical type information, which should exclusively be reserved 
to the physical planning and physical execution.
   >
   > LogicalPlans will use LogicalType while PhysicalPlans will use DataType.
   
   While the goal seems to have achieved wide consensus, the path to reach it 
has not been finalised. Through some experiments (#11160 -> #11978 -> #12536) 
we've been trying to narrow down a possible approach to commit to in order 
to make progress.
   
   As this proposal aims at _changing the tires on a moving car_, there is and 
will continue to be a lot of discussion on how to complete the migration safely 
and without breaking too much functionality for end users. This will certainly 
result in an intermediate state where the existing behaviour is supported by 
temporarily tracking `DataType` alongside some objects which will only have a 
logical type until the type can be extracted by the context itself.
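
   For illustration only, one possible reading of that intermediate state is 
sketched below. Every name here (`PlanField`, `physical_hint`, 
`physical_type`, and the trimmed-down `LogicalType` / `DataType` enums) is a 
hypothetical stand-in, not DataFusion's or Arrow's actual API. The physical 
type is carried as an optional hint and is only consulted while the context 
cannot yet supply it.

```rust
// Hypothetical, simplified stand-ins; NOT the real DataFusion/Arrow types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    Utf8,
    Int64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Utf8,
    Utf8View,
    Int64,
}

/// A plan-level field that is logically typed, with a transitional
/// physical hint kept only for backwards compatibility (assumption).
struct PlanField {
    name: String,
    logical_type: LogicalType,
    physical_hint: Option<DataType>,
}

/// Prefer the tracked physical type while it still exists; otherwise ask
/// the context (modelled here as a plain function) for its choice.
fn physical_type(
    field: &PlanField,
    from_context: impl Fn(LogicalType) -> DataType,
) -> DataType {
    field
        .physical_hint
        .unwrap_or_else(|| from_context(field.logical_type))
}
```

   Once the context can always answer, the `physical_hint` field in this 
sketch could be dropped without changing callers of `physical_type`.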
   
   ---
   
   Re: @findepi's proposal,
   
   > Summing up, I propose that
   > 
   > * we introduce the concept of "data fusion type". This is the "logical 
type" @notfilippo proposed.
   > * we use this "data fusion type" for logical plans
   > * we use this "data fusion type" for physical plans as well
   >   * this leaves existing "physical plans" to be a runtime concept
   > * we use this "data fusion type" for function authoring, 
scalar/constant/literal values in the plan
   
   This proposal is compatible with (and actually depends on) decoupling 
logical from physical types, but I think it's a further step to consider 
once we have at least cleared the initial steps needed to make LogicalTypes 
happen.
   
   Additionally, I think it should be filed as a separate, but related, ticket. 
I understand that it heavily depends on and influences the choices of this 
proposal, but judging by the comments above I think there needs to be a separate 
discussion in order to validate the idea on its own.
   
   ---
   
   > I think another benefit of the current type system is that the 
implementations of functions (and operators, etc) declare what types of arrays 
(physical encodings) they have specializations for and then the optimizers and 
analyzers ensure that the types lineup and try to minimize conversions at 
runtime
   
   Not sure where we discussed this already, but I would love to support both 
logical types and physical types when declaring function signatures in order to 
let the user have full control over the arguments: as little as a LogicalType + 
cast, or as much as a precise function for specific DataTypes.
   
   Instead I was planning on keeping `return_type` and `invoke` as is, 
potentially adding a `return_logical_type` helper if needed.
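
   A minimal sketch of what "LogicalType + cast" versus "precise DataTypes" 
could look like when declaring an argument. This is an assumption-laden 
illustration: `ArgSpec`, `Resolution`, `resolve`, and the reduced 
`LogicalType` / `DataType` enums are all hypothetical and do not correspond to 
existing DataFusion signature APIs.

```rust
// Hypothetical, simplified stand-ins; NOT the real DataFusion/Arrow types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    Utf8,
    Int64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int64,
}

impl DataType {
    /// The logical type that this physical encoding represents.
    fn logical(self) -> LogicalType {
        match self {
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View => LogicalType::Utf8,
            DataType::Int64 => LogicalType::Int64,
        }
    }
}

/// What a function author declares for a single argument (assumption).
enum ArgSpec {
    /// Loose: any encoding of `ty` is accepted and cast to `cast_to` if needed.
    Logical { ty: LogicalType, cast_to: DataType },
    /// Precise: only these physical encodings have specialised kernels.
    Physical(Vec<DataType>),
}

#[derive(Debug, PartialEq, Eq)]
enum Resolution {
    AsIs,
    Cast(DataType),
    Reject,
}

/// Decide how an actual argument type is handled under a declaration.
fn resolve(spec: &ArgSpec, actual: DataType) -> Resolution {
    match spec {
        ArgSpec::Logical { ty, cast_to } => {
            if actual.logical() != *ty {
                Resolution::Reject
            } else if actual == *cast_to {
                Resolution::AsIs
            } else {
                Resolution::Cast(*cast_to)
            }
        }
        ArgSpec::Physical(accepted) => {
            if accepted.contains(&actual) {
                Resolution::AsIs
            } else {
                Resolution::Reject
            }
        }
    }
}
```

   In this sketch a `Logical` declaration lets the planner insert a cast for 
any encoding of the right logical type, while a `Physical` declaration only 
accepts the exact encodings listed.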


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

