notfilippo commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2370742830
> Do we actually need to track physical representation at planning time?

Logical operators during logical planning should unquestionably **not** have access to physical type information, which should be reserved exclusively for the physical planning and/or physical execution phases. Currently, some limitations in DataFusion's abstraction design prevent a clear-cut distinction between the two kinds of types (I think this becomes clear when you look at how [`LogicalPlan::schema`](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html#method.schema) can be called). Knowing this, some care needs to be taken to introduce the distinction gradually, which mainly comes in the form of storing the `DataType` alongside logical values. The objective would be to eventually remove that knowledge and make it available directly through the data source (i.e. the RecordBatch), to support the run-time adaptiveness you mention above and that I've also mentioned in the proposal:

> #### RecordBatches with same logical type but different physical types
> Integrating `LogicalPhysicalSchema` into DataFusion's RecordBatches, streamed from one [ExecutionPlan](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html) to the other, could be an interesting approach to support the possibility of two record batches having logically equivalent schemas with different underlying physical types. This could be useful in situations where data, stored in multiple pages mapping 1:1 with RecordBatches, is encoded with different strategies based on the density of the rows and the cardinality of the values.
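To make the idea in the quoted proposal concrete, here is a minimal, dependency-free sketch of two batch schemas that differ physically but agree logically. Every name in it (`LogicalType`, `PhysicalType`, `Field`, `BatchSchema`, `logically_equivalent`, `schema`) is hypothetical and invented for illustration; none of it is DataFusion's or Arrow's actual API, and the real `LogicalPhysicalSchema` design may look quite different.

```rust
// Hypothetical model for illustration only; not DataFusion or Arrow types.

/// Logical type: what the values mean to the logical planner.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    String,
    Int64,
}

/// Physical type: how one particular batch happens to encode the values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PhysicalType {
    Utf8,           // plain string array
    DictionaryUtf8, // dictionary-encoded strings (e.g. low cardinality)
    RunEndUtf8,     // run-end-encoded strings (e.g. long runs of equal values)
    Int64,
}

impl PhysicalType {
    /// Map an encoding to the logical type it represents.
    fn logical(self) -> LogicalType {
        match self {
            PhysicalType::Utf8
            | PhysicalType::DictionaryUtf8
            | PhysicalType::RunEndUtf8 => LogicalType::String,
            PhysicalType::Int64 => LogicalType::Int64,
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Field {
    name: String,
    physical: PhysicalType,
}

/// Schema carried by a single batch, including its physical encodings.
#[derive(Debug, Clone, PartialEq, Eq)]
struct BatchSchema {
    fields: Vec<Field>,
}

/// Two batch schemas are logically equivalent when they have the same
/// field names, in the same order, with the same logical types,
/// regardless of how each batch encodes the values.
fn logically_equivalent(a: &BatchSchema, b: &BatchSchema) -> bool {
    a.fields.len() == b.fields.len()
        && a.fields
            .iter()
            .zip(&b.fields)
            .all(|(x, y)| x.name == y.name && x.physical.logical() == y.physical.logical())
}

/// Small helper to build a schema from (name, physical type) pairs.
fn schema(fields: &[(&str, PhysicalType)]) -> BatchSchema {
    BatchSchema {
        fields: fields
            .iter()
            .map(|(name, physical)| Field {
                name: name.to_string(),
                physical: *physical,
            })
            .collect(),
    }
}
```

Under this model, a batch built with `schema(&[("city", PhysicalType::Utf8)])` and one built with `schema(&[("city", PhysicalType::DictionaryUtf8)])` compare as unequal schemas but satisfy `logically_equivalent`, which is the property a stream of differently encoded pages would rely on.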