notfilippo commented on issue #11513:
URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2370742830

   > Do we actually need to track physical representation at planning time?
   
   Logical operators during logical planning should unquestionably **not** have 
access to the physical type information, which should exclusively be reserved 
to the physical planning and/or physical execution phase. 
   
   Currently, some limitations in DataFusion's abstraction design don't allow a 
clear-cut distinction between logical and physical types (I think this becomes 
clear when you look at how you can call 
[`LogicalPlan::schema`](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html#method.schema)).
 Knowing this, some care needs to be taken to introduce the distinction 
gradually, which mainly comes in the form of storing the `DataType` alongside 
logical values. 
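
   As a rough illustration of that transitional state, here is a minimal 
sketch. All names in it (`LogicalType`, `PhysicalType`, `TransitionalField`) 
are hypothetical and are not DataFusion or Arrow APIs; the point is only that 
the physical type rides along as an optional hint that can later be dropped.

```rust
// Hypothetical types for illustration only; not DataFusion or Arrow APIs.
#[derive(Debug, Clone, PartialEq)]
enum LogicalType {
    String,
    Integer,
}

#[derive(Debug, Clone, PartialEq)]
enum PhysicalType {
    Utf8,
    DictionaryUtf8,
    Int64,
}

/// Transitional field: the logical type is authoritative, while the physical
/// type is carried alongside it only until planning no longer needs it.
#[derive(Debug, Clone)]
struct TransitionalField {
    name: String,
    logical: LogicalType,
    physical_hint: Option<PhysicalType>,
}

impl TransitionalField {
    /// The only accessor logical planning is meant to use.
    fn logical_type(&self) -> &LogicalType {
        &self.logical
    }

    /// Models the eventual end state: the physical knowledge is removed
    /// from the plan and left to the data source.
    fn without_physical(mut self) -> Self {
        self.physical_hint = None;
        self
    }
}

fn main() {
    let field = TransitionalField {
        name: "city".to_string(),
        logical: LogicalType::String,
        physical_hint: Some(PhysicalType::DictionaryUtf8),
    };
    println!("{}: logical = {:?}", field.name, field.logical_type());

    let stripped = field.without_physical();
    println!("{}: physical hint = {:?}", stripped.name, stripped.physical_hint);
}
```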
   
   The objective would be to eventually remove that knowledge and make it 
available directly through the data source (i.e. the RecordBatch) to support 
the run-time adaptiveness you are mentioning above and that I've also mentioned 
in the proposal:
   
   > #### RecordBatches with same logical type but different physical types
   > Integrating `LogicalPhysicalSchema` into DataFusion's RecordBatches, 
streamed from one 
[ExecutionPlan](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html)
 to the other, could be an interesting approach to support the possibility of 
two record batches having logically equivalent schemas with different 
underlying physical types. This could be useful in situations where data, 
stored in multiple pages mapping 1:1 with RecordBatches, is encoded with 
different strategies based on the density of the rows and the cardinality of 
the values.
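
   To make the quoted idea concrete, here is a small sketch of two batches 
whose schemas differ physically but agree logically. The names 
(`LogicalType`, `PhysicalType`, `BatchSchema`, `logically_equivalent`) are 
assumptions made up for this example, not the proposal's actual 
`LogicalPhysicalSchema` API, and the real mapping between Arrow types would be 
richer than this.

```rust
// Hypothetical names throughout; this is not the DataFusion or Arrow API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    String,
    Integer,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PhysicalType {
    Utf8,
    DictionaryUtf8,
    RunEndEncodedUtf8,
    Int64,
}

impl PhysicalType {
    /// Several physical encodings collapse onto one logical type.
    fn logical(self) -> LogicalType {
        match self {
            PhysicalType::Utf8
            | PhysicalType::DictionaryUtf8
            | PhysicalType::RunEndEncodedUtf8 => LogicalType::String,
            PhysicalType::Int64 => LogicalType::Integer,
        }
    }
}

/// Per-batch schema: each column name paired with that batch's physical type.
struct BatchSchema {
    columns: Vec<(&'static str, PhysicalType)>,
}

/// Two batches are compatible when names and logical types line up,
/// regardless of how each batch is physically encoded.
fn logically_equivalent(a: &BatchSchema, b: &BatchSchema) -> bool {
    a.columns.len() == b.columns.len()
        && a.columns
            .iter()
            .zip(&b.columns)
            .all(|((na, pa), (nb, pb))| na == nb && pa.logical() == pb.logical())
}

fn main() {
    // A high-cardinality page kept as plain strings.
    let dense = BatchSchema {
        columns: vec![("id", PhysicalType::Int64), ("city", PhysicalType::Utf8)],
    };
    // A low-cardinality page of the same table, dictionary encoded.
    let sparse = BatchSchema {
        columns: vec![("id", PhysicalType::Int64), ("city", PhysicalType::DictionaryUtf8)],
    };
    println!("logically equivalent: {}", logically_equivalent(&dense, &sparse));
}
```

   With a check like this, a downstream operator could accept both batches 
in the same stream and decide per batch how to read the `city` column.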
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

