alamb commented on code in PR #14685:
URL: https://github.com/apache/datafusion/pull/14685#discussion_r1971726551
##########
datafusion/core/src/datasource/physical_plan/file_scan_config.rs:
##########
@@ -345,6 +345,32 @@ impl FileScanConfig {
/// Set the projection of the files
pub fn with_projection(mut self, projection: Option<Vec<usize>>) -> Self {
self.projection = projection;
+ self.with_updated_statistics()
+ }
+
+ // Update source statistics with the current projection data
+ fn with_updated_statistics(mut self) -> Self {
+ let max_projection_column = *self
+ .projection
+ .as_ref()
+ .and_then(|proj| proj.iter().max())
+ .unwrap_or(&0);
+
+ if max_projection_column
+ >= self.file_schema.fields().len() + self.table_partition_cols.len()
+ {
+ // we don't yet have enough information (file schema info or partition column info) to perform projection
+ return self;
+ }
+
+ let (
+ _projected_schema,
+ _constraints,
+ projected_statistics,
+ _projected_output_ordering,
+ ) = self.project();
+
+ self.source = self.source.with_statistics(projected_statistics);
Review Comment:
I don't fully understand why the source would need projected statistics.
I am testing whether the issue is that the FileScanConfig is providing the
wrong statistics (for example, maybe this line should be self.statistics
rather than self.source.statistics):
https://github.com/apache/datafusion/blob/1c54b38e4a4012fd8d1b4f48e2c3d6d35016bad0/datafusion/core/src/datasource/physical_plan/file_scan_config.rs#L233-L232
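
For reference, the bounds guard in the hunk above can be read in isolation. The following is only a minimal standalone sketch of that check, not DataFusion code: the function name `can_project` and its plain `usize` parameters are hypothetical stand-ins for `self.projection`, `self.file_schema.fields().len()` and `self.table_partition_cols.len()`.

```rust
/// Hypothetical helper mirroring the guard in `with_updated_statistics`.
/// Returns true when the largest projected index fits inside the combined
/// file schema + partition columns, i.e. when the hunk would go on to
/// call `self.project()` instead of returning early.
fn can_project(
    projection: Option<&[usize]>,
    num_file_fields: usize,
    num_partition_cols: usize,
) -> bool {
    // Same fallback as the hunk: no projection (or an empty one) counts as index 0.
    let max_projection_column = projection
        .and_then(|proj| proj.iter().max())
        .copied()
        .unwrap_or(0);

    max_projection_column < num_file_fields + num_partition_cols
}
```

Under these assumptions, `can_project(Some(&[0, 2][..]), 2, 1)` is true and `can_project(Some(&[3][..]), 2, 1)` is false. One thing the sketch makes visible: because of the `0` fallback, `can_project(None, 0, 0)` is also false, so with no projection and an empty schema the statistics update would be skipped.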
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]