berkaysynnada commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2083998118
##########
datafusion/datasource/src/file_stream.rs:
##########
@@ -367,7 +368,7 @@ impl Default for OnError {
 pub trait FileOpener: Unpin + Send + Sync {
     /// Asynchronously open the specified file and return a stream
     /// of [`RecordBatch`]
-    fn open(&self, file_meta: FileMeta) -> Result<FileOpenFuture>;
+    fn open(&self, file_meta: FileMeta, file: PartitionedFile) -> Result<FileOpenFuture>;

Review Comment:
   > Maybe? But I feel like since we have the partitioned file, we might as well pass it in. Maybe we use it in the future to enable optimizations that use the partition values (e.g. late pruning based on partition values, including partition values in the scan so that [more filters can be evaluated](https://github.com/apache/datafusion/pull/15935), etc.)

   I believe these could also be inferred from statistics in a more general way (I don't know whether partition columns currently exist in `column_statistics`), but it's not a big deal; we can keep this 👍🏻
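
For illustration only, here is a minimal sketch of the "late pruning based on partition values" idea that the extra `file: PartitionedFile` argument could enable. Everything in it is an assumption made for the example, not DataFusion's actual API: `FileMeta` and `PartitionedFile` are simplified stand-ins (the real partition values are not plain strings), `open` is synchronous and returns a hypothetical `OpenOutcome` instead of a `FileOpenFuture`, and `PruningOpener` is a made-up opener.

```rust
/// Simplified stand-in for `FileMeta`; only the path is modeled (assumption).
#[derive(Debug, Clone)]
pub struct FileMeta {
    pub path: String,
}

/// Simplified stand-in for `PartitionedFile`; partition values are plain
/// strings here purely to keep the sketch self-contained (assumption).
#[derive(Debug, Clone)]
pub struct PartitionedFile {
    pub partition_values: Vec<String>,
}

/// Hypothetical result type: the real trait returns a future of a stream.
#[derive(Debug, PartialEq)]
pub enum OpenOutcome {
    /// The file was pruned using its partition values; nothing was read.
    Skipped,
    /// The file would be scanned; carries the path that would be opened.
    Scanned(String),
}

/// Synchronous analogue of the trait signature proposed in the diff.
pub trait FileOpener {
    fn open(&self, file_meta: FileMeta, file: PartitionedFile) -> Result<OpenOutcome, String>;
}

/// Hypothetical opener that only scans files whose partition value at
/// `partition_index` equals `wanted`.
pub struct PruningOpener {
    pub partition_index: usize,
    pub wanted: String,
}

impl FileOpener for PruningOpener {
    fn open(&self, file_meta: FileMeta, file: PartitionedFile) -> Result<OpenOutcome, String> {
        match file.partition_values.get(self.partition_index) {
            // The partition value rules the file out: skip it without any I/O.
            Some(value) if value != &self.wanted => Ok(OpenOutcome::Skipped),
            // The partition value matches: proceed with the scan.
            Some(_) => Ok(OpenOutcome::Scanned(file_meta.path)),
            // The file has no value at that index: report it instead of guessing.
            None => Err(format!(
                "file {} has no partition value at index {}",
                file_meta.path, self.partition_index
            )),
        }
    }
}
```

The point of the sketch is only that the pruning decision can be made inside `open` from data already attached to the file, without consulting separately collected statistics; whether statistics would be the more general mechanism is the open question raised in the comment.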