alamb commented on code in PR #20188:
URL: https://github.com/apache/datafusion/pull/20188#discussion_r2775042693
##########
datafusion/datasource/src/file_scan_config.rs:
##########
@@ -55,10 +55,21 @@ use datafusion_physical_plan::{
use log::{debug, warn};
use std::{any::Any, fmt::Debug, fmt::Formatter, fmt::Result as FmtResult, sync::Arc};
-/// The base configurations for a [`DataSourceExec`], the a physical plan for
-/// any given file format.
+/// [`FileScanConfig`] represents scanning data from a group of files
///
-/// Use [`DataSourceExec::from_data_source`] to create a [`DataSourceExec`] from a ``FileScanConfig`.
+/// `FileScanConfig` is used to create a [`DataSourceExec`], the physical plan
+/// for scanning files with a particular file format.
+///
+/// The [`FileSource`] (e.g. `ParquetSource`, `CsvSource`, etc.) is responsible
+/// for creating the actual execution plan to read the files based on a
+/// `FileScanConfig`. Fields in a `FileScanConfig` such as Statistics represent
+/// information about the files **before** any projection or filtering is
Review Comment:
I agree that what the code currently does is not ideal. However, before I fix
it, I need to understand what it currently does :)
BTW, I am actually looking into the same sort of question for
EquivalenceProperties.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]