alamb commented on code in PR #18178:
URL: https://github.com/apache/datafusion/pull/18178#discussion_r2447814803


##########
datafusion/common/src/dfschema.rs:
##########
@@ -1323,6 +1323,49 @@ impl SchemaExt for Schema {
     }
 }
 
+/// Helper to hold table schema information.
+///
+/// A table schema consists of:
+/// - file schema: the schema of the data files
+/// - table partition columns: the columns used for partitioning the table
+///
+/// This struct also holds a full table schema to be able to cheaply hand out
+/// references to any one of the representations without needing to 
reconstruct them.

Review Comment:
   ```suggestion
   /// Helper to hold table schema information.
   ///
   /// A table schema consists of:
   /// - file schema: the schema stored in the actual data file
   /// - table partition columns: partitioning columns that are part of the 
table schema
   ///   For example a listing table stored in `/data/date=2025-10-10/raw.csv`) 
would have 
   ///   a `date` partitioning column.
   /// - table schema: the schema of the target table. 
   ```



##########
datafusion/common/src/dfschema.rs:
##########
@@ -1323,6 +1323,49 @@ impl SchemaExt for Schema {
     }
 }
 
+/// Helper to hold table schema information.
+///
+/// A table schema consists of:
+/// - file schema: the schema of the data files
+/// - table partition columns: the columns used for partitioning the table
+///
+/// This struct also holds a full table schema to be able to cheaply hand out
+/// references to any one of the representations without needing to 
reconstruct them.
+#[derive(Debug, Clone)]

Review Comment:
   This is a good structure 
   
   It seems pretty specific to the DataSource code, so I suggest putting it in 
the datasource module. Maybe alongside `FileScanConfig` in  
`datafusion/datasource/src/file_scan_config.rs` or its own module 
`datafusion/datasource/src/table_schema.rs`



##########
datafusion/common/src/dfschema.rs:
##########
@@ -1323,6 +1323,49 @@ impl SchemaExt for Schema {
     }
 }
 
+/// Helper to hold table schema information.
+///
+/// A table schema consists of:
+/// - file schema: the schema of the data files
+/// - table partition columns: the columns used for partitioning the table
+///
+/// This struct also holds a full table schema to be able to cheaply hand out
+/// references to any one of the representations without needing to 
reconstruct them.
+#[derive(Debug, Clone)]
+pub struct TableSchema {
+    file_schema: SchemaRef,
+    table_partition_cols: Vec<FieldRef>,
+    table_schema: SchemaRef,

Review Comment:
   I suggest moving this content from below.
   
   ALso, I changed `file_schema` to `table_schema` in this comment as I think 
that is more accurate, but I am nore sure
   ```suggestion
       /// Note that the table schema is **not** the schema of the physical 
files.
       /// This is the schema that the physical file schema will be mapped onto,
       /// and the schema that the [`DataSourceExec`] will return.
       table_schema: SchemaRef,
   ```



##########
datafusion/common/src/dfschema.rs:
##########
@@ -1323,6 +1323,49 @@ impl SchemaExt for Schema {
     }
 }
 
+/// Helper to hold table schema information.
+///
+/// A table schema consists of:
+/// - file schema: the schema of the data files
+/// - table partition columns: the columns used for partitioning the table
+///
+/// This struct also holds a full table schema to be able to cheaply hand out
+/// references to any one of the representations without needing to 
reconstruct them.
+#[derive(Debug, Clone)]
+pub struct TableSchema {
+    file_schema: SchemaRef,

Review Comment:
   ```suggestion
       /// The file schema is the schema before any `projection` is applied. It 
contains
       /// all columns that may appear in the files, but does not include table
       /// partition columns.
       file_schema: SchemaRef,
   ```



##########
datafusion/datasource/src/file_scan_config.rs:
##########
@@ -153,15 +153,19 @@ pub struct FileScanConfig {
     /// [`RuntimeEnv::register_object_store`]: 
datafusion_execution::runtime_env::RuntimeEnv::register_object_store
     /// [`RuntimeEnv::object_store`]: 
datafusion_execution::runtime_env::RuntimeEnv::object_store
     pub object_store_url: ObjectStoreUrl,
-    /// Schema before `projection` is applied. It contains the all columns 
that may
-    /// appear in the files. It does not include table partition columns
-    /// that may be added.
-    /// Note that this is **not** the schema of the physical files.
-    /// This is the schema that the physical file schema will be
-    /// mapped onto, and the schema that the [`DataSourceExec`] will return.
+    /// Schema information including the file schema, table partition columns,
+    /// and the combined table schema.
+    ///
+    /// The file schema is the schema before `projection` is applied. It 
contains
+    /// all columns that may appear in the files, but does not include table
+    /// partition columns.
+    ///
+    /// Note that the file schema is **not** the schema of the physical files.
+    /// This is the schema that the physical file schema will be mapped onto,
+    /// and the schema that the [`DataSourceExec`] will return.

Review Comment:
   As above I recommend putting these comments on TableSchema itself
   
   
   
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to