mbutrovich commented on code in PR #22026:
URL: https://github.com/apache/datafusion/pull/22026#discussion_r3319392368


##########
datafusion/datasource/src/table_schema.rs:
##########
@@ -140,14 +161,30 @@ impl TableSchema {
     }
 
     /// Return a new `TableSchema` with `partition_cols` as its partition columns,
-    /// replacing any existing ones.
+    /// replacing any existing ones. Existing virtual columns are preserved.
     #[deprecated(
         since = "55.0.0",
         note = "use TableSchema::builder(file_schema).with_table_partition_cols(cols).build()"
     )]
     pub fn with_table_partition_cols(self, partition_cols: Vec<FieldRef>) -> Self {
         TableSchemaBuilder::new(self.file_schema)
             .with_table_partition_cols(partition_cols)
+            .with_virtual_columns(self.virtual_columns)
+            .build()
+    }
+
+    /// Return a new `TableSchema` with `virtual_columns` as its virtual columns,
+    /// replacing any existing ones. Existing partition columns are preserved.
+    ///
+    /// Virtual columns are produced by the file reader (e.g. a Parquet
+    /// `row_number` column) rather than stored in the files or derived from
+    /// partition paths. Each field must carry an arrow virtual extension type so
+    /// the reader can recognize it; `ParquetOpener` forwards these fields to
+    /// `parquet::arrow::arrow_reader::ArrowReaderOptions::with_virtual_columns`.
+    pub fn with_virtual_columns(self, virtual_columns: Vec<FieldRef>) -> Self {

Review Comment:
   My bad. I think I missed this when bringing it over to a fresh branch. I'll add a follow-up PR.



##########
datafusion/datasource/src/table_schema.rs:
##########
@@ -166,13 +203,43 @@ impl TableSchema {
         &self.table_partition_cols
     }
 
-    /// Get the full table schema (file schema + partition columns).
+    /// Get the virtual columns.
     ///
-    /// This is the complete schema that will be seen by queries, combining
-    /// both the columns from the files and the partition columns.
+    /// Virtual columns are produced by the file reader (e.g. Parquet
+    /// `row_number`) and are not stored in the data files or derived from
+    /// partition paths.
+    pub fn virtual_columns(&self) -> &Fields {
+        &self.virtual_columns
+    }
+
+    /// Get the full table schema (file schema + partition columns + virtual columns).
+    ///
+    /// This is the complete schema that will be seen by queries. Fields appear
+    /// in the order: file columns, partition columns, virtual columns.
     pub fn table_schema(&self) -> &SchemaRef {
         &self.table_schema
     }
+
+    /// Schema of columns that can be referenced by predicates pushed into the
+    /// file reader: file columns plus partition columns, excluding virtual
+    /// columns.
+    ///
+    /// Virtual columns are produced by the reader itself (e.g. Parquet
+    /// `row_number`) and cannot be referenced inside the reader's row filter,
+    /// so predicates that reference them must stay above the scan. Callers
+    /// deciding which filters to push down should check against this schema
+    /// rather than [`Self::table_schema`].
+    ///
+    /// When there are no virtual columns this returns the same schema as
+    /// [`Self::table_schema`].
+    pub fn schema_without_virtual_columns(&self) -> SchemaRef {
+        if self.virtual_columns.is_empty() {
+            return Arc::clone(&self.table_schema);
+        }
+        let mut builder = SchemaBuilder::from(self.file_schema.as_ref());
+        builder.extend(self.table_partition_cols.iter().cloned());
+        Arc::new(builder.finish())

Review Comment:
   I'll include this in the follow-up.
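
   For readers following the thread, here is a minimal sketch of the column
   ordering and the pushdown rule described in the doc comments above. It is
   not the DataFusion API: `TableSchemaModel` and `can_push_down` are
   hypothetical names, and columns are modeled as plain strings rather than
   arrow `FieldRef`s. Only the ordering (file, partition, virtual) and the
   "check against the schema without virtual columns" rule come from the diff.

```rust
// Hypothetical stand-in for `TableSchema`: column names only, no arrow types.
#[derive(Debug, Clone, Default)]
struct TableSchemaModel {
    file_columns: Vec<String>,
    partition_columns: Vec<String>,
    virtual_columns: Vec<String>,
}

impl TableSchemaModel {
    /// Full schema seen by queries: file columns, then partition columns,
    /// then virtual columns (the order stated in the doc comment).
    fn table_schema(&self) -> Vec<String> {
        self.file_columns
            .iter()
            .chain(&self.partition_columns)
            .chain(&self.virtual_columns)
            .cloned()
            .collect()
    }

    /// Columns a pushed-down predicate may reference: file columns plus
    /// partition columns, with virtual columns left out.
    fn schema_without_virtual_columns(&self) -> Vec<String> {
        self.file_columns
            .iter()
            .chain(&self.partition_columns)
            .cloned()
            .collect()
    }

    /// Assumed helper: a predicate, represented only by the column names it
    /// references, is pushable when every name is a file or partition column.
    fn can_push_down(&self, referenced: &[&str]) -> bool {
        let pushable = self.schema_without_virtual_columns();
        referenced
            .iter()
            .all(|name| pushable.iter().any(|c| c.as_str() == *name))
    }
}
```

   In this model, a filter that references only file and partition columns
   passes `can_push_down`, while one that references a virtual column such as
   `row_number` does not and would have to stay above the scan.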



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

