adriangb commented on code in PR #22026:
URL: https://github.com/apache/datafusion/pull/22026#discussion_r3267880296


##########
datafusion/datasource/src/table_schema.rs:
##########
@@ -149,9 +182,49 @@ impl TableSchema {
             );
             table_partition_cols.extend(partition_cols);
         }
-        let mut builder = SchemaBuilder::from(self.file_schema.as_ref());
-        builder.extend(self.table_partition_cols.iter().cloned());
-        self.table_schema = Arc::new(builder.finish());
+        self.table_schema = build_table_schema(
+            &self.file_schema,
+            self.table_partition_cols.as_ref(),
+            self.virtual_columns.as_ref(),
+        );
+        self
+    }
+
+    /// Add virtual columns to an existing TableSchema, returning a new 
instance.
+    ///
+    /// Virtual columns are produced by the file reader (e.g. a Parquet
+    /// `row_number` column) rather than being stored in the files or derived
+    /// from partition paths. Each field must carry an arrow virtual extension
+    /// type so the reader can recognize it; `ParquetOpener` forwards these
+    /// fields to 
`parquet::arrow::arrow_reader::ArrowReaderOptions::with_virtual_columns`.
+    ///
+    /// Virtual columns are appended at the end of the table schema, after any
+    /// partition columns.
+    pub fn with_virtual_columns(mut self, virtual_columns: Vec<FieldRef>) -> 
Self {
+        debug_assert!(
+            virtual_columns.iter().enumerate().all(|(i, v)| {
+                let name = v.name();
+                !self.file_schema.fields().iter().any(|f| f.name() == name)
+                    && !self.table_partition_cols.iter().any(|p| p.name() == 
name)
+                    && !self.virtual_columns.iter().any(|w| w.name() == name)
+                    && !virtual_columns[..i].iter().any(|w| w.name() == name)
+            }),
+            "virtual column name collides with an existing file, partition, or 
virtual column"
+        );
+
+        if self.virtual_columns.is_empty() {
+            self.virtual_columns = Arc::new(virtual_columns);
+        } else {
+            let existing = Arc::get_mut(&mut self.virtual_columns).expect(

Review Comment:
   I think this will *panic* if:
   
   - You make a `TableSchema`
   - Call `with_virtual_columns` to add some virtual columns
   - clone the `TableSchema`
   - Call `with_virtual_columns` again on one of the owned clones
   
   Now `with_table_partition_cols` has the same bug so maybe it's okay, but I 
do think it's an unsafe contract. It also seems like the solution is relatively 
simple: use `Arc::make_mut`. I might make a PR for  
   `with_table_partition_cols`
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to