jeffreyssmith2nd commented on code in PR #10716:
URL: https://github.com/apache/datafusion/pull/10716#discussion_r1620812645


##########
datafusion/core/src/datasource/schema_adapter.rs:
##########
@@ -75,9 +75,16 @@ pub trait SchemaAdapter: Send + Sync {
 /// Creates a `SchemaMapping` that can be used to cast or map the columns
 /// from the file schema to the table schema.
-pub trait SchemaMapper: Send + Sync {
+pub trait SchemaMapper: Debug + Send + Sync {
     /// Adapts a `RecordBatch` to match the `table_schema` using the stored mapping and conversions.
     fn map_batch(&self, batch: RecordBatch) -> datafusion_common::Result<RecordBatch>;
+
+    /// Adapts a `RecordBatch` that does not have all the columns (as defined in the schema).

Review Comment:
   As I understand it, when `DatafusionArrowPredicate::evaluate` is called, the `RecordBatch` only contains one column. If we use the `map_batch` function, it indexes into the `RecordBatch` as if all the columns were there.

   For example, if we have a schema with the fields `[{name: "value", type: "Float64"}, {name: "time", type: "Timestamp"}]`, then `map_batch` will try to index at `1` for `time`, but the `RecordBatch` won't actually have that index, since only one column is passed in.

   This may well be the wrong way to achieve what I wanted, but the intent was to look the field up by name in the `table_schema` so that we don't have that indexing problem.
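   To make the by-name idea concrete, here is a rough, self-contained sketch of what I mean. The free function name `map_partial_batch_by_name` and its exact signature are only for illustration here, not what this PR adds:

   use std::sync::Arc;

   use arrow::compute::cast;
   use arrow::datatypes::{Schema, SchemaRef};
   use arrow::record_batch::RecordBatch;
   use datafusion_common::Result;

   /// Illustrative only: map whatever columns are present in `batch` onto the
   /// table schema by *name* rather than by positional index, so a predicate
   /// batch that carries e.g. only the "value" column still maps cleanly.
   fn map_partial_batch_by_name(
       table_schema: &SchemaRef,
       batch: RecordBatch,
   ) -> Result<RecordBatch> {
       let file_schema = batch.schema();
       let mut fields = Vec::new();
       let mut columns = Vec::new();

       // Walk only the columns the incoming batch actually has, and look each
       // one up in the table schema by name, casting to the table's data type.
       for (file_field, column) in file_schema.fields().iter().zip(batch.columns()) {
           if let Some((_, table_field)) = table_schema.column_with_name(file_field.name()) {
               columns.push(cast(column.as_ref(), table_field.data_type())?);
               fields.push(table_field.clone());
           }
       }

       Ok(RecordBatch::try_new(Arc::new(Schema::new(fields)), columns)?)
   }

   In other words, we only ever touch the columns the predicate batch actually carries, looking each one up in `table_schema` by name and casting to the table's type, instead of assuming the batch has the full column layout.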
########## datafusion/core/src/datasource/schema_adapter.rs: ########## @@ -75,9 +75,16 @@ pub trait SchemaAdapter: Send + Sync { /// Creates a `SchemaMapping` that can be used to cast or map the columns /// from the file schema to the table schema. -pub trait SchemaMapper: Send + Sync { +pub trait SchemaMapper: Debug + Send + Sync { /// Adapts a `RecordBatch` to match the `table_schema` using the stored mapping and conversions. fn map_batch(&self, batch: RecordBatch) -> datafusion_common::Result<RecordBatch>; + + /// Adapts a `RecordBatch` that does not have all the columns (as defined in the schema). Review Comment: As I understand it, when `DatafusionArrowPredicate::evaluate` is called, the `RecordBatch` only contains one column. If we use the `map_batch` function, it indexes into the `RecordBatch` as if all the columns are there. For example, if we have a schema with fields: `[{name: "value", type: "Float64"},{name: "time", type: "Timestamp"}]`, then map_batch will try to index at `1` for time, but the `RecordBatch` won't actually have that index since only one column is passed in. This definitely may be the wrong way to achieve what I wanted, but the intent was to look the field up by name in the `table_schema` so that we don't have that indexing problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org