comphead commented on code in PR #10647:
URL: https://github.com/apache/datafusion/pull/10647#discussion_r1614830679
##########
datafusion/core/src/datasource/physical_plan/parquet/schema_adapter.rs:
##########
@@ -20,35 +20,38 @@ use arrow_schema::{Schema, SchemaRef};
 use std::fmt::Debug;
 use std::sync::Arc;
 
-/// Factory of schema adapters.
+/// Factory for creating [`SchemaAdapter`]
 ///
-/// Provides means to implement custom schema adaptation.
+/// This interface provides a way to implement custom schema adaptation logic
+/// for ParquetExec (for example, to fill missing columns with default value
+/// other than null)
 pub trait SchemaAdapterFactory: Debug + Send + Sync + 'static {
     /// Provides `SchemaAdapter` for the ParquetExec.
     fn create(&self, schema: SchemaRef) -> Box<dyn SchemaAdapter>;
 }
 
-/// A utility which can adapt file-level record batches to a table schema which may have a schema
+/// Adapt file-level [`RecordBatch`]es to a table schema, which may have a schema
 /// obtained from merging multiple file-level schemas.
 ///
 /// This is useful for enabling schema evolution in partitioned datasets.
 ///
 /// This has to be done in two stages.
 ///
-/// 1. Before reading the file, we have to map projected column indexes from the table schema to
-/// the file schema.
+/// 1. Before reading the file, we have to map projected column indexes from the
+/// table schema to the file schema.
 ///
-/// 2. After reading a record batch we need to map the read columns back to the expected columns
-/// indexes and insert null-valued columns wherever the file schema was missing a colum present
-/// in the table schema.
+/// 2. After reading a record batch map the read columns back to the expected
+/// columns indexes and insert null-valued columns wherever the file schema was
+/// missing a colum present in the table schema.

Review Comment:
```suggestion
/// missing a column present in the table schema.
```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org