Re: [PR] Improve `ParquetExec` and related documentation [datafusion]

via GitHub Sat, 25 May 2024 12:24:52 -0700


comphead commented on code in PR #10647:
URL: https://github.com/apache/datafusion/pull/10647#discussion_r1614830679



##########
datafusion/core/src/datasource/physical_plan/parquet/schema_adapter.rs:
##########
@@ -20,35 +20,38 @@ use arrow_schema::{Schema, SchemaRef};
 use std::fmt::Debug;
 use std::sync::Arc;
 
-/// Factory of schema adapters.
+/// Factory for creating [`SchemaAdapter`]
 ///
-/// Provides means to implement custom schema adaptation.
+/// This interface provides a way to implement custom schema adaptation logic
+/// for ParquetExec (for example, to fill missing columns with default value
+/// other than null)
 pub trait SchemaAdapterFactory: Debug + Send + Sync + 'static {
     /// Provides `SchemaAdapter` for the ParquetExec.
     fn create(&self, schema: SchemaRef) -> Box<dyn SchemaAdapter>;
 }
 
-/// A utility which can adapt file-level record batches to a table schema which may have a schema
+/// Adapt file-level [`RecordBatch`]es to a table schema, which may have a schema
 /// obtained from merging multiple file-level schemas.
 ///
 /// This is useful for enabling schema evolution in partitioned datasets.
 ///
 /// This has to be done in two stages.
 ///
-/// 1. Before reading the file, we have to map projected column indexes from the table schema to
-///    the file schema.
+/// 1. Before reading the file, we have to map projected column indexes from the
+///    table schema to the file schema.
 ///
-/// 2. After reading a record batch we need to map the read columns back to the expected columns
-///    indexes and insert null-valued columns wherever the file schema was missing a colum present
-///    in the table schema.
+/// 2. After reading a record batch map the read columns back to the expected
+///    columns indexes and insert null-valued columns wherever the file schema was
+///    missing a colum present in the table schema.

Review Comment:
   ```suggestion
   ///    missing a column present in the table schema.
   ```
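
For readers following the two stages described in the doc comment, here is a minimal, std-only sketch of the idea. It is not the DataFusion implementation: `map_projection` and `map_batch` are hypothetical names, schemas are modeled as plain slices of column names, and a batch is modeled as one `Vec<Option<i64>>` per file column (indexed as in the file schema) rather than an Arrow `RecordBatch`.

```rust
/// Stage 1 (hypothetical helper): for each projected table column, look up
/// the index of the same-named column in the file schema, if the file has it.
fn map_projection(table: &[&str], file: &[&str], projection: &[usize]) -> Vec<Option<usize>> {
    projection
        .iter()
        .map(|&i| file.iter().position(|name| *name == table[i]))
        .collect()
}

/// Stage 2 (hypothetical helper): rebuild the batch in projected table order.
/// Columns missing from the file become all-null columns of `num_rows` rows.
/// Assumption: `file_batch` holds every file column, in file schema order.
fn map_batch(
    file_batch: &[Vec<Option<i64>>],
    mapping: &[Option<usize>],
    num_rows: usize,
) -> Vec<Vec<Option<i64>>> {
    mapping
        .iter()
        .map(|m| match m {
            // Column exists in the file: copy it into its table position.
            Some(file_idx) => file_batch[*file_idx].clone(),
            // Column is absent from the file: fill with nulls.
            None => vec![None; num_rows],
        })
        .collect()
}
```

With a table schema `["a", "b", "c"]` and a file schema `["c", "a"]`, stage 1 records that `b` is absent from the file, and stage 2 emits a null-filled column in its place. A custom adapter, as the new `SchemaAdapterFactory` docs suggest, could substitute a different default at that point.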



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


