kosiew opened a new pull request, #16148: URL: https://github.com/apache/datafusion/pull/16148
## Which issue does this PR close?

This is part of a series of PRs re-implementing #15295 to close #14657 by adding schema-evolution support for:

- listing-based tables
- nested structs

in DataFusion.

## Rationale for this change

To enable customizable schema evolution during file scans, we [introduce a `SchemaAdapterFactory` hook into all `FileSource` implementations](https://github.com/apache/datafusion/pull/15295#discussion_r2100959986). This allows users to adapt column mappings and perform transformations (e.g., renaming, casting, adding defaults) without forking core scan logic.

## What changes are included in this PR?

- **Core API additions**
  - Added `with_schema_adapter_factory` and `schema_adapter_factory` methods to the `FileSource` trait
  - Introduced the `impl_schema_adapter_methods!()` macro to reduce boilerplate in each `FileSource` implementation
  - Added `as_file_source` helper to convert concrete sources into `Arc<dyn FileSource>`
- **Datasource crate updates**
  - Updated CSV, JSON, Avro, Parquet, and Arrow `FileSource` implementations to store and honor an optional `schema_adapter_factory`
  - Applied the new macro and helper consistently across all `FileSource` implementations
- **Testing**
  - Added unit tests:
    - `schema_adapter_factory_tests.rs`
    - `test_adapter_updated.rs`
    - `test_source_adapter_tests.rs`

    These cover factory wiring, column index mapping, schema transformation logic, and source behavior
  - Added integration tests:
    - `schema_adapter_integration_tests.rs`
    - `apply_schema_adapter_tests.rs`

    These validate adapter behavior in real-world scenarios such as scanning Parquet files

## Are these changes tested?

Yes. This PR includes comprehensive new tests to ensure:

1. Default behavior is preserved when no schema adapter is used
2. Factories can be injected and retrieved via the new API
3. Adapters correctly map schemas and record batches
4. The system works end-to-end with real file formats like Parquet

## Are there any user-facing changes?
Yes:

- Public API additions to the `FileSource` trait
- New macro `impl_schema_adapter_methods!()` for downstream implementors

These changes are additive and backward-compatible. Developers implementing custom `FileSource` types must either use the macro or provide the new methods to support schema adapters.
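For downstream implementors, the sketch below illustrates how the pieces described above might fit together: two accessor methods on the trait, a macro that fills them in, and a helper that erases the concrete type. It is a minimal, dependency-free illustration, not the code in this PR. The `FileSource` and `SchemaAdapterFactory` traits here are simplified stand-ins for the DataFusion traits of the same name, and `map_column`, `DemoSource`, `UppercaseFactory`, and `adapted_column` are hypothetical names invented for the example. The macro body assumes the implementing struct derives `Clone` and has a field named `schema_adapter_factory`.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Simplified stand-in for a schema adapter factory (assumption: the real
/// trait works on Arrow schemas; this one only renames a column).
pub trait SchemaAdapterFactory: Debug + Send + Sync {
    /// Hypothetical method: map a file column name to the table column name.
    fn map_column(&self, file_column: &str) -> String;
}

/// Simplified stand-in for `FileSource`, showing only the two new methods.
pub trait FileSource: Debug + Send + Sync {
    /// Return a copy of this source that uses `factory`.
    fn with_schema_adapter_factory(
        &self,
        factory: Arc<dyn SchemaAdapterFactory>,
    ) -> Arc<dyn FileSource>;

    /// Return the configured factory, if any.
    fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>>;
}

/// Sketch of a boilerplate-reducing macro. It must be invoked inside an
/// `impl FileSource for T` block, where `T: Clone` and `T` has a field
/// `schema_adapter_factory: Option<Arc<dyn SchemaAdapterFactory>>`.
macro_rules! impl_schema_adapter_methods {
    () => {
        fn with_schema_adapter_factory(
            &self,
            factory: Arc<dyn SchemaAdapterFactory>,
        ) -> Arc<dyn FileSource> {
            // Clone the source and replace only the factory field.
            Arc::new(Self {
                schema_adapter_factory: Some(factory),
                ..self.clone()
            })
        }

        fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>> {
            self.schema_adapter_factory.clone()
        }
    };
}

/// Hypothetical custom source that stores an optional factory.
#[derive(Debug, Clone, Default)]
pub struct DemoSource {
    schema_adapter_factory: Option<Arc<dyn SchemaAdapterFactory>>,
}

impl FileSource for DemoSource {
    impl_schema_adapter_methods!();
}

/// Hypothetical factory that upper-cases column names.
#[derive(Debug)]
pub struct UppercaseFactory;

impl SchemaAdapterFactory for UppercaseFactory {
    fn map_column(&self, file_column: &str) -> String {
        file_column.to_uppercase()
    }
}

/// Sketch of a helper that converts a concrete source into a trait object.
pub fn as_file_source<T: FileSource + 'static>(source: T) -> Arc<dyn FileSource> {
    Arc::new(source)
}

/// Apply the factory when one is configured; otherwise keep the default
/// behavior and return the column name unchanged.
pub fn adapted_column(source: &dyn FileSource, file_column: &str) -> String {
    match source.schema_adapter_factory() {
        Some(factory) => factory.map_column(file_column),
        None => file_column.to_string(),
    }
}
```

In this sketch, `with_schema_adapter_factory` returns a new source rather than mutating the receiver, so a source without a factory keeps its default behavior after an adapted copy has been derived from it.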