kosiew opened a new pull request, #15295:
URL: https://github.com/apache/datafusion/pull/15295

   ## Which issue does this PR close?
   
   - Closes #14757.
   
   ## Rationale for this change
   
   This PR introduces a `NestedStructSchemaAdapter` to improve schema evolution 
handling in DataFusion for nested struct types. Schema evolution currently 
works mainly for flat schemas; evolving nested structures (such as adding new 
fields to an existing struct) requires special handling. This change lets data 
whose nested struct fields change over time be adapted to the current schema.
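
To make the nested case concrete, here is a minimal, std-only sketch of the idea. It is not the code in this PR and does not use the Arrow or DataFusion APIs: `DataType`, `Field`, `Value`, `field`, and `adapt_value` are hypothetical stand-ins. It assumes struct children are matched by name, that children missing from the source are filled with nulls, and that source children absent from the target are dropped.

```rust
// Hypothetical, std-only model of nested struct adaptation.
// These types are illustrative stand-ins, not DataFusion or Arrow APIs.

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int64(i64),
    Utf8(String),
    // Child values paired with their field names.
    Struct(Vec<(String, Value)>),
}

// Small helper to build a field (hypothetical).
fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Reshape `value` so that it matches `target`:
/// - struct children are matched by name and adapted recursively,
/// - children missing from the source become `Value::Null`,
/// - source children absent from the target are dropped.
fn adapt_value(value: &Value, target: &DataType) -> Value {
    match (value, target) {
        (Value::Struct(children), DataType::Struct(target_fields)) => {
            let adapted = target_fields
                .iter()
                .map(|f| {
                    let child = children
                        .iter()
                        .find(|(name, _)| name == &f.name)
                        .map(|(_, v)| adapt_value(v, &f.data_type))
                        .unwrap_or(Value::Null);
                    (f.name.clone(), child)
                })
                .collect();
            Value::Struct(adapted)
        }
        // Non-struct values pass through unchanged in this sketch (no casting).
        (other, _) => other.clone(),
    }
}
```

Under these assumptions, a value written as `{id}` and read against a target of `{id, name}` comes back as `{id, name: null}`, and the same rule applies at every nesting level.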
   
   ## What changes are included in this PR?
   
   - Introduces `NestedStructSchemaAdapter` to handle schema evolution for 
nested struct fields.
   - Implements `NestedStructSchemaAdapterFactory` to determine whether the 
specialized adapter is needed based on schema characteristics.
   - Enhances `SchemaMapping` with a new constructor for improved usability.
   - Updates `schema_adapter.rs` and integrates the new adapter into the 
`datafusion_datasource` module.
   - Adds comprehensive unit tests to verify the correctness of schema 
adaptation, including nested struct evolution scenarios.
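
The adapter selection described above can be sketched roughly as follows. This is an illustration only: the types and the `choose_adapter` function are hypothetical, and the criterion used here (whether any top-level field is a struct) is an assumption about what "schema characteristics" might mean, not a statement of what `NestedStructSchemaAdapterFactory` actually checks.

```rust
// Hypothetical stand-ins for schema types; not the DataFusion API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

// Which adapter a factory would hand out (hypothetical).
#[derive(Debug, PartialEq)]
enum AdapterKind {
    Default,
    NestedStruct,
}

// Small helper to build a field (hypothetical).
fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Returns true if any top-level field has a struct type.
fn has_struct_field(fields: &[Field]) -> bool {
    fields
        .iter()
        .any(|f| matches!(f.data_type, DataType::Struct(_)))
}

/// Assumed selection rule: use the specialized adapter only when the
/// table schema contains a struct field, otherwise keep the default one.
fn choose_adapter(table_fields: &[Field]) -> AdapterKind {
    if has_struct_field(table_fields) {
        AdapterKind::NestedStruct
    } else {
        AdapterKind::Default
    }
}
```

Keeping the default path for flat schemas means existing behavior is untouched unless a struct field is present, which is consistent with the "no breaking changes" claim.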
   
   ## Are these changes tested?
   
   Yes, extensive unit tests have been added to verify:
   - Proper mapping of fields, including added and missing nested struct fields.
   - Correct adaptation from flat schemas to nested schemas.
   - Selection of the appropriate adapter based on schema characteristics.
   
   ## Are there any user-facing changes?
   
   No breaking changes.  
   However, users working with evolving nested struct schemas will benefit from 
improved support for automatic schema adaptation. This enhances compatibility 
with sources like Parquet, where schemas may change over time.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

