neilconway opened a new pull request, #22105:
URL: https://github.com/apache/datafusion/pull/22105

   ## Which issue does this PR close?
   
   - Closes #12727.
   
   ## Rationale for this change
   
   The Substrait logical-plan consumer was discarding field nullability when 
reconstructing DataFusion schemas from Substrait struct types. Nullability 
matters because a Substrait plan may have been produced or optimized using 
non-null guarantees.
   
   This also improves DataFusion <-> Substrait round-trip fidelity: required 
fields encoded by the producer are preserved when the plan is consumed again, 
instead of being widened to nullable.
   
   ## What changes are included in this PR?
   
     - Preserve per-field nullability when converting Substrait struct types / 
`NamedStruct` schemas into DataFusion schemas.
     - Treat Substrait `Required` as non-nullable, and `Nullable`, 
`Unspecified`, or unknown nullability values as nullable.
     - Keep deprecated `UserDefinedTypeReference` non-nullable because it does 
not carry nullability metadata.
     - Enforce named-table `ReadRel` schema compatibility when the Substrait 
schema requires a field to be non-null but the resolved DataFusion table schema 
marks it nullable.
     - Extend compatibility checking recursively through nested `Struct` fields.
     - Leave `List` and `Map` child nullability compatibility as future work, 
since their child nullability is not faithfully reconstructed today.
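
   The nullability mapping described in the list above can be sketched as 
follows. This is a minimal illustration, not the code in this PR: the 
`Nullability` enum and both helper functions are hypothetical stand-ins for the 
generated Substrait types, and the raw values (0, 1, 2) are assumed to follow 
the Substrait protobuf definition.

```rust
/// Hypothetical stand-in for the Substrait nullability enum. The real type is
/// generated from the Substrait protobuf definitions and is not used here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Nullability {
    Unspecified,
    Nullable,
    Required,
}

/// Decode a raw protobuf-style enum value (assumed encoding: 0, 1, 2).
/// Returns `None` for values this sketch does not recognize.
pub fn nullability_from_i32(raw: i32) -> Option<Nullability> {
    match raw {
        0 => Some(Nullability::Unspecified),
        1 => Some(Nullability::Nullable),
        2 => Some(Nullability::Required),
        _ => None,
    }
}

/// Only `Required` maps to a non-nullable field. `Nullable`, `Unspecified`,
/// and unknown values all fall back to nullable, the conservative choice.
pub fn is_nullable(raw: i32) -> bool {
    !matches!(nullability_from_i32(raw), Some(Nullability::Required))
}
```

   Defaulting unknown values to nullable means the consumer never claims a 
non-null guarantee that the plan did not explicitly make.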
   
   ## Are these changes tested?
   
   Yes; new tests added.
   
   ## Are there any user-facing changes?
   
   Yes. Consuming Substrait plans is now slightly stricter, which can 
prevent problems: for example, if a Substrait plan was produced under the 
assumption that a field `x` is non-nullable but the local DataFusion schema 
allows null values in `x`, executing the plan might produce unexpected results.
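
   The stricter check can be pictured with a small, self-contained sketch. 
The `DataType`, `Field`, and `check_nullability` names below are hypothetical 
simplifications, not the Arrow or DataFusion types, and the sketch assumes the 
two schemas have already been matched by position. It also assumes that a field 
the plan treats as nullable is acceptable even when the table marks it 
non-null. The PR description does not say how that direction is handled.

```rust
/// Hypothetical, simplified schema types used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Returns an error naming the first field that the plan requires to be
/// non-null while the table schema marks it nullable.
pub fn check_nullability(plan: &[Field], table: &[Field]) -> Result<(), String> {
    check_fields(plan, table, "")
}

fn check_fields(plan: &[Field], table: &[Field], prefix: &str) -> Result<(), String> {
    // Fields are paired by position; length mismatches are out of scope here.
    for (expected, actual) in plan.iter().zip(table.iter()) {
        let path = if prefix.is_empty() {
            expected.name.clone()
        } else {
            format!("{}.{}", prefix, expected.name)
        };
        if !expected.nullable && actual.nullable {
            return Err(format!(
                "field `{}` is required by the plan but nullable in the table",
                path
            ));
        }
        // Recurse through nested structs; list and map children are skipped,
        // mirroring the future work noted in the PR description.
        if let (DataType::Struct(e), DataType::Struct(a)) =
            (&expected.data_type, &actual.data_type)
        {
            check_fields(e, a, &path)?;
        }
    }
    Ok(())
}
```

   With these definitions, a plan that declares `x` as required is rejected 
against a table whose `x` is nullable, and a required field nested inside a 
struct is reported by its dotted path, such as `s.y`.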


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

