alamb opened a new pull request, #2803: URL: https://github.com/apache/arrow-datafusion/pull/2803
# Which issue does this PR close?

Part of https://github.com/apache/arrow-datafusion/pull/2778

# Rationale for this change

https://github.com/apache/arrow-rs/issues/1888 in arrow added validation to `RecordBatch` that checks whether the schema's declared nullability matches the actual nullability of the data.

This caught that the output schema calculation for joins is incorrect: specifically, LEFT/RIGHT/FULL joins can introduce nulls even if the input schema is not nullable.

For example, given the following non-null inputs:

| a |
| --- |
| 1 |

| b |
| --- |
| 2 |

This query:

```sql
SELECT * FROM a LEFT JOIN b
```

Produces a null in `b`, and thus `b` must be marked nullable in the output schema (`a` stays non-nullable if it is non-nullable in the input):

| a | b |
| --- | --- |
| 1 | NULL |

# What changes are included in this PR?

1. Account for `NULL`s introduced by joins in the output schema calculation
2. Tests

# Are there any user-facing changes?

I don't think so (except more correct nullability marking in schemas)
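
To make the nullability rule described above concrete, here is a minimal sketch of how a join's output fields could be derived. It is not the code in this PR: `Field`, `JoinType`, and `join_output_fields` are hypothetical stand-ins defined here for illustration, not the arrow or DataFusion types. The only assumption is the rule from the rationale: the side that can be padded for unmatched rows must become nullable.

```rust
/// Hypothetical stand-in for a schema field (not the arrow `Field` type).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub nullable: bool,
}

/// Hypothetical join type enum covering the cases named in the description.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Copy `fields`, forcing every field to nullable when `padded` is true.
fn widen(fields: &[Field], padded: bool) -> Vec<Field> {
    fields
        .iter()
        .map(|f| Field {
            name: f.name.clone(),
            // A field stays nullable if it already was; otherwise it becomes
            // nullable only when this side can be padded with NULLs.
            nullable: f.nullable || padded,
        })
        .collect()
}

/// Compute output fields for a join: left fields followed by right fields.
pub fn join_output_fields(left: &[Field], right: &[Field], join_type: JoinType) -> Vec<Field> {
    // For each side, can unmatched rows from the *other* side pad it with NULLs?
    let (left_padded, right_padded) = match join_type {
        JoinType::Inner => (false, false),
        JoinType::Left => (false, true),
        JoinType::Right => (true, false),
        JoinType::Full => (true, true),
    };
    let mut out = widen(left, left_padded);
    out.extend(widen(right, right_padded));
    out
}
```

With non-nullable inputs `a` and `b`, calling `join_output_fields` with `JoinType::Left` leaves `a` non-nullable and marks `b` nullable, matching the example table above.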
