Kimahriman commented on PR #793: URL: https://github.com/apache/datafusion-comet/pull/793#issuecomment-2289825677
> LGTM. I agree that it seems like a flaw in DataFusion that we cannot define the nullability correctly. Since this may come up more and more, does it make sense to just "lie" to DataFusion and tell it a column is nullable even when Spark thinks it's non-nullable?

Technically, for anything that's not a complex type, this likely already happens silently and works fine. The thing that actually complains is https://github.com/apache/arrow-rs/blob/master/arrow-array/src/record_batch.rs#L203 when creating the record batch. It makes sure the data types of the schema match the data types of the columns, but for non-complex types the data type doesn't include nullability, while for complex types that check does include nullability. So top-level column nullability isn't checked, but any nested or complex type will have its nullability verified. Arguably that's just a bug in the check.
