alamb commented on code in PR #2802:
URL: https://github.com/apache/arrow-datafusion/pull/2802#discussion_r907870656


##########
datafusion/core/src/physical_optimizer/aggregate_statistics.rs:
##########
@@ -276,8 +276,8 @@ mod tests {
     /// Mock data using a MemoryExec which has an exact count statistic
     fn mock_data() -> Result<Arc<MemoryExec>> {
         let schema = Arc::new(Schema::new(vec![
-            Field::new("a", DataType::Int32, false),
-            Field::new("b", DataType::Int32, false),
+            Field::new("a", DataType::Int32, true),
+            Field::new("b", DataType::Int32, true),

Review Comment:
   This is a pretty easy-to-understand example of the issue -- prior to this 
PR, the fields `"a"` and `"b"` were declared with `nullable = false`, but then 5 
lines lower `NULL` data is inserted 🤦 
   
   
   ```rust
           let batch = RecordBatch::try_new(
               Arc::clone(&schema),
               vec![
                   Arc::new(Int32Array::from(vec![Some(1), Some(2), None])),
                   Arc::new(Int32Array::from(vec![Some(4), None, Some(6)])),
               ],
           )?;
   ```
   
   Now that `RecordBatch::try_new` validates nullability, the schema must 
match the data, otherwise an error results.
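   
   For illustration, here is a minimal sketch of the kind of check `RecordBatch::try_new` now performs (the `Field` struct and `validate_nullability` function here are simplified stand-ins, not the actual arrow-rs implementation): a column containing nulls may not be paired with a field declared `nullable = false`.
   
   ```rust
   // Hypothetical, simplified stand-in for arrow's Field.
   struct Field {
       name: String,
       nullable: bool,
   }
   
   // Sketch of the nullability validation: reject any non-nullable
   // field whose column data actually contains a null.
   fn validate_nullability(
       fields: &[Field],
       columns: &[Vec<Option<i32>>],
   ) -> Result<(), String> {
       for (field, column) in fields.iter().zip(columns) {
           let has_nulls = column.iter().any(|v| v.is_none());
           if has_nulls && !field.nullable {
               return Err(format!(
                   "Column '{}' contains nulls but is declared non-nullable",
                   field.name
               ));
           }
       }
       Ok(())
   }
   
   fn main() {
       let fields = vec![
           Field { name: "a".into(), nullable: false },
           Field { name: "b".into(), nullable: false },
       ];
       // Same shape of data as the test above: both columns contain a None.
       let columns = vec![
           vec![Some(1), Some(2), None],
           vec![Some(4), None, Some(6)],
       ];
       // With nullable = false the validation fails ...
       assert!(validate_nullability(&fields, &columns).is_err());
       // ... and succeeds once the fields are declared nullable = true.
       let fields_ok: Vec<Field> = fields
           .into_iter()
           .map(|f| Field { nullable: true, ..f })
           .collect();
       assert!(validate_nullability(&fields_ok, &columns).is_ok());
       println!("ok");
   }
   ```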



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
