jonmmease opened a new issue, #4828:
URL: https://github.com/apache/arrow-datafusion/issues/4828

   **Describe the bug**
   This is a MRE for a query that results in this error:
   
   ```
   InvalidArgumentError("Column 'COUNT(DISTINCT tbl.col2)[count distinct]' is 
declared as non-nullable but contains null values")
   ```
   
   The input table (`tbl`) looks like this:
   
   ```
   +------+------+
   | col1 | col2 |
   +------+------+
   | A    |      |
   | B    | b    |
   | C    | c    |
   +------+------+
   ```
   
   This is the query
   ```sql
   SELECT count(DISTINCT "col2"), min(0) AS "zero"
   FROM "tbl"
   GROUP BY "col1"
   ```
   
   The error goes away without the `min(0) AS "zero"` column expression
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   ```rust
   
   #[cfg(test)]
   mod tests {
       use std::sync::Arc;
       use datafusion::arrow::array::{ArrayRef, StringArray};
       use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
       use datafusion::arrow::record_batch::RecordBatch;
       use datafusion::arrow::util::pretty::pretty_format_batches;
       use datafusion::datasource::MemTable;
       use datafusion::prelude::SessionContext;
   
       #[tokio::test]
       async fn count_distinct_error() {
           let col1 = Arc::new(StringArray::from(
               vec!["A", "B", "C"]
           )) as ArrayRef;
           let col2 = Arc::new(StringArray::from(
               vec![None, Some("b"), Some("c")]
           )) as ArrayRef;
   
           let schema = Arc::new(Schema::new(vec![
               Field::new("col1", DataType::Utf8, true),
               Field::new("col2", DataType::Utf8, true)
           ])) as SchemaRef;
   
           let batch = RecordBatch::try_new(schema.clone(), vec![col1, 
col2]).unwrap();
           let mem_table = MemTable::try_new(schema, 
vec![vec![batch]]).unwrap();
   
           // Create context and register table
           let ctx = SessionContext::new();
           ctx.register_table("tbl", Arc::new(mem_table)).unwrap();
   
           // Print input table
           let res = ctx.sql("SELECT * from 
tbl").await.unwrap().collect().await.unwrap();
           let formatted = pretty_format_batches(res.as_slice()).unwrap();
           println!("{}", formatted);
   
           // Perform query
           let res = ctx.sql(r#"
   SELECT count(DISTINCT "col2"), min(0) AS "zero"
   FROM "tbl"
   GROUP BY "col1"
       "#).await.unwrap().collect().await.unwrap();
   
           let formatted = pretty_format_batches(res.as_slice()).unwrap();
           println!("{}", formatted);
       }
   }
   ```
   
   **Expected behavior**
   Expect the query to complete without errors
   
   **Additional context**
   This may be a duplicate of 
https://github.com/apache/arrow-datafusion/issues/4040, but the query setup 
there was more complex. If a maintainer thinks it's the same bug I'd be happy 
to move this MRE over to that issue.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to