jonmmease opened a new issue, #4828:
URL: https://github.com/apache/arrow-datafusion/issues/4828
**Describe the bug**
This is a MRE for a query that results in this error:
```
InvalidArgumentError("Column 'COUNT(DISTINCT tbl.col2)[count distinct]' is
declared as non-nullable but contains null values")
```
The input table (`tbl`) looks like this:
```
+------+------+
| col1 | col2 |
+------+------+
| A | |
| B | b |
| C | c |
+------+------+
```
This is the query
```sql
SELECT count(DISTINCT "col2"), min(0) AS "zero"
FROM "tbl"
GROUP BY "col1"
```
The error goes away without the `min(0) AS "zero"` column expression
**To Reproduce**
Steps to reproduce the behavior:
```rust
#[cfg(test)]
mod tests {
use std::sync::Arc;
use datafusion::arrow::array::{ArrayRef, StringArray};
use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::arrow::util::pretty::pretty_format_batches;
use datafusion::datasource::MemTable;
use datafusion::prelude::SessionContext;
#[tokio::test]
async fn count_distinct_error() {
let col1 = Arc::new(StringArray::from(
vec!["A", "B", "C"]
)) as ArrayRef;
let col2 = Arc::new(StringArray::from(
vec![None, Some("b"), Some("c")]
)) as ArrayRef;
let schema = Arc::new(Schema::new(vec![
Field::new("col1", DataType::Utf8, true),
Field::new("col2", DataType::Utf8, true)
])) as SchemaRef;
let batch = RecordBatch::try_new(schema.clone(), vec![col1,
col2]).unwrap();
let mem_table = MemTable::try_new(schema,
vec![vec![batch]]).unwrap();
// Create context and register table
let ctx = SessionContext::new();
ctx.register_table("tbl", Arc::new(mem_table)).unwrap();
// Print input table
let res = ctx.sql("SELECT * from
tbl").await.unwrap().collect().await.unwrap();
let formatted = pretty_format_batches(res.as_slice()).unwrap();
println!("{}", formatted);
// Perform query
let res = ctx.sql(r#"
SELECT count(DISTINCT "col2"), min(0) AS "zero"
FROM "tbl"
GROUP BY "col1"
"#).await.unwrap().collect().await.unwrap();
let formatted = pretty_format_batches(res.as_slice()).unwrap();
println!("{}", formatted);
}
}
```
**Expected behavior**
Expect the query to complete without errors
**Additional context**
This may be a duplicate of
https://github.com/apache/arrow-datafusion/issues/4040, but the query setup
there was more complex. If a maintainer thinks it's the same bug I'd be happy
to move this MRE over to that issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]