[GitHub] [arrow-datafusion] jiacai2050 commented on issue #4040: InvalidArgumentError("Column 'COUNT(DISTINCT demo.name)[count distinct]' is declared as non-nullable but contains null values")'

GitBox Tue, 01 Nov 2022 08:33:43 -0700


jiacai2050 commented on issue #4040:
URL: https://github.com/apache/arrow-datafusion/issues/4040#issuecomment-1298712273


   @MachaelLee Thanks for the detailed messages. However, the steps above cannot be executed directly in DataFusion, because implementing the SQL interface is CeresDB's job.
   
   I found a simple way to reproduce this, based on the `memtable` example at https://github.com/apache/arrow-datafusion/blob/525ac4567ad8d86ad085d8439d890b1f9e9e6bb9/datafusion-examples/examples/memtable.rs#L39
   
   The changes are below:
   ```diff
   2 files changed, 6 insertions(+), 8 deletions(-)
   datafusion-examples/examples/memtable.rs | 12 +++++-------
   datafusion/optimizer/src/optimizer.rs    |  2 +-
   
   modified   datafusion-examples/examples/memtable.rs
   @@ -36,14 +36,12 @@ async fn main() -> Result<()> {
        // Register the in-memory table containing the data
        ctx.register_table("users", Arc::new(mem_table))?;
    
   -    let dataframe = ctx.sql("SELECT * FROM users;").await?;
   +    let dataframe = ctx
   +        .sql("SELECT id,count(distinct bank_account) From users group by id;")
   +        .await?;
    
        timeout(Duration::from_secs(10), async move {
   -        let result = dataframe.collect().await.unwrap();
   -        let record_batch = result.get(0).unwrap();
   -
   -        assert_eq!(1, record_batch.column(0).len());
   -        dbg!(record_batch.columns());
   +        dataframe.show().await.unwrap();
        })
        .await
        .unwrap();
   @@ -57,7 +55,7 @@ fn create_memtable() -> Result<MemTable> {
    
    fn create_record_batch() -> Result<RecordBatch> {
        let id_array = UInt8Array::from(vec![1]);
   -    let account_array = UInt64Array::from(vec![9000]);
   +    let account_array = UInt64Array::from(vec![None]);
    
        Ok(RecordBatch::try_new(
            get_schema(),
   modified   datafusion/optimizer/src/optimizer.rs
   @@ -173,7 +173,7 @@ impl Optimizer {
            rules.push(Arc::new(ReduceOuterJoin::new()));
            rules.push(Arc::new(FilterPushDown::new()));
            rules.push(Arc::new(LimitPushDown::new()));
   -        rules.push(Arc::new(SingleDistinctToGroupBy::new()));
   +        // rules.push(Arc::new(SingleDistinctToGroupBy::new()));
    
            // The previous optimizations added expressions and projections,
            // that might benefit from the following rules
   ```
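The diff comments out `SingleDistinctToGroupBy`. As we understand that rule, it rewrites a query with a single `COUNT(DISTINCT x)` into two aggregations (first group by `id, x`, then `COUNT(x)` grouped by `id`), so disabling it keeps this query on the direct distinct-count path. The sketch below is plain Rust with no DataFusion dependency, and `Row`, `count_distinct_direct` and `count_distinct_two_phase` are hypothetical names. It models both strategies on the one-row data from the diff to show what either should return: SQL `COUNT` skips NULLs, so group `id = 1` still appears, with a count of 0.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical row type: (id, bank_account); `None` models a SQL NULL.
type Row = (u8, Option<u64>);

/// Direct strategy (hypothetical model): keep a set of distinct non-null
/// values per group, roughly what a COUNT(DISTINCT) accumulator must track.
fn count_distinct_direct(rows: &[Row]) -> BTreeMap<u8, usize> {
    let mut sets: BTreeMap<u8, HashSet<u64>> = BTreeMap::new();
    for &(id, account) in rows {
        // The group exists even if all of its values are NULL.
        let set = sets.entry(id).or_default();
        if let Some(v) = account {
            set.insert(v);
        }
    }
    sets.into_iter().map(|(id, s)| (id, s.len())).collect()
}

/// Rewritten strategy (hypothetical model): GROUP BY (id, bank_account)
/// first, then COUNT(bank_account) GROUP BY id, which also skips NULLs.
fn count_distinct_two_phase(rows: &[Row]) -> BTreeMap<u8, usize> {
    let inner: HashSet<Row> = rows.iter().copied().collect();
    let mut counts: BTreeMap<u8, usize> = BTreeMap::new();
    for (id, account) in inner {
        let c = counts.entry(id).or_insert(0);
        if account.is_some() {
            *c += 1;
        }
    }
    counts
}

fn main() {
    // Same data as the modified example: one row, id = 1, bank_account = NULL.
    let rows: Vec<Row> = vec![(1, None)];
    let direct = count_distinct_direct(&rows);
    let two_phase = count_distinct_two_phase(&rows);
    assert_eq!(direct, two_phase);
    println!("{:?}", direct); // {1: 0}
}
```

Both strategies agree on the expected answer; the error reported next concerns how the direct path's intermediate state column is declared, not the result itself.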
   Running the modified example with `cargo run --example memtable` produces the following error:
   ```
   thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Column 'COUNT(DISTINCT users.bank_account)[count distinct]' is declared as non-nullable but contains null values")', datafusion/core/src/physical_plan/repartition.rs:178:79
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ExternalError(Execution("Join Error: task 17 panicked")))', datafusion-examples/examples/memtable.rs:44:32
   ```
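The panic comes from a nullability check on a record batch: a column whose field is declared non-nullable must not contain nulls. The message matches the one Arrow's `RecordBatch` constructor produces, and here it surfaces in `repartition.rs`. Below is a minimal std-only Rust model of that check. `Field`, `Column`, `BatchError` and `validate_batch` are hypothetical stand-ins, not the arrow crate's API, and treating the `[count distinct]` state column as holding a single null is an assumption drawn from the error text.

```rust
/// Hypothetical, simplified stand-ins for an Arrow field and column;
/// these are NOT the arrow crate's types.
struct Field {
    name: String,
    nullable: bool,
}

struct Column {
    values: Vec<Option<u64>>, // `None` models a null slot
}

impl Column {
    fn null_count(&self) -> usize {
        self.values.iter().filter(|v| v.is_none()).count()
    }
}

#[derive(Debug, PartialEq)]
enum BatchError {
    InvalidArgumentError(String),
}

/// Hypothetical validation: reject a batch in which a field declared
/// non-nullable holds at least one null, reusing the panic's message text.
fn validate_batch(fields: &[Field], columns: &[Column]) -> Result<(), BatchError> {
    for (field, column) in fields.iter().zip(columns) {
        if !field.nullable && column.null_count() > 0 {
            return Err(BatchError::InvalidArgumentError(format!(
                "Column '{}' is declared as non-nullable but contains null values",
                field.name
            )));
        }
    }
    Ok(())
}

fn main() {
    let name = "COUNT(DISTINCT users.bank_account)[count distinct]";
    // Assumed: a state column holding one null, as in the one-row repro.
    let columns = vec![Column { values: vec![None] }];
    let strict = [Field { name: name.to_string(), nullable: false }];
    let relaxed = [Field { name: name.to_string(), nullable: true }];
    // Prints an Err(InvalidArgumentError(..)) with the same message as above.
    println!("{:?}", validate_batch(&strict, &columns));
    // Prints Ok(()).
    println!("{:?}", validate_batch(&relaxed, &columns));
}
```

The model only shows the shape of the check: the same null passes when the field is declared nullable and fails when it is not. It says nothing about where DataFusion's distinct-count accumulator declares its state fields.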

