jiacai2050 commented on issue #4040: URL: https://github.com/apache/arrow-datafusion/issues/4040#issuecomment-1298712273
@MachaelLee Thanks for the detailed message. However, the steps above cannot be executed directly in datafusion; implementing the SQL interface is ceresdb's job.

I found a simple way to reproduce this based on https://github.com/apache/arrow-datafusion/blob/525ac4567ad8d86ad085d8439d890b1f9e9e6bb9/datafusion-examples/examples/memtable.rs#L39

Changes are below:

```diff
2 files changed, 6 insertions(+), 8 deletions(-)
datafusion-examples/examples/memtable.rs | 12 +++++-------
datafusion/optimizer/src/optimizer.rs    |  2 +-

modified   datafusion-examples/examples/memtable.rs
@@ -36,14 +36,12 @@ async fn main() -> Result<()> {
     // Register the in-memory table containing the data
     ctx.register_table("users", Arc::new(mem_table))?;

-    let dataframe = ctx.sql("SELECT * FROM users;").await?;
+    let dataframe = ctx
+        .sql("SELECT id,count(distinct bank_account) From users group by id;")
+        .await?;

     timeout(Duration::from_secs(10), async move {
-        let result = dataframe.collect().await.unwrap();
-        let record_batch = result.get(0).unwrap();
-
-        assert_eq!(1, record_batch.column(0).len());
-        dbg!(record_batch.columns());
+        dataframe.show().await.unwrap();
     })
     .await
     .unwrap();
@@ -57,7 +55,7 @@ fn create_memtable() -> Result<MemTable> {

 fn create_record_batch() -> Result<RecordBatch> {
     let id_array = UInt8Array::from(vec![1]);
-    let account_array = UInt64Array::from(vec![9000]);
+    let account_array = UInt64Array::from(vec![None]);

     Ok(RecordBatch::try_new(
         get_schema(),
modified   datafusion/optimizer/src/optimizer.rs
@@ -173,7 +173,7 @@ impl Optimizer {
         rules.push(Arc::new(ReduceOuterJoin::new()));
         rules.push(Arc::new(FilterPushDown::new()));
         rules.push(Arc::new(LimitPushDown::new()));
-        rules.push(Arc::new(SingleDistinctToGroupBy::new()));
+        // rules.push(Arc::new(SingleDistinctToGroupBy::new()));

         // The previous optimizations added expressions and projections,
         // that might benefit from the following rules
```

Then run it via `cargo run --example memtable`, and we get the following error:

```
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Column 'COUNT(DISTINCT users.bank_account)[count distinct]' is declared as non-nullable but contains null values")', datafusion/core/src/physical_plan/repartition.rs:178:79
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ExternalError(Execution("Join Error: task 17 panicked")))', datafusion-examples/examples/memtable.rs:44:32
```
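For reference, here is a minimal sketch of the result I would expect from the query above. It uses only the Rust standard library, not DataFusion, and assumes standard SQL semantics, where `COUNT(DISTINCT ...)` skips NULLs and returns 0 for a group that contains only NULLs. `count_distinct_by_id` is a hypothetical helper written for illustration; it is not a DataFusion API.

```rust
use std::collections::{BTreeMap, HashSet};

/// Toy model of `SELECT id, COUNT(DISTINCT bank_account) FROM users GROUP BY id`.
/// Hypothetical helper for illustration only, not part of DataFusion.
/// A NULL account is modelled as `None` and is skipped when counting.
fn count_distinct_by_id(rows: &[(u8, Option<u64>)]) -> BTreeMap<u8, u64> {
    let mut seen: BTreeMap<u8, HashSet<u64>> = BTreeMap::new();
    for &(id, account) in rows {
        // Always create the group, even if every value in it is NULL.
        let distinct_values = seen.entry(id).or_default();
        if let Some(value) = account {
            distinct_values.insert(value);
        }
    }
    // Each group yields a plain count, so the output is never NULL.
    seen.into_iter()
        .map(|(id, values)| (id, values.len() as u64))
        .collect()
}
```

Calling `count_distinct_by_id(&[(1, None)])` models the single row in the modified example and gives a count of 0 for `id = 1`. Under that assumption, a non-nullable count column would be consistent as long as the aggregate emits 0 rather than a null for such a group.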
