AssHero commented on code in PR #2702: URL: https://github.com/apache/arrow-datafusion/pull/2702#discussion_r893003362
########## datafusion/core/tests/sql/joins.rs: ########## @@ -1204,3 +1204,141 @@ async fn join_partitioned() -> Result<()> { Ok(()) } + +#[tokio::test] +async fn join_with_hash_unsupported_data_type() -> Result<()> { + let ctx = SessionContext::new(); + + let schema = Schema::new(vec![ + Field::new("c1", DataType::Int32, true), + Field::new("c2", DataType::Utf8, true), + Field::new("c3", DataType::Int64, true), + Field::new("c4", DataType::Date32, true), + ]); + let data = RecordBatch::try_new( + Arc::new(schema), + vec![ + Arc::new(Int32Array::from_slice(&[1, 2, 3])), + Arc::new(StringArray::from_slice(&["aaa", "bbb", "ccc"])), + Arc::new(Int64Array::from_slice(&[100, 200, 300])), + Arc::new(Date32Array::from(vec![Some(1), Some(2), Some(3)])), + ], + )?; + let table = MemTable::try_new(data.schema(), vec![vec![data]])?; + ctx.register_table("foo", Arc::new(table))?; + + // join on hash unsupported data type (Date32), use cross join instead hash join + let sql = "select * from foo t1 join foo t2 on t1.c4 = t2.c4"; + let msg = format!("Creating logical plan for '{}'", sql); Review Comment: > So I think CrossJoin is almost never what the user would want: as once the tables get beyond any trivial size the query will effectively never finish or will run out of memory. An error is clearer. > > From the issue description [#2145 (comment)](https://github.com/apache/arrow-datafusion/issues/2145#issue-1191065060) I think @pjmore's idea to cast unsupported types to a supported type is a good one -- the arrow `cast` kernels are quite efficient for things like `Date32` -> `Int32` (no copies) as the representations are the same > > @pjmore what do you think? this pr is only to make the hash unsupported join running in cross join instead of error/panic(I think it is not friendly in database system). supporting more data types in hash join is the better way to solve this issue, and i'm already working on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org