alamb commented on code in PR #2702:
URL: https://github.com/apache/arrow-datafusion/pull/2702#discussion_r892808334
##########
datafusion/expr/src/utils.rs:
##########
@@ -643,6 +644,35 @@ pub fn expr_as_column_expr(expr: &Expr, plan: &LogicalPlan) -> Result<Expr> {
}
}
+/// can this data type be used in hash join equal conditions??
+/// If more data types are supported in hash join, add those data types here
+/// to generate join logical plan.
+pub fn can_hash(data_type: &DataType) -> bool {
+ match data_type {
Review Comment:
I think it may help to add a comment here (or in `equal_rows`) mentioning that
`can_hash` and `equal_rows` need to remain in sync.
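A minimal sketch of what such a comment could look like. The `DataType` enum
below is a simplified local stand-in for arrow's `DataType`, and the set of
supported types is an assumption for illustration (only `Date32` being
unsupported is taken from the test in this PR):

```rust
/// Simplified stand-in for arrow's `DataType` (assumption: only the
/// variants used in this PR's test are listed).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Utf8,
    Date32,
}

/// Can this data type be used in hash join equality conditions?
///
/// NOTE: this list must remain in sync with the types handled by
/// `equal_rows` in the hash join implementation. When support for a new
/// type is added there, add it here too (and vice versa).
pub fn can_hash(data_type: &DataType) -> bool {
    match data_type {
        // Assumed to be supported, for illustration only.
        DataType::Int32 | DataType::Int64 | DataType::Utf8 => true,
        // Treated as unsupported, matching the test in this PR.
        DataType::Date32 => false,
    }
}
```

Listing every variant explicitly instead of using a `_ => false` arm is one
possible design choice: the compiler then flags this function whenever a new
variant is added, which makes it harder for the two places to drift apart.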
##########
datafusion/core/tests/sql/joins.rs:
##########
@@ -1204,3 +1204,141 @@ async fn join_partitioned() -> Result<()> {
Ok(())
}
+
+#[tokio::test]
+async fn join_with_hash_unsupported_data_type() -> Result<()> {
+ let ctx = SessionContext::new();
+
+ let schema = Schema::new(vec![
+ Field::new("c1", DataType::Int32, true),
+ Field::new("c2", DataType::Utf8, true),
+ Field::new("c3", DataType::Int64, true),
+ Field::new("c4", DataType::Date32, true),
+ ]);
+ let data = RecordBatch::try_new(
+ Arc::new(schema),
+ vec![
+ Arc::new(Int32Array::from_slice(&[1, 2, 3])),
+ Arc::new(StringArray::from_slice(&["aaa", "bbb", "ccc"])),
+ Arc::new(Int64Array::from_slice(&[100, 200, 300])),
+ Arc::new(Date32Array::from(vec![Some(1), Some(2), Some(3)])),
+ ],
+ )?;
+ let table = MemTable::try_new(data.schema(), vec![vec![data]])?;
+ ctx.register_table("foo", Arc::new(table))?;
+
+ // join on hash unsupported data type (Date32), use cross join instead hash join
+ let sql = "select * from foo t1 join foo t2 on t1.c4 = t2.c4";
+ let msg = format!("Creating logical plan for '{}'", sql);
Review Comment:
I think a CrossJoin is almost never what the user would want: once the tables
grow beyond a trivial size, the query will effectively never finish or will run
out of memory. An error is clearer.
From the issue description
https://github.com/apache/arrow-datafusion/issues/2145#issue-1191065060 I think
@pjmore's idea to cast unsupported types to a supported type is a good one. The
arrow `cast` kernels are quite efficient for things like `Date32` -> `Int32`
(no copies), as the representations are the same.
@pjmore what do you think?
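
To illustrate why such a cast can avoid copying, here is a rough sketch. It
does not use the arrow crate: `Column`, `LogicalType`, and
`cast_date32_to_int32` are hypothetical names made up for this example. The
one fact it relies on is that `Date32` values are stored as 32-bit integers
(days since the UNIX epoch), the same physical layout as `Int32`:

```rust
use std::sync::Arc;

/// Hypothetical logical type tag (stand-in for arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogicalType {
    Date32,
    Int32,
}

/// Hypothetical column: a logical type plus a shared buffer of i32 values.
#[derive(Debug, Clone)]
pub struct Column {
    pub logical_type: LogicalType,
    pub values: Arc<[i32]>,
}

/// Reinterpret a `Date32` column as `Int32` so it can be used as a hash
/// join key. Only the type tag changes; the buffer is shared, not copied.
pub fn cast_date32_to_int32(col: &Column) -> Column {
    match col.logical_type {
        LogicalType::Date32 | LogicalType::Int32 => Column {
            logical_type: LogicalType::Int32,
            // `Arc::clone` bumps a reference count; the values stay in place.
            values: Arc::clone(&col.values),
        },
    }
}
```

With something along these lines, the planner could cast both sides of
`t1.c4 = t2.c4` to `Int32` and keep using the hash join rather than falling
back to a cross join.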