[GitHub] [arrow-datafusion] AssHero commented on a diff in pull request #2702: Make sure that the data types are supported in hashjoin before genera…

GitBox Wed, 08 Jun 2022 18:57:16 -0700


AssHero commented on code in PR #2702:
URL: https://github.com/apache/arrow-datafusion/pull/2702#discussion_r893003362



##########
datafusion/core/tests/sql/joins.rs:
##########
@@ -1204,3 +1204,141 @@ async fn join_partitioned() -> Result<()> {
 
     Ok(())
 }
+
+#[tokio::test]
+async fn join_with_hash_unsupported_data_type() -> Result<()> {
+    let ctx = SessionContext::new();
+
+    let schema = Schema::new(vec![
+        Field::new("c1", DataType::Int32, true),
+        Field::new("c2", DataType::Utf8, true),
+        Field::new("c3", DataType::Int64, true),
+        Field::new("c4", DataType::Date32, true),
+    ]);
+    let data = RecordBatch::try_new(
+        Arc::new(schema),
+        vec![
+            Arc::new(Int32Array::from_slice(&[1, 2, 3])),
+            Arc::new(StringArray::from_slice(&["aaa", "bbb", "ccc"])),
+            Arc::new(Int64Array::from_slice(&[100, 200, 300])),
+            Arc::new(Date32Array::from(vec![Some(1), Some(2), Some(3)])),
+        ],
+    )?;
+    let table = MemTable::try_new(data.schema(), vec![vec![data]])?;
+    ctx.register_table("foo", Arc::new(table))?;
+
+    // join on hash unsupported data type (Date32), use cross join instead hash join
+    let sql = "select * from foo t1 join foo t2 on t1.c4 = t2.c4";
+    let msg = format!("Creating logical plan for '{}'", sql);

Review Comment:
   > So I think CrossJoin is almost never what the user would want: as once the tables get beyond any trivial size the query will effectively never finish or will run out of memory. An error is clearer.
   > 
   > From the issue description [#2145 (comment)](https://github.com/apache/arrow-datafusion/issues/2145#issue-1191065060) I think @pjmore's idea to cast unsupported types to a supported type is a good one -- the arrow `cast` kernels are quite efficient for things like `Date32` -> `Int32` (no copies) as the representations are the same
   > 
   > @pjmore what do you think?
   
   This PR only makes a join on a hash-unsupported data type run as a cross join instead of returning an error or panicking, which I think is unfriendly behavior for a database system.
   Supporting more data types in hash join is the better way to solve this issue, and I'm already working on it.
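
   To make the two options in this thread concrete, here is a minimal standard-library sketch. It is not DataFusion's implementation and does not use the arrow `cast` kernel. `Date32`, `cast_keys`, `hash_join`, and `cross_join_filter` are hypothetical names invented for illustration. The only fact taken from the discussion is that a `Date32` value has the same representation as an `Int32`, so the join key can be reinterpreted as an integer and hashed. The cross-join fallback compares every pair of rows and then filters.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Date32 value: days since the UNIX epoch,
/// stored as an i32 (same physical representation as Int32).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Date32(i32);

/// "Cast" a nullable Date32 key column to nullable i32 keys.
/// Illustration only: a real columnar cast would operate on whole arrays.
fn cast_keys(col: &[Option<Date32>]) -> Vec<Option<i32>> {
    col.iter().map(|v| v.map(|d| d.0)).collect()
}

/// Hash join on i32 keys: build a hash table on the left side, probe it
/// with the right side. Returns sorted (left_row, right_row) index pairs.
/// NULL keys never match, as with SQL equality.
fn hash_join(left: &[Option<i32>], right: &[Option<i32>]) -> Vec<(usize, usize)> {
    let mut build: HashMap<i32, Vec<usize>> = HashMap::new();
    for (i, key) in left.iter().enumerate() {
        if let Some(k) = key {
            build.entry(*k).or_default().push(i);
        }
    }
    let mut out = Vec::new();
    for (j, key) in right.iter().enumerate() {
        if let Some(k) = key {
            if let Some(rows) = build.get(k) {
                for &i in rows {
                    out.push((i, j));
                }
            }
        }
    }
    out.sort();
    out
}

/// Cross join followed by an equality filter: visits every (left, right)
/// pair, so the work grows with left.len() * right.len().
fn cross_join_filter(
    left: &[Option<Date32>],
    right: &[Option<Date32>],
) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (i, l) in left.iter().enumerate() {
        for (j, r) in right.iter().enumerate() {
            if let (Some(l), Some(r)) = (l, r) {
                if l == r {
                    out.push((i, j));
                }
            }
        }
    }
    out
}
```

   Both functions return the same matching row pairs for the same input. The difference is cost: the hash join does work roughly proportional to the input sizes plus the number of matches, while the cross join always does work proportional to the product of the two table sizes.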


