Kristin Cowalcijk created SEDONA-261:
----------------------------------------
Summary: Cannot run distance join using broadcast index join when
the distance expression references attributes from the right-side relation
Key: SEDONA-261
URL: https://issues.apache.org/jira/browse/SEDONA-261
Project: Apache Sedona
Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Kristin Cowalcijk
The following distance join query fails when it is run as a broadcast index join:
{code:sql}
SELECT * FROM df1 JOIN df2 ON ST_Distance(df1.geom, df2.geom) < df2.dist
{code}
The exception raised by Sedona is as follows:
{code}
Couldn't find dist#8638 in [id#8583,geom#8589]
java.lang.IllegalStateException: Couldn't find dist#8638 in [id#8583,geom#8589]
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:528)
at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:73)
at org.apache.spark.sql.sedona_sql.strategy.join.SpatialIndexExec.doExecuteBroadcast(SpatialIndexExec.scala:54)
{code}
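The trace suggests that the distance expression is bound against the output attributes of only one relation ([id, geom]), so an attribute that belongs to the other relation cannot be resolved. The following is a toy model of that lookup in plain Python, not Spark's or Sedona's actual code; the names Attribute, bind_reference and try_bind are hypothetical, and the expression IDs are copied from the exception above purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Toy stand-in for a resolved attribute: a name plus a unique expression ID."""
    name: str
    expr_id: int

    def __str__(self):
        return f"{self.name}#{self.expr_id}"


def bind_reference(attr, input_schema):
    """Return the ordinal of attr in input_schema, or raise if it is absent."""
    for ordinal, candidate in enumerate(input_schema):
        if candidate.expr_id == attr.expr_id:
            return ordinal
    schema = ",".join(str(a) for a in input_schema)
    raise LookupError(f"Couldn't find {attr} in [{schema}]")


def try_bind(attr, input_schema):
    """Bind attr and return its ordinal, or the error message on failure."""
    try:
        return bind_reference(attr, input_schema)
    except LookupError as err:
        return str(err)


# Hypothetical schema of the relation the expression is bound against.
one_side = [Attribute("id", 8583), Attribute("geom", 8589)]
# The distance column lives in the other relation.
dist = Attribute("dist", 8638)

# Binding against a schema that lacks the attribute fails.
result_missing = try_bind(dist, one_side)
# Binding succeeds once the attribute is part of the input schema.
result_present = try_bind(dist, one_side + [dist])

print(result_missing)
print(result_present)
```

In this model the lookup only succeeds when the referenced attribute is part of the schema being bound against, which matches the observation that the failure depends on which relation the distance column comes from.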
If the distance expression references an attribute from the left-side relation,
the distance join runs without problems when using broadcast index join.
{code:sql}
SELECT * FROM df1 JOIN df2 ON ST_Distance(df1.geom, df2.geom) < df1.dist
{code}
The space-partitioned distance join does not have this problem.