Kristin Cowalcijk created SEDONA-262:
----------------------------------------
Summary: Don't optimize equi-join by default, add an option to
configure when to optimize spatial joins
Key: SEDONA-262
URL: https://issues.apache.org/jira/browse/SEDONA-262
Project: Apache Sedona
Issue Type: New Feature
Reporter: Kristin Cowalcijk
Apache Sedona optimizes all joins that have spatial predicates in their join
conditions, including equi-joins with spatial predicates. For example, the
following query will be optimized as a RangeJoin:
{code:scala}
df1.join(df2, df1("id1") === df2("id2") && ST_Contains(df1("geom"),
df2("geom")))
{code}
However, it may be more efficient to run a sort-merge join or hash join using
the equi-condition {{df1.id1 = df2.id2}} for this query. This problem also
arises when users want to perform a spatial join using the S2 cell IDs of both
geometries and then use a spatial predicate to filter out false positives.
We propose adding a configuration option to {{SedonaConf}} named
{{sedona.join.optimizationmode}}. It can be set to one of the following
values:
* *all*: optimize all joins having a spatial predicate in the join conditions.
This is the current behavior of Apache Sedona.
* *none*: disable spatial join optimization.
* *nonequi*: only enable spatial join optimization on non-equi joins. _This
will be the default mode_.
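The three modes above can be read as a simple decision rule. The following is a minimal sketch of that rule in plain Scala, not Sedona's actual implementation: the object {{JoinOptimizationSketch}}, the functions {{parseMode}} and {{shouldOptimize}}, and the two boolean flags are hypothetical names introduced only for illustration.

```scala
// Hypothetical model of the proposed modes; not Sedona's actual planner code.
object JoinOptimizationSketch {
  sealed trait Mode
  case object All extends Mode      // "all": optimize every join with a spatial predicate
  case object NoneMode extends Mode // "none": never apply spatial join optimization
  case object NonEqui extends Mode  // "nonequi": optimize only when there is no equi-condition

  // Map the configured string value to a mode; unknown values are rejected.
  def parseMode(value: String): Mode = value.trim.toLowerCase match {
    case "all"     => All
    case "none"    => NoneMode
    case "nonequi" => NonEqui
    case other =>
      throw new IllegalArgumentException(s"Unknown optimization mode: $other")
  }

  // Decide whether a join should be rewritten as a spatial join.
  // hasSpatialPredicate: the join condition contains a spatial predicate such as ST_Contains.
  // hasEquiCondition: the join condition also contains an equality between columns of both sides.
  def shouldOptimize(mode: Mode,
                     hasSpatialPredicate: Boolean,
                     hasEquiCondition: Boolean): Boolean =
    mode match {
      case All      => hasSpatialPredicate
      case NoneMode => false
      case NonEqui  => hasSpatialPredicate && !hasEquiCondition
    }
}
```

For the example query above, which has both an equi-condition and {{ST_Contains}}, this rule returns true under *all* and false under *nonequi*, leaving the join to Spark's regular equi-join planning.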
When {{sedona.join.optimizationmode}} is set to *nonequi*, Sedona won't
optimize the aforementioned equi-join as a RangeJoin.
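As a rough illustration of how the option might be set, the sketch below assumes that the proposed key is read from the Spark session configuration, as {{SedonaConf}} does for its other options, and that Spark is on the classpath. The application name and local master are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: the proposed option is picked up from the Spark session configuration.
val spark = SparkSession.builder()
  .appName("sedona-join-mode-sketch") // placeholder name
  .master("local[*]")                 // placeholder master for local experiments
  .config("sedona.join.optimizationmode", "nonequi")
  .getOrCreate()

// Assumption: the value can also be changed on an existing session,
// for example to restore the current behavior for a specific query.
spark.conf.set("sedona.join.optimizationmode", "all")
```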
--
This message was sent by Atlassian Jira
(v8.20.10#820010)