Kristin Cowalcijk created SEDONA-262:
----------------------------------------

             Summary: Don't optimize equi-join by default, add an option to 
configure when to optimize spatial joins
                 Key: SEDONA-262
                 URL: https://issues.apache.org/jira/browse/SEDONA-262
             Project: Apache Sedona
          Issue Type: New Feature
            Reporter: Kristin Cowalcijk


Apache Sedona optimizes all joins that have spatial predicates in their join 
conditions, including equi-joins with spatial predicates. For example, the 
following query will be optimized as a RangeJoin:

{code:scala}
df1.join(df2, df1("id1") === df2("id2") && ST_Contains(df1("geom"), 
df2("geom")))
{code}

However, it may be more efficient to run a sort-merge join or hash join on this 
query using the equi-condition {{df1.id1 = df2.id2}}. This problem also arises 
when users want to perform a spatial join using the S2 cell IDs of both 
geometries and then use a spatial predicate to filter out false positives.
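
The S2 case could look roughly like the sketch below. It assumes both DataFrames have a {{geom}} column plus the {{id1}}/{{id2}} columns from the example above, that Sedona's SQL functions are registered, and that {{ST_S2CellIDs}} is available in the Sedona version in use. The helper name {{s2Join}} and the {{cell1}}/{{cell2}} columns are hypothetical.

{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, expr}

// Hypothetical helper: join on S2 cell IDs, then refine with ST_Contains.
def s2Join(df1: DataFrame, df2: DataFrame, level: Int): DataFrame = {
  // One row per (geometry, covering S2 cell) on each side.
  val left = df1
    .withColumn("cell1", expr(s"explode(ST_S2CellIDs(geom, $level))"))
    .alias("l")
  val right = df2
    .withColumn("cell2", expr(s"explode(ST_S2CellIDs(geom, $level))"))
    .alias("r")

  // The equi-condition on cell IDs finds candidate pairs; the spatial
  // predicate filters out false positives.
  left
    .join(right, col("cell1") === col("cell2") && expr("ST_Contains(l.geom, r.geom)"))
    .drop("cell1", "cell2")
    // A pair can share several cells, so keep one row per (id1, id2).
    .dropDuplicates("id1", "id2")
}
{code}

With the current behavior, the {{ST_Contains}} term causes this join to be planned as a RangeJoin, even though the cell ID equality is the intended join key.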

We propose adding a configuration option to {{SedonaConf}} named 
{{sedona.join.optimizationmode}}. It can be set to one of the following 
values:

* *all*: optimize all joins having spatial predicates in join conditions. This 
is the current behavior of Apache Sedona.
* *none*: disable spatial join optimization.
* *nonequi*: only enable spatial join optimization on non-equi joins. _This 
will be the default mode_.
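
The intended semantics of the three modes can be sketched as a small decision function. This is only an illustration of the proposal, not Sedona's implementation; the object, type, and parameter names are all hypothetical.

{code:scala}
// Hypothetical model of the proposed modes; not Sedona's actual code.
object JoinOptimization {
  sealed trait Mode
  case object All extends Mode      // "all"
  case object Disabled extends Mode // "none"
  case object NonEqui extends Mode  // "nonequi"

  // Parse the configured string value, ignoring case and surrounding spaces.
  def parse(value: String): Mode =
    value.trim.toLowerCase(java.util.Locale.ROOT) match {
      case "all"     => All
      case "none"    => Disabled
      case "nonequi" => NonEqui
      case other =>
        throw new IllegalArgumentException(s"Unknown join optimization mode: $other")
    }

  // Decide whether a join should be rewritten as a spatial join.
  def shouldOptimize(
      mode: Mode,
      hasSpatialPredicate: Boolean,
      hasEquiCondition: Boolean): Boolean =
    mode match {
      case All      => hasSpatialPredicate
      case Disabled => false
      case NonEqui  => hasSpatialPredicate && !hasEquiCondition
    }
}
{code}

Under this model, the example query (spatial predicate plus equi-condition) is optimized in *all* mode only.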

When {{sedona.join.optimizationmode}} is set to *nonequi*, Sedona will not 
apply spatial join optimization to the aforementioned equi-join.
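
Assuming the new option is read from the Spark configuration like the existing {{sedona.*}} options, setting it might look like the following. The option is only proposed here, and whether a change at runtime takes effect is an assumption.

{code:scala}
import org.apache.spark.sql.SparkSession

// Set the proposed option when building the session.
val spark = SparkSession.builder()
  .appName("sedona-join-mode-example")
  .master("local[*]")
  .config("sedona.join.optimizationmode", "nonequi")
  .getOrCreate()

// Restore the current behavior (optimize every join with a spatial predicate).
spark.conf.set("sedona.join.optimizationmode", "all")
{code}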



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
