jiayuasu commented on issue #854: URL: https://github.com/apache/sedona/issues/854#issuecomment-1586053598
@Kontinuation I like this idea. Let's break this proposal into 3 standalone PRs. I believe they can be implemented separately, without relying on each other.

Step 1: Move the sampling logic to `analyze()`
1. Update the `analyze()` function of `SpatialRDD` to include the Poisson sampler.
2. Build a spatial partitioning grid using the samples collected in `analyze()`.

Step 2: Add heuristics to determine the join side in `TraitJoinQueryExec.scala` (https://github.com/apache/sedona/blob/master/sql/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/TraitJoinQueryExec.scala#L59)

Step 3: `DynamicIndexLookupJudgement` automatically determines the stream side on a per-grid basis.

@dfischercodethoughts What you are proposing is Step 2. You can take a stab at it if you want.

Since `v1.4.1` will be released soon, I expect this entire proposal to be completed in `v1.5.0`.
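A rough, standalone sketch of Step 1, assuming point data given as `(x, y)` pairs: a single pass computes the boundary and count, as `analyze()` already does, and also draws a Poisson sample. This is the same idea behind Spark's `PoissonSampler`: each element is emitted `k ~ Poisson(fraction)` times. A grid is then built from that sample without a second scan. `Box`, `AnalyzeResult`, `SampledAnalyze`, `analyze`, and `buildGrid` are hypothetical names, not Sedona's API. The quantile grid is only an illustration; a real implementation would more likely feed the sample into one of Sedona's existing partitioners, such as the KDB-tree.

```scala
import scala.util.Random

// Hypothetical axis-aligned envelope; Sedona itself works with JTS geometries and envelopes.
final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double) {
  def expand(x: Double, y: Double): Box =
    Box(math.min(minX, x), math.min(minY, y), math.max(maxX, x), math.max(maxY, y))
}

// Hypothetical result of one analyze() pass: boundary, count and the collected sample.
final case class AnalyzeResult(boundary: Box, count: Long, samples: Vector[(Double, Double)])

object SampledAnalyze {
  // Knuth's method: draws k ~ Poisson(lambda); adequate for small lambdas such as sampling fractions.
  def poisson(lambda: Double, rng: Random): Int = {
    val limit = math.exp(-lambda)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  // One pass computes the boundary and the count, and emits each point
  // k ~ Poisson(fraction) times into the sample (sampling with replacement).
  def analyze(points: Iterator[(Double, Double)], fraction: Double, seed: Long): AnalyzeResult = {
    val rng = new Random(seed)
    var box: Option[Box] = None
    var count = 0L
    val samples = Vector.newBuilder[(Double, Double)]
    points.foreach { case (x, y) =>
      box = Some(box.fold(Box(x, y, x, y))(_.expand(x, y)))
      count += 1
      for (_ <- 0 until poisson(fraction, rng)) samples += ((x, y))
    }
    AnalyzeResult(box.getOrElse(Box(0, 0, 0, 0)), count, samples.result())
  }

  // Cut points at sample quantiles so each strip holds roughly the same number of samples;
  // falls back to equal-width cuts when the sample is empty.
  private def cuts(values: Vector[Double], n: Int, lo: Double, hi: Double): Vector[Double] = {
    val sorted = values.sorted
    val inner =
      if (sorted.isEmpty) (1 until n).map(i => lo + (hi - lo) * i / n)
      else (1 until n).map(i => sorted(i * sorted.size / n))
    (lo +: inner.toVector) :+ hi
  }

  // Builds an nx-by-ny grid over the boundary from the sample, with no extra pass over the data.
  def buildGrid(result: AnalyzeResult, nx: Int, ny: Int): Vector[Box] = {
    val b = result.boundary
    val xs = cuts(result.samples.map(_._1), nx, b.minX, b.maxX)
    val ys = cuts(result.samples.map(_._2), ny, b.minY, b.maxY)
    for {
      i <- (0 until nx).toVector
      j <- 0 until ny
    } yield Box(xs(i), ys(j), xs(i + 1), ys(j + 1))
  }
}
```

Cutting x and y independently keeps the sketch short, but it only balances cells well when the two coordinates are not strongly correlated.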
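The per-grid decision in Step 3 can be illustrated with a small standalone sketch. It assumes each grid cell has left and right element counts, for example estimated from samples, and a simple cost model: indexing `n` items and probing the index with `m` items costs roughly `(n + m) * log(n)`. Under that model the index belongs on the smaller side of each cell, and the larger side is streamed. `Side`, `CellStats`, and `PerCellJudgement` are hypothetical names; this is not how `DynamicIndexLookupJudgement` is actually written.

```scala
// Hypothetical types; not Sedona's DynamicIndexLookupJudgement API.
sealed trait Side
case object LeftSide extends Side
case object RightSide extends Side

// Hypothetical per-cell statistics, e.g. element counts estimated from each side's sample.
final case class CellStats(cellId: Int, leftCount: Long, rightCount: Long)

object PerCellJudgement {
  // Assumed cost model: building an index on n items and probing it with m items costs
  // about (n + m) * log(n), which is smallest when n is the smaller side. Ties go left.
  def indexSide(c: CellStats): Side =
    if (c.leftCount <= c.rightCount) LeftSide else RightSide

  // The stream side is whichever side is not indexed.
  def streamSide(c: CellStats): Side =
    if (indexSide(c) == LeftSide) RightSide else LeftSide

  // Decides the stream side independently for every grid cell.
  def plan(cells: Seq[CellStats]): Map[Int, Side] =
    cells.map(c => c.cellId -> streamSide(c)).toMap
}
```

For example, a cell with 10 left and 500 right elements would index the left side and stream the right one. A cell with 800 left and 20 right elements would do the opposite, even within the same join.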
