Kontinuation commented on issue #854:
URL: https://github.com/apache/sedona/issues/854#issuecomment-1585667724

   I propose a different (more complicated) approach:
   
   1. Make the `analyze()` method of `SpatialRDD` also take a sample of the RDD. 
We can integrate the Poisson sampler logic into `StatCalculator` and compute 
the boundary, count, and samples in one pass.
   2. When running the spatial join physical plan, we `analyze()` both sides. 
Once we know the boundary and count of both sides, we can apply a simple 
heuristic to choose the partitioning side (for example, take the side with 
more records).
   3. Build a spatial partitioning grid using the samples we collected in 
`analyze()`. Since we also have samples of the other side, we can estimate how 
many geometries from each side will fall into each grid cell.
   4. When running `DynamicIndexLookupJudgement`, we can decide which side to 
build the index on and which side to stream, on a per-grid basis.
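
   To make the four steps concrete, here is a rough sketch in plain Python rather than Sedona code. Everything in it is an assumption for illustration: `Stats`, `analyze`, `cell_of`, `estimate_cells` and `plan_join` are hypothetical names, geometries are reduced to points, Bernoulli sampling stands in for the Poisson sampler, a uniform n x n grid stands in for the real spatial partitioner, and "build on the side with fewer estimated records" is just one possible per-grid heuristic.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Stats:
    """Result of one pass over one side: count, boundary, and samples."""
    count: int = 0
    min_x: float = float("inf")
    min_y: float = float("inf")
    max_x: float = float("-inf")
    max_y: float = float("-inf")
    samples: list = field(default_factory=list)

def analyze(points, fraction, seed=0):
    """Step 1: compute count, boundary, and a sample in a single pass."""
    rng = random.Random(seed)
    s = Stats()
    for x, y in points:
        s.count += 1
        s.min_x, s.max_x = min(s.min_x, x), max(s.max_x, x)
        s.min_y, s.max_y = min(s.min_y, y), max(s.max_y, y)
        # Assumption: Bernoulli sampling instead of a Poisson sampler.
        if rng.random() < fraction:
            s.samples.append((x, y))
    return s

def cell_of(pt, bounds, n):
    """Map a point to a cell of a uniform n x n grid over bounds."""
    min_x, min_y, max_x, max_y = bounds
    w = (max_x - min_x) / n or 1.0  # avoid zero-width cells
    h = (max_y - min_y) / n or 1.0
    cx = min(int((pt[0] - min_x) / w), n - 1)
    cy = min(int((pt[1] - min_y) / h), n - 1)
    return cx, cy

def estimate_cells(stats, bounds, n, fraction):
    """Step 3: scale sample counts up to estimated per-cell record counts."""
    est = {}
    for pt in stats.samples:
        c = cell_of(pt, bounds, n)
        est[c] = est.get(c, 0.0) + 1.0 / fraction
    return est

def plan_join(left, right, fraction, n=2):
    # Step 2: the side with more records becomes the partitioning side.
    partitioning_side = "left" if left.count >= right.count else "right"
    bounds = (min(left.min_x, right.min_x), min(left.min_y, right.min_y),
              max(left.max_x, right.max_x), max(left.max_y, right.max_y))
    l_est = estimate_cells(left, bounds, n, fraction)
    r_est = estimate_cells(right, bounds, n, fraction)
    # Step 4 (assumed heuristic): per cell, build the index on the side
    # with fewer estimated records and stream the other side.
    build_side = {}
    for c in set(l_est) | set(r_est):
        smaller_left = l_est.get(c, 0.0) <= r_est.get(c, 0.0)
        build_side[c] = "left" if smaller_left else "right"
    return partitioning_side, build_side

left_pts = [(0, 0), (0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (9, 9)]
right_pts = [(1, 1), (8, 8), (9.5, 9.5), (10, 10)]
la = analyze(left_pts, fraction=1.0)
ra = analyze(right_pts, fraction=1.0)
side, build = plan_join(la, ra, fraction=1.0, n=2)
print(side, build)
```

   With `fraction=1.0` every record is sampled, so the estimates are exact. In this toy input the left side is denser in one cell and sparser in the other, so the build side differs between the two cells, which is the per-grid decision from step 4.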


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
