Feng Zhang created SEDONA-630:
---------------------------------

             Summary: Improve ST_Union_Aggr performance
                 Key: SEDONA-630
                 URL: https://issues.apache.org/jira/browse/SEDONA-630
             Project: Apache Sedona
          Issue Type: Improvement
            Reporter: Feng Zhang


The function ST_Union_Aggr is slow. This is because this function is 
implemented in the most inefficient way: union geometry one after another.

```scala
class ST_Union_Aggr extends Aggregator[Geometry, Geometry, Geometry] with 
TraitSTAggregateExec {

  def reduce(buffer: Geometry, input: Geometry): Geometry = {
    if (buffer.equalsExact(initialGeometry)) input
    else buffer.union(input)
  }

  def merge(buffer1: Geometry, buffer2: Geometry): Geometry = {
    if (buffer1.equals(initialGeometry)) buffer2
    else if (buffer2.equals(initialGeometry)) buffer1
    else buffer1.union(buffer2)
  }
}
```

There are multiple improvements in JTS that improved the performance of union 
multiple geometry objects:

- Cascaded union: 
https://lin-ear-th-inking.blogspot.com/2007/11/fast-polygon-merging-in-jts-using.html
- Better overlay implementation (OverlayNG): 
https://lin-ear-th-inking.blogspot.com/2020/05/jts-overlay-next-generation.html

We should switch to the new `OverlayNGRobust.*union*` interface to get the best 
thing JTS provides. This also requires us to keep all reduced geometries in 
memory, we may maintain a buffer with maximum size, and run 
OverlayNGRobust.union when the maximum size is exceeded. This will help 
balancing the speed and memory usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to