Feng Zhang created SEDONA-630: --------------------------------- Summary: Improve ST_Union_Aggr performance Key: SEDONA-630 URL: https://issues.apache.org/jira/browse/SEDONA-630 Project: Apache Sedona Issue Type: Improvement Reporter: Feng Zhang
The function ST_Union_Aggr is slow. This is because this function is implemented in the most inefficient way: union geometry one after another. ```scala class ST_Union_Aggr extends Aggregator[Geometry, Geometry, Geometry] with TraitSTAggregateExec { def reduce(buffer: Geometry, input: Geometry): Geometry = { if (buffer.equalsExact(initialGeometry)) input else buffer.union(input) } def merge(buffer1: Geometry, buffer2: Geometry): Geometry = { if (buffer1.equals(initialGeometry)) buffer2 else if (buffer2.equals(initialGeometry)) buffer1 else buffer1.union(buffer2) } } ``` There are multiple improvements in JTS that improved the performance of union multiple geometry objects: - Cascaded union: https://lin-ear-th-inking.blogspot.com/2007/11/fast-polygon-merging-in-jts-using.html - Better overlay implementation (OverlayNG): https://lin-ear-th-inking.blogspot.com/2020/05/jts-overlay-next-generation.html We should switch to the new `OverlayNGRobust.*union*` interface to get the best thing JTS provides. This also requires us to keep all reduced geometries in memory, we may maintain a buffer with maximum size, and run OverlayNGRobust.union when the maximum size is exceeded. This will help balancing the speed and memory usage. -- This message was sent by Atlassian Jira (v8.20.10#820010)