Kontinuation opened a new issue, #281: URL: https://github.com/apache/sedona-db/issues/281
This issue stems from the implementation of memory-consuming physical operators such as spatial join. These operators may keep a collection of geometry objects in memory, and it is hard to know how much memory the collection consumes. For instance, the spatial join operator keeps indexed geometry objects for a variety of computational geometry libraries to speed up evaluation of spatial predicates:

1. We may evaluate spatial predicates directly on a binary buffer containing WKB using [sedona-geo-generic-alg](https://github.com/apache/sedona-db/tree/2939063fbac28dc06a03e1d18743fa327fbba7ed/rust/sedona-geo-generic-alg), our port of geo that works directly on geo-traits. We can get an accurate estimate in this case.
2. We may hold TG geometry objects in memory when using TG to evaluate spatial predicates. Fortunately, TG provides a function [`tg_geom_memsize`](https://github.com/tidwall/tg/blob/b26f589e18027cbbdff70268f9eb6d1fad6dbee1/tg.h#L121) for retrieving the in-memory size of TG geometry objects.
3. We may also use GEOS to evaluate spatial predicates. In that case we hold GEOS PreparedGeometry objects in memory and reuse them. Unfortunately, it is hard to tell how much memory those PreparedGeometry objects take up: GEOS has no interface for retrieving it, and it does not support custom memory allocators either. Currently, we perform a [very rough estimation](https://github.com/apache/sedona-db/blob/2939063fbac28dc06a03e1d18743fa327fbba7ed/rust/sedona-spatial-join/src/refine/geos.rs#L286-L289) when reserving memory for GEOS PreparedGeometry objects, and this estimate could be far off. We need to run some experiments to implement a better estimator for PreparedGeometry object size.

A further problem is that there is no API for estimating, ahead of time, the amount of memory required to store a collection of geometry objects.
Libraries such as TG provide APIs for retrieving the size of geometry objects after creation, but do not provide APIs for estimating memory usage before actually creating those objects. A conservative estimation is needed when we want to decide whether we'll go out-of-core ahead of time.
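One possible shape for such a conservative, pre-creation estimate is sketched below. It bounds the in-memory size of each geometry by a fixed per-object overhead plus a multiple of its WKB length, sums those bounds for a collection, and compares the total against the memory still available to decide whether to go out-of-core. This is a minimal sketch under stated assumptions, not sedona-db code: `EstimatorParams`, `estimate_geometry_size`, `estimate_collection_size`, and `should_go_out_of_core` are hypothetical names, and the linear-in-WKB-size model is itself an assumption (that in-memory size grows roughly with vertex count) which the experiments mentioned in the issue would need to validate and calibrate, for example against `tg_geom_memsize` for TG or measured allocations for GEOS PreparedGeometry.

```rust
/// Hypothetical calibration parameters for a conservative, pre-creation
/// memory estimate. Any concrete values are placeholders, NOT measurements;
/// they would have to be calibrated per backend (TG, GEOS, ...).
#[derive(Debug, Clone, Copy)]
pub struct EstimatorParams {
    /// Assumed fixed per-object overhead in bytes (headers, pointers, slack).
    pub per_object_overhead: usize,
    /// Assumed upper-bound ratio of in-memory size to WKB size.
    pub wkb_size_factor: f64,
}

/// Conservative size estimate for one geometry, computed only from the length
/// of its WKB bytes, so it can run before any backend object is created.
pub fn estimate_geometry_size(wkb: &[u8], params: &EstimatorParams) -> usize {
    // Round up so the estimate never falls below the scaled WKB size.
    let scaled = (wkb.len() as f64 * params.wkb_size_factor).ceil() as usize;
    params.per_object_overhead.saturating_add(scaled)
}

/// Sum of per-geometry estimates for a whole collection (e.g. a build side).
/// Saturating arithmetic keeps the bound conservative instead of wrapping.
pub fn estimate_collection_size<'a, I>(wkbs: I, params: &EstimatorParams) -> usize
where
    I: IntoIterator<Item = &'a [u8]>,
{
    wkbs.into_iter()
        .map(|wkb| estimate_geometry_size(wkb, params))
        .fold(0usize, |acc, n| acc.saturating_add(n))
}

/// Hypothetical ahead-of-time decision: spill if the conservative estimate
/// exceeds the memory that can still be reserved.
pub fn should_go_out_of_core(estimated_bytes: usize, available_bytes: usize) -> bool {
    estimated_bytes > available_bytes
}
```

Because the estimate depends only on WKB lengths, it can be evaluated before any TG or GEOS object is built. Choosing a larger `wkb_size_factor` trades some unnecessary spilling for protection against under-reserving memory.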
