Kontinuation opened a new issue, #281:
URL: https://github.com/apache/sedona-db/issues/281

   This issue stems from the implementation of memory-intensive physical 
operators such as spatial join. These operators may keep a collection of 
geometry objects in memory, and it is hard to know how much memory the 
collection consumes. For instance, the spatial join operator keeps indexed 
geometry objects for a variety of computational geometry libraries to speed up 
the evaluation of spatial predicates:
   
   1. We may evaluate spatial predicates directly on a binary buffer containing 
WKB using 
[sedona-geo-generic-alg](https://github.com/apache/sedona-db/tree/2939063fbac28dc06a03e1d18743fa327fbba7ed/rust/sedona-geo-generic-alg),
 our geo port that works directly on geo-traits. We can estimate memory usage 
accurately in this case.
   2. We may hold TG geometry objects in memory when using TG to evaluate 
spatial predicates. Fortunately, TG provides a function 
[`tg_geom_memsize`](https://github.com/tidwall/tg/blob/b26f589e18027cbbdff70268f9eb6d1fad6dbee1/tg.h#L121)
 for retrieving the in-memory size of TG geometry objects.
   3. We may also use GEOS to evaluate spatial predicates. In that case we hold 
GEOS PreparedGeometry objects in memory and reuse them. Unfortunately, it is 
hard to tell how much memory those GEOS PreparedGeometry objects take up: GEOS 
has no interface for retrieving that, and it does not support custom memory 
allocators either.
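
The three cases above could sit behind one size-reporting interface. The following is only a minimal sketch under stated assumptions, not SedonaDB's actual code: the `MemSize` trait, the `SizeReport` enum, the three structs and `ASSUMED_GEOS_FACTOR` are all hypothetical names, and the TG and GEOS types are plain stand-ins rather than real bindings. A real TG wrapper would presumably obtain its number from `tg_geom_memsize` through FFI.

```rust
/// Size of an indexed geometry, tagged by how trustworthy the number is.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SizeReport {
    /// The backend can report its footprint precisely.
    Exact(usize),
    /// Only a heuristic guess is available.
    Estimated(usize),
}

impl SizeReport {
    /// The byte count, regardless of how it was obtained.
    fn bytes(self) -> usize {
        match self {
            SizeReport::Exact(n) | SizeReport::Estimated(n) => n,
        }
    }
}

/// Hypothetical trait, implemented once per geometry backend.
trait MemSize {
    fn mem_size(&self) -> SizeReport;
}

/// Case 1: predicates run directly on the WKB bytes, so the buffer
/// length is the footprint.
struct WkbGeometry {
    wkb: Box<[u8]>,
}

impl MemSize for WkbGeometry {
    fn mem_size(&self) -> SizeReport {
        SizeReport::Exact(self.wkb.len())
    }
}

/// Case 2: stand-in for a TG geometry. `reported_size` models the value
/// a real wrapper would read from `tg_geom_memsize`.
struct TgGeometry {
    reported_size: usize,
}

impl MemSize for TgGeometry {
    fn mem_size(&self) -> SizeReport {
        SizeReport::Exact(self.reported_size)
    }
}

/// Placeholder blow-up factor for GEOS; an assumption, not a measurement.
const ASSUMED_GEOS_FACTOR: usize = 4;

/// Case 3: stand-in for a GEOS PreparedGeometry, which cannot report
/// its own size, so we can only guess from the source WKB length.
struct GeosPrepared {
    wkb_len: usize,
}

impl MemSize for GeosPrepared {
    fn mem_size(&self) -> SizeReport {
        SizeReport::Estimated(self.wkb_len.saturating_mul(ASSUMED_GEOS_FACTOR))
    }
}

/// Total bytes to reserve for a mixed collection, saturating on overflow.
fn total_reserved(items: &[&dyn MemSize]) -> usize {
    items
        .iter()
        .map(|g| g.mem_size().bytes())
        .fold(0, usize::saturating_add)
}
```

Tagging each result as `Exact` or `Estimated` lets the caller see how much of a reservation rests on guesswork, which is the part that matters for the GEOS case.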
   
   Currently, we perform a [very rough 
estimation](https://github.com/apache/sedona-db/blob/2939063fbac28dc06a03e1d18743fa327fbba7ed/rust/sedona-spatial-join/src/refine/geos.rs#L286-L289)
 when reserving memory for GEOS PreparedGeometry objects. This estimate could 
be far from the actual usage. We need to run some experiments to build a 
better estimator for PreparedGeometry object size.
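
One possible shape for such an experiment-driven estimator is a linear model fitted to measured samples. This is only a sketch, assuming that PreparedGeometry size grows roughly linearly with WKB length (which the experiments would have to confirm) and that each sample pairs a WKB length with a footprint measured by some external means. `LinearSizeModel` is a hypothetical name, and this is not the estimation linked above.

```rust
/// Linear size model: bytes ≈ intercept + slope * wkb_len.
#[derive(Debug, Clone, Copy)]
struct LinearSizeModel {
    intercept: f64,
    slope: f64,
}

impl LinearSizeModel {
    /// Ordinary least-squares fit over (wkb_len, measured_bytes) samples.
    /// Returns `None` when the samples cannot determine a line, i.e. there
    /// are fewer than two of them or all WKB lengths are identical.
    fn fit(samples: &[(usize, usize)]) -> Option<Self> {
        if samples.len() < 2 {
            return None;
        }
        let n = samples.len() as f64;
        let mean_x = samples.iter().map(|s| s.0 as f64).sum::<f64>() / n;
        let mean_y = samples.iter().map(|s| s.1 as f64).sum::<f64>() / n;

        let mut sxx = 0.0;
        let mut sxy = 0.0;
        for &(x, y) in samples {
            let dx = x as f64 - mean_x;
            sxx += dx * dx;
            sxy += dx * (y as f64 - mean_y);
        }
        if sxx == 0.0 {
            return None;
        }
        let slope = sxy / sxx;
        Some(Self {
            intercept: mean_y - slope * mean_x,
            slope,
        })
    }

    /// Predicted footprint in bytes, rounded up and clamped at zero.
    fn estimate(&self, wkb_len: usize) -> usize {
        let bytes = self.intercept + self.slope * wkb_len as f64;
        bytes.max(0.0).ceil() as usize
    }
}
```

A least-squares fit tracks the average, so it will under-predict for roughly half of the inputs; a memory reservation would likely want a safety margin on top of `estimate`.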
   
   A future problem is that there's no API for estimating the amount of memory 
required for storing a collection of geometry objects ahead of time. Libraries 
such as TG provide APIs for retrieving the size of geometry objects after 
creation, but do not provide APIs for estimating memory usage before actually 
creating those objects. A conservative estimate is needed to decide ahead of 
time whether to go out-of-core.
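
To illustrate the ahead-of-time decision, here is a minimal sketch that derives an upper bound from WKB lengths alone and compares it with a memory budget. `ConservativeEstimator`, its two parameters, `Plan` and `choose_plan` are hypothetical names, and the parameter values would have to come from measurement; nothing here reflects an existing SedonaDB or TG API.

```rust
/// Upper-bound size model usable before any geometry object is built.
/// Both fields are assumptions to be calibrated per backend.
struct ConservativeEstimator {
    /// Fixed bytes charged per geometry (headers, index entries, ...).
    per_geometry_overhead: usize,
    /// Worst-case ratio of in-memory size to WKB size.
    expansion_factor: usize,
}

impl ConservativeEstimator {
    /// Upper bound in bytes for a collection, given only WKB lengths.
    /// Saturates instead of overflowing, so a huge input stays "too big".
    fn upper_bound(&self, wkb_lens: impl IntoIterator<Item = usize>) -> usize {
        wkb_lens.into_iter().fold(0usize, |acc, len| {
            let one = len
                .saturating_mul(self.expansion_factor)
                .saturating_add(self.per_geometry_overhead);
            acc.saturating_add(one)
        })
    }
}

/// Execution strategy chosen before building the in-memory collection.
#[derive(Debug, PartialEq, Eq)]
enum Plan {
    InMemory,
    OutOfCore,
}

/// Stay in memory only if the conservative bound fits the budget.
fn choose_plan(estimated_bytes: usize, budget_bytes: usize) -> Plan {
    if estimated_bytes <= budget_bytes {
        Plan::InMemory
    } else {
        Plan::OutOfCore
    }
}
```

Because the bound is deliberately pessimistic, an error costs an unnecessary out-of-core run rather than an out-of-memory failure.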


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
