This APE has been stuck due to an unfortunate situation with SIS. Internally, SIS uses Derby, which was abandoned recently. Clearly, introducing a new feature on top of an unmaintained dependency (even a transitive one) is undesirable. It is possible to use a different embedded database, because SIS talks to Derby over JDBC, but switching has not been a straightforward process.

Using Derby was also turning into a bit of a headache to do properly in the first place. For example, resources of any non-negligible size are expected to be evictable from memory when no query is using them. SIS doesn't explicitly allow or expect a user to purge/unload the embedded db once it's loaded. There are ways to do it, but it's all a bit weedy and requires using things in a possibly unexpected or undefined way.

Suryaa and I discussed an idea that came up in a discussion between Mike and me: relax some of the constraints on mixed-CRS operations that led to the current design in the first place. The necessity of having SIS's embedded database on each NC came from the desire to let CRS information be determined completely flexibly at runtime, like how completely open records work. However, it isn't a perfect analogy, because the transforms are lossy, extremely expensive, and sometimes not desirable.

The proposal is to first:

- Support management of CRS info via a Metadata dataset
- Allow explicit transforms from one CRS to another using these managed CRSes, by gathering the necessary CRSes and compiling them into the job as WKT

Then, to better support datasets with multiple CRSes which may not be known in advance, this method will be extended. Queries that use transforms will issue a query on the Sample dataset to determine the CRSes in the sample, and then include those CRSes in the query as a best-effort estimate. Invocations of st_transform where the CRS isn't one of those found in the sample will return missing and issue a warning.

As I understand it from the discussion between Suryaa and me, this should suffice for vector geometry data (e.g. points, polygons) because these tend to have only a handful of CRSes. We might have to make this more robust for raster data, as those can have many different CRSes.

What does everyone else think? If there are no issues raised, I think we should move forward in this way to allow this feature to progress.

On Wed, Aug 27, 2025 at 12:08 PM Suryaa Charan Shivakumar <[email protected]> wrote:
>
> Hello Wail,
>
> Hope you are doing well. Thank you for bringing this up. We should plan
> exploring this in AsterixDB as part of the ongoing geospatial roadmap. So
> let's start a separate discussion around this, I have just added some of my
> thoughts based on my understanding/knowledge below for starters,
>
> Current Capabilities in AsterixDB
>
>    1. AsterixDB already provides *native spatial data types and functions*.
>    2. It supports *secondary indexing with LSM-based R-trees*, which enable
>    topological predicates such as ST_Intersects, ST_Within, and others.
>    3. These indexes are part of AsterixDB’s *built-in index suite* and are
>    leveraged for spatial index scans where applicable.
>
> What it would take to support H3 (roughly) :
>
>    1. Introduce *library and packaging changes* to bring in H3.
>    2. Expose *H3 functions* as first-class citizens, e.g., h3_latlng_to_cell,
>    h3_grid_disk, h3_grid_ring, cellToBoundary(cell-to-polygon), and
>    resolution helpers.
>    3. Provide *H3-derived secondary indexing support*, either by extending
>    scalar secondary indexes or introducing a dedicated H3 index type.
>    4. Enhance *query planning and predicate pushdown* to effectively
>    leverage H3 cell computations.
>
> What would (theoretically) improve with H3 :
>
>    1. *Performance improvements*: For neighborhood/nearest-type queries,
>    H3’s cell-neighborhood filtering can reduce the candidate set before exact
>    evaluation.
>    2. *Workload flexibility*: Useful for spatial bucketing, partitioning,
>    and efficient aggregations.
>    3. *Pre-filtering capability*: At scale, H3 can act as a *coarse*
>    pre-filter, pruning large data volumes before applying exact geometric
>    predicates.
>
> Thank you,
> Suryaa
>
> On Sat, Aug 2, 2025 at 12:11 AM Wail Alkowaileet <[email protected]> wrote:
>
> > +1 for that!
> >
> > Since this touches the spatial aspect, what do you think of supporting h3
> > <https://h3geo.org/> indexing in AsterixDB? I stumbled into this
> > <https://www.architecture-performance.fr/ap_blog/spatial-queries-in-duckdb-with-r-tree-and-h3-indexing/>
> > article and I find it interesting.
> >
> > On Fri, Aug 1, 2025 at 9:28 PM Mike Carey <[email protected]> wrote:
> >
> > > Looks very good at this point IMO!
> > >
> > > On 8/1/25 9:57 AM, Suryaa Charan Shivakumar wrote:
> > > > Hello AsterixDB Dev Community,
> > > >
> > > > I hope you’re all doing well. I wanted to share a quick update on our
> > > > ongoing efforts to integrate Coordinate Reference System (CRS) support
> > > > into AsterixDB, tracked under JIRA epic *ASTERIXDB-3542*
> > > > <https://issues.apache.org/jira/browse/ASTERIXDB-3542>
> > > >
> > > > We’ve made some progress in refining how CRS metadata is represented
> > > > and managed within the system. More details are available in the APE:
> > > > APE 17: Spatial CRS Support in AsterixDB
> > > > <https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+17%3A++Coordinate+Reference+System+%28CRS%29+support+for+Spatial+Data+in+AsterixDB>
> > > >
> > > > The overarching objective remains the same: *To make AsterixDB
> > > > spatially aware in a real-world sense enabling geospatial queries and
> > > > analytics that respect map projections, distance accuracy, and
> > > > cross-CRS interoperability.*
> > > >
> > > > As always, we welcome your feedback and thoughts to ensure we’re
> > > > building this in a robust and community aligned way.
> > > >
> > > > Best regards,
> > > > Suryaa Charan
> > > >
> > > > On Thu, Jan 9, 2025 at 5:06 PM Mike Carey <[email protected]> wrote:
> > > >
> > > >> +1 for this work - nice APE! I added a comment with one question and
> > > >> one suggestion on the APE wiki.
> > > >>
> > > >> On 1/9/25 4:25 PM, Suryaa Charan Shivakumar wrote:
> > > >>> Hello AsterixDB Dev Community,
> > > >>>
> > > >>> I hope this message finds you well. I am pleased to share a proposal
> > > >>> for adding *Coordinate Reference System (CRS) support* for spatial
> > > >>> data in AsterixDB, tracked under JIRA epic *ASTERIXDB-3542* and
> > > >>> targeted for the 9.10.0 release.
> > > >>>
> > > >>> Key Motivations:
> > > >>>
> > > >>>    1. Improved Spatial Data Handling: Enable accurate, CRS-aware
> > > >>>    spatial operations, including transformations and validations.
> > > >>>    Currently, spatial data in AsterixDB resides in a Euclidean space,
> > > >>>    which means spatial analysis using the provided functions may not
> > > >>>    yield precise results. Integrating CRS is critical to ensure
> > > >>>    accurate, real-world spatial operations by accounting for the
> > > >>>    Earth's curvature and coordinate transformations.
> > > >>>    2. Enhanced Interoperability: Ensure compatibility with external
> > > >>>    systems by supporting standard (e.g., EPSG) and custom CRS
> > > >>>    definitions.
> > > >>>
> > > >>> APE -
> > > >>> https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+17%3A+Spatial+CRS
> > > >>>
> > > >>> We invite your feedback on this proposal to ensure we address
> > > >>> potential impacts and refine the implementation plan.
> > > >>>
> > > >>> Thank you for your collaboration.
> > > >>>
> > > >>> Best regards,
> > > >>> Suryaa Charan
> > > >>>
> > > >
> >
> > --
> >
> > *Regards,*
> > Wail Alkowaileet
> >
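
P.S. To make the proposal at the top of this message a bit more concrete, here is a rough Python sketch of the sample-based, best-effort flow. Everything in it is an assumption for illustration only: the `CRS_METADATA` dict stands in for the CRS Metadata dataset, the WKT strings are placeholders, the `srid` field name is made up, and `st_transform_sketch` is a hypothetical stub that does no real reprojection. It is not how AsterixDB or SIS would implement this.

```python
import warnings
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical stand-in for the CRS Metadata dataset: CRS id -> WKT text.
# The WKT strings are placeholders, not real CRS definitions.
CRS_METADATA: Dict[int, str] = {
    4326: "<WKT for EPSG:4326>",
    3857: "<WKT for EPSG:3857>",
    32633: "<WKT for EPSG:32633>",
}

Point = Tuple[float, float]


def crs_ids_in_sample(sample: Iterable[dict]) -> Set[int]:
    """Collect the distinct CRS ids seen in the sampled records.

    The 'srid' field name is an assumption made for this sketch.
    """
    return {rec["srid"] for rec in sample if "srid" in rec}


def compile_crs_for_job(crs_ids: Iterable[int],
                        metadata: Dict[int, str]) -> Dict[int, str]:
    """Gather the WKT for the requested ids so it can ship with the job.

    Ids that the metadata does not know about are skipped.
    """
    return {cid: metadata[cid] for cid in crs_ids if cid in metadata}


def st_transform_sketch(point: Point, source: int, target: int,
                        compiled: Dict[int, str]) -> Optional[dict]:
    """Best-effort transform stub.

    Returns None (standing in for 'missing') and issues a warning when
    either CRS was not compiled into the job. Otherwise it returns the
    inputs a real reprojection would need; no coordinates are changed.
    """
    for cid in (source, target):
        if cid not in compiled:
            warnings.warn(f"CRS {cid} was not compiled into the job; "
                          "returning missing")
            return None
    return {
        "point": point,
        "source_wkt": compiled[source],
        "target_wkt": compiled[target],
    }
```

Usage would look roughly like this: compute `crs_ids_in_sample(...)` over the sample, add the explicit target CRS, pass the result through `compile_crs_for_job(...)`, and hand the resulting dict to every `st_transform_sketch(...)` call in the job. A record whose CRS never showed up in the sample then yields `None` plus a warning instead of failing the query.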
