Hello team,

Hope you are doing well. Here is a quick update on this thread after some follow-up discussion and design rounds with Ian and the team. The design described in Ian's mail is the direction we landed on, with a few clarifications we worked through together:

- Phase 1: *submitted*. A patchset implementing the metadata-managed CRS catalog and explicit ST_Transform over WKT compiled into the job has been posted for review (ASTERIXDB-3542). The transform path uses STTransformResolveCRSRule, an Algebricks rule that resolves CRS WKT from Metadata.CoordinateReferenceSystem at compile time and injects it onto the function expression as opaque parameters, so no metadata lookups happen in the per-tuple hot path and a missing CRS fails at compile time rather than mid-scan. ST_Distance_Spheroid (WGS 84 geodesic) is included. CRS rows are dataverse-scoped, not global, so dropping a dataverse cascades cleanly. No SIS embedded-DB resources are loaded on the NCs; only org.apache.sis.referencing.CRS.fromWKT and MathTransform are used at runtime, against WKT we ship in the job.

- Phase 2: sample-driven best-effort transforms. The extension Ian described (issue a query against the sample dataset to discover the CRSes actually present, compile those into the job, and return missing with a warning for un-sampled SRIDs) is the Phase 2 plan. It is gated on first persisting the SRID with the geometry value, which itself requires an EWKB/EWKT pair on the I/O side, also laid out in the APE.

- Indexing constraint, documented as a user invariant for now. Because Phase 1 has no on-disk SRID, the system cannot enforce "one CRS per indexed field"; it is the user's responsibility to keep an indexed dataset in a single CRS until Phase 2 gives us the SRID-on-value foundation and Phase 3 turns that into a typed column constraint. The APE calls this out explicitly so the operational contract is visible.

The revised APE (https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+17%3A+Spatial+CRS) is up on Confluence for review. It supersedes the original APE-17 page on the same wiki entry and reflects everything above, plus the open design question of where the SRID lives once Phase 2 starts (record-level via EWKB vs. column/attribute-level in the type system).
Feedback on that question in particular would be very welcome before Phase 2 work begins.

Thanks, Ian, for the unblocking discussion and help getting it rolling.

Best,
Suryaa

On Tue, Feb 3, 2026 at 9:27 AM Ian Maxon <[email protected]> wrote:

> This APE has been stuck due to an unfortunate situation with SIS.
> Internally, SIS uses Derby, which was abandoned recently. Clearly
> introducing a new feature on top of an unmaintained (even if
> transitive) dependency is undesirable. There is the possibility to use
> a different embedded database, because SIS uses JDBC to talk with
> Derby, but it's not been a straightforward process to switch. Using
> Derby was also turning into a bit of a headache to do properly in the
> first place. For example, for resources of any non-negligible size, it
> is expected that they should be able to be evicted from memory when a
> query isn't using them. SIS doesn't explicitly allow or expect a user
> to purge/unload the embedded db once it's loaded. There are ways to do
> it; but it's all a bit weedy and requires using things in a possibly
> unexpected or undefined way.
> Suryaa and I discussed an idea that came up in a discussion between
> Mike and I to relax some of the constraints on mixed-CRS operations
> that led to the current design in the first place. The necessity of
> having SIS's embedded database on each NC came from the desire to let
> CRS information be determined completely flexibly at runtime, like how
> completely open records work. However it isn't a perfect analogy,
> because the transforms are lossy, extremely expensive, and sometimes
> not desirable.
>
> The proposal is to first:
> - Support management of CRS info via a Metadata dataset
> - Allow explicit transforms from one CRS to another using these
> managed CRSes by gathering the necessary CRSes and compiling them into
> the job as WKT
>
> Then, to better support datasets with multiple CRSes which may not be
> known, this method will be extended.
> Queries that use transforms will
> issue a query on the Sample dataset to determine the CRSes in the
> sample, and then include those CRSes in the query as a best-effort
> estimate. Invocations of st_transform where the CRS isn't one of the
> ones that exist in the sample will return missing and issue a warning.
>
> As I understand it from the discussion between Suryaa and I, this
> should suffice for vector geometry data (e.g. points, polygons)
> because these tend to only have a handful of CRSes. We might have to
> make this more robust for raster data as those can have many different
> CRSes.
>
> What does everyone else think? If there are no issues raised, I think
> we should move forward in this way to allow this feature to progress.
>
> On Wed, Aug 27, 2025 at 12:08 PM Suryaa Charan Shivakumar
> <[email protected]> wrote:
> >
> > Hello Wail,
> >
> > Hope you are doing well. Thank you for bringing this up. We should plan
> > exploring this in AsterixDB as part of the ongoing geospatial roadmap. So
> > let's start a separate discussion around this, I have just added some of my
> > thoughts based on my understanding/knowledge below for starters,
> >
> > Current Capabilities in AsterixDB
> >
> >    1. AsterixDB already provides *native spatial data types and functions*.
> >    2. It supports *secondary indexing with LSM-based R-trees*, which enable
> >    topological predicates such as ST_Intersects, ST_Within, and others.
> >    3. These indexes are part of AsterixDB’s *built-in index suite* and are
> >    leveraged for spatial index scans where applicable.
> >
> > What it would take to support H3 (roughly) :
> >
> >    1. Introduce *library and packaging changes* to bring in H3.
> >    2. Expose *H3 functions* as first-class citizens, e.g., h3_latlng_to_cell,
> >    h3_grid_disk, h3_grid_ring, cellToBoundary(cell-to-polygon), and
> >    resolution helpers.
> >    3. Provide *H3-derived secondary indexing support*, either by extending
> >    scalar secondary indexes or introducing a dedicated H3 index type.
> >    4. Enhance *query planning and predicate pushdown* to effectively
> >    leverage H3 cell computations.
> >
> > What would (theoretically) improve with H3 :
> >
> >    1. *Performance improvements*: For neighborhood/nearest-type queries,
> >    H3’s cell-neighborhood filtering can reduce the candidate set before exact
> >    evaluation.
> >    2. *Workload flexibility*: Useful for spatial bucketing, partitioning,
> >    and efficient aggregations.
> >    3. *Pre-filtering capability*: At scale, H3 can act as a *coarse*
> >    pre-filter, pruning large data volumes before applying exact geometric
> >    predicates.
> >
> > Thank you,
> > Suryaa
> >
> > On Sat, Aug 2, 2025 at 12:11 AM Wail Alkowaileet <[email protected]> wrote:
> >
> > > +1 for that!
> > >
> > > Since this touches the spatial aspect, what do you think of supporting h3
> > > <https://h3geo.org/> indexing in AsterixDB? I stumbled into this
> > > <https://www.architecture-performance.fr/ap_blog/spatial-queries-in-duckdb-with-r-tree-and-h3-indexing/>
> > > article and I find it interesting.
> > >
> > > On Fri, Aug 1, 2025 at 9:28 PM Mike Carey <[email protected]> wrote:
> > >
> > > > Looks very good at this point IMO!
> > > >
> > > > On 8/1/25 9:57 AM, Suryaa Charan Shivakumar wrote:
> > > > > Hello AsterixDB Dev Community,
> > > > >
> > > > > I hope you’re all doing well. I wanted to share a quick update on our
> > > > > ongoing efforts to integrate Coordinate Reference System (CRS) support into
> > > > > AsterixDB, tracked under JIRA epic *ASTERIXDB-3542*
> > > > > <https://issues.apache.org/jira/browse/ASTERIXDB-3542>
> > > > >
> > > > > We’ve made some progress in refining how CRS metadata is represented and
> > > > > managed within the system. More details are available in the APE: APE 17:
> > > > > Spatial CRS Support in AsterixDB
> > > > > <https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+17%3A++Coordinate+Reference+System+%28CRS%29+support+for+Spatial+Data+in+AsterixDB>
> > > > >
> > > > > The overarching objective remains the same: *To make AsterixDB spatially
> > > > > aware in a real-world sense enabling geospatial queries and analytics that
> > > > > respect map projections, distance accuracy, and cross-CRS interoperability.*
> > > > >
> > > > > As always, we welcome your feedback and thoughts to ensure we’re building
> > > > > this in a robust and community aligned way.
> > > > >
> > > > > Best regards,
> > > > > Suryaa Charan
> > > > >
> > > > > On Thu, Jan 9, 2025 at 5:06 PM Mike Carey <[email protected]> wrote:
> > > > >
> > > > >> +1 for this work - nice APE! I added a comment with one question and
> > > > >> one suggestion on the APE wiki.
> > > > >>
> > > > >> On 1/9/25 4:25 PM, Suryaa Charan Shivakumar wrote:
> > > > >>> Hello AsterixDB Dev Community,
> > > > >>>
> > > > >>> I hope this message finds you well. I am pleased to share a proposal for
> > > > >>> adding *Coordinate Reference System (CRS) support* for spatial data in
> > > > >>> AsterixDB, tracked under JIRA epic *ASTERIXDB-3542* and targeted for the
> > > > >>> 9.10.0 release.
> > > > >>>
> > > > >>> Key Motivations:
> > > > >>>
> > > > >>>    1. Improved Spatial Data Handling: Enable accurate, CRS-aware spatial
> > > > >>>    operations, including transformations and validations. Currently, spatial
> > > > >>>    data in AsterixDB resides in a Euclidean space, which means spatial
> > > > >>>    analysis using the provided functions may not yield precise results.
> > > > >>>    Integrating CRS is critical to ensure accurate, real-world spatial
> > > > >>>    operations by accounting for the Earth's curvature and coordinate
> > > > >>>    transformations.
> > > > >>>    2. Enhanced Interoperability: Ensure compatibility with external systems
> > > > >>>    by supporting standard (e.g., EPSG) and custom CRS definitions.
> > > > >>>
> > > > >>> APE -
> > > > >>> https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+17%3A+Spatial+CRS
> > > > >>>
> > > > >>> We invite your feedback on this proposal to ensure we address potential
> > > > >>> impacts and refine the implementation plan.
> > > > >>>
> > > > >>> Thank you for your collaboration.
> > > > >>>
> > > > >>> Best regards,
> > > > >>> Suryaa Charan
> > > > >>>
> > >
> > > --
> > >
> > > *Regards,*
> > > Wail Alkowaileet
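
For anyone skimming the thread without opening the patch, here is a minimal sketch of the two behaviours described in the most recent message: Phase 1 resolves CRS WKT once at compile time and fails early on a missing CRS, and Phase 2 returns missing with a warning for an SRID that was not in the sample. This is illustrative Python, not the AsterixDB code. TransformCall, resolve_crs, transform_best_effort, CompileError, and the catalog layout are all hypothetical names invented for this sketch, the WKT strings are placeholders, and no actual coordinate transform is performed.

```python
import warnings
from dataclasses import dataclass, field


class CompileError(Exception):
    """Raised while compiling a query, before any tuple is scanned."""


@dataclass
class TransformCall:
    """Hypothetical stand-in for an ST_Transform function expression."""
    source_crs: str
    target_crs: str
    # WKT attached by the compile-time step, keyed by CRS name.
    opaque_params: dict = field(default_factory=dict)


def resolve_crs(call, catalog, dataverse):
    """Phase 1 idea: look up both CRSes once and attach their WKT to the call."""
    for name in (call.source_crs, call.target_crs):
        key = (dataverse, name)  # catalog rows are scoped to a dataverse
        if key not in catalog:
            raise CompileError(f"CRS {name!r} not found in dataverse {dataverse!r}")
        call.opaque_params[name] = catalog[key]
    return call


def transform_best_effort(compiled_wkt, srid, point):
    """Phase 2 idea: an SRID absent from the sample yields missing (None) and a warning."""
    if srid not in compiled_wkt:
        warnings.warn(f"SRID {srid} was not in the sample; returning missing")
        return None
    # A real evaluator would build a transform from compiled_wkt[srid] here;
    # this sketch returns the point unchanged.
    return point


# Usage: the catalog is consulted only inside resolve_crs, never per tuple.
catalog = {
    ("geo", "EPSG:4326"): "<WKT placeholder for EPSG:4326>",
    ("geo", "EPSG:3857"): "<WKT placeholder for EPSG:3857>",
}
call = resolve_crs(TransformCall("EPSG:4326", "EPSG:3857"), catalog, "geo")
```

The point of the split is that the per-tuple path reads only what was attached to the call, so an unknown CRS surfaces as a compile error in Phase 1 and as a missing value with a warning in Phase 2.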
