+1 to moving forward with this simpler approach (using our own system to manage the data) and then seeing how it goes with potential users before venturing into a more dynamic approach.

(How many is many in "many different CRSes"?  Just curious what the CRS profiles are for typical raster data sets.)

On 2/3/26 9:26 AM, Ian Maxon wrote:
This APE has been stuck due to an unfortunate situation with SIS.
Internally, SIS uses Derby, which was abandoned recently. Clearly,
introducing a new feature on top of an unmaintained (even if
transitive) dependency is undesirable. It is possible to use a
different embedded database, because SIS uses JDBC to talk with
Derby, but switching has not been straightforward. Using Derby
properly was also turning into a bit of a headache in the first
place. For example, resources of any non-negligible size are
expected to be evictable from memory when no query is using them,
but SIS neither explicitly allows nor expects a user to purge or
unload the embedded DB once it's loaded. There are ways to do it,
but they are all a bit weedy and require using things in a possibly
unexpected or undefined way.
Suryaa and I discussed an idea, which first came up in a discussion
between Mike and me, to relax some of the constraints on mixed-CRS
operations that led to the current design in the first place. The
necessity of having SIS's embedded database on each NC came from the
desire to let CRS information be determined completely flexibly at
runtime, like how completely open records work. However, it isn't a
perfect analogy, because the transforms are lossy, extremely
expensive, and sometimes not desirable.

The proposal is to first:
- Support management of CRS info via a Metadata dataset
- Allow explicit transforms from one CRS to another using these
managed CRSes by gathering the necessary CRSes and compiling them into
the job as WKT
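
As a rough illustration of these first two steps, the sketch below assumes the Metadata dataset can be viewed as a mapping from a CRS identifier (such as an EPSG code) to its WKT definition. The names `ManagedCrsCatalog` and `compile_job_crs` are hypothetical, not AsterixDB APIs, and failing at compile time on an unmanaged CRS is an assumed policy. The point is that only the CRSes a query names explicitly are gathered and shipped with the job as WKT, so no node needs an embedded CRS database at runtime.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedCrsCatalog:
    """Hypothetical stand-in for CRS entries stored in a Metadata dataset."""
    wkt_by_id: dict[str, str] = field(default_factory=dict)

    def register(self, crs_id: str, wkt: str) -> None:
        # Hypothetical analogue of a DDL statement that creates a managed CRS.
        if crs_id in self.wkt_by_id:
            raise ValueError(f"CRS {crs_id} already exists")
        self.wkt_by_id[crs_id] = wkt


def compile_job_crs(catalog: ManagedCrsCatalog, referenced: set[str]) -> dict[str, str]:
    """Gather the WKT for every CRS that a query references explicitly.

    Assumed policy: fail at compile time if a referenced CRS is not managed,
    since an explicit transform cannot be honored without its definition.
    """
    unknown = sorted(c for c in referenced if c not in catalog.wkt_by_id)
    if unknown:
        raise KeyError(f"unknown CRS: {', '.join(unknown)}")
    # Only the needed definitions travel with the job, as WKT strings.
    return {c: catalog.wkt_by_id[c] for c in sorted(referenced)}
```

For example, a query transforming from EPSG:4326 to EPSG:3857 would carry just those two WKT strings in its job, no matter how many CRSes the catalog holds.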

Then, to better support datasets with multiple CRSes that may not be
known in advance, this method will be extended. Queries that use
transforms will issue a query on the Sample dataset to determine the
CRSes present in the sample, and then include those CRSes in the query
as a best-effort estimate. Invocations of st_transform whose CRS is not
among those found in the sample will return MISSING and issue a warning.
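
The best-effort behavior could look roughly like the following sketch. It assumes each record carries its CRS identifier in a hypothetical `crs` field, models AsterixDB's MISSING as a Python sentinel, and leaves the actual coordinate math to a caller-supplied `do_transform` function (also hypothetical), since the point here is only the lookup-and-fallback logic.

```python
import warnings
from typing import Callable, Iterable

MISSING = object()  # stand-in for AsterixDB's MISSING value


def crses_in_sample(sample: Iterable[dict]) -> set[str]:
    """Collect the distinct CRS identifiers observed in a data sample."""
    return {rec["crs"] for rec in sample if "crs" in rec}


def make_st_transform(job_crs: dict[str, str],
                      do_transform: Callable[[object, str, str], object]):
    """Build an st_transform-like function bound to the CRSes compiled into the job.

    job_crs maps CRS id -> WKT and is assumed to hold the explicit target CRS
    plus the source CRSes found in the sample.
    """
    def st_transform(geom, src_crs: str, dst_crs: str):
        for crs in (src_crs, dst_crs):
            if crs not in job_crs:
                # CRS was not known at compile time: warn and return MISSING.
                warnings.warn(f"st_transform: CRS {crs} was not compiled into the job")
                return MISSING
        return do_transform(geom, job_crs[src_crs], job_crs[dst_crs])

    return st_transform
```

Because the sample is only an estimate, a record in a rarely used CRS can slip through it; that record then yields MISSING plus a warning instead of failing the whole query.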

As I understand it from the discussion between Suryaa and me, this
should suffice for vector geometry data (e.g., points and polygons),
because such data tends to have only a handful of CRSes. We might have
to make this more robust for raster data, as those datasets can have
many different CRSes.

What does everyone else think? If there are no issues raised, I think
we should move forward in this way to allow this feature to progress.

On Wed, Aug 27, 2025 at 12:08 PM Suryaa Charan Shivakumar
<[email protected]> wrote:
Hello Wail,

Hope you are doing well. Thank you for bringing this up. We should plan
to explore this in AsterixDB as part of the ongoing geospatial roadmap,
so let's start a separate discussion around it. For starters, I have
added some of my thoughts below, based on my current understanding:

Current Capabilities in AsterixDB

    1. AsterixDB already provides *native spatial data types and functions*.
    2. It supports *secondary indexing with LSM-based R-trees*, which enable
    topological predicates such as ST_Intersects, ST_Within, and others.
    3. These indexes are part of AsterixDB’s *built-in index suite* and are
    leveraged for spatial index scans where applicable.

What it would take to support H3 (roughly):

    1. Introduce *library and packaging changes* to bring in H3.
    2. Expose *H3 functions* as first-class citizens, e.g.,
    h3_latlng_to_cell, h3_grid_disk, h3_grid_ring, cellToBoundary
    (cell-to-polygon), and resolution helpers.
    3. Provide *H3-derived secondary indexing support*—either by extending
    scalar secondary indexes or introducing a dedicated H3 index type.
    4. Enhance *query planning and predicate pushdown* to effectively
    leverage H3 cell computations.

What would (theoretically) improve with H3:

    1. *Performance improvements*: For neighborhood/nearest-type queries,
    H3’s cell-neighborhood filtering can reduce the candidate set before exact
    evaluation.
    2. *Workload flexibility*: Useful for spatial bucketing, partitioning,
    and efficient aggregations.
    3. *Pre-filtering capability*: At scale, H3 can act as a *coarse*
    pre-filter, pruning large data volumes before applying exact geometric
    predicates.
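
To make the pre-filtering idea concrete, here is a minimal filter-and-refine sketch. It deliberately substitutes a square lat/lng grid for H3's hexagonal cells so it runs without the H3 library: `cell_of` plays the role of `h3_latlng_to_cell` and `neighborhood` the role of `h3_grid_disk`. The cell size, the in-memory dict used as a "secondary index", and the planar exact check are all illustrative assumptions.

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0  # degrees per square cell (illustrative; H3 uses resolutions)


def cell_of(lat, lng, size=CELL_SIZE):
    """Map a point to a coarse grid cell (stand-in for h3_latlng_to_cell)."""
    return (math.floor(lat / size), math.floor(lng / size))


def neighborhood(cell, k):
    """All cells within k steps along each axis (stand-in for h3_grid_disk)."""
    i, j = cell
    return {(i + di, j + dj) for di in range(-k, k + 1) for dj in range(-k, k + 1)}


def build_cell_index(points, size=CELL_SIZE):
    """Cell-derived 'secondary index': cell -> list of (id, lat, lng)."""
    index = defaultdict(list)
    for pid, lat, lng in points:
        index[cell_of(lat, lng, size)].append((pid, lat, lng))
    return index


def within_radius(index, lat, lng, radius, size=CELL_SIZE):
    """Coarse filter by cells, then refine with the exact (planar) predicate."""
    k = math.ceil(radius / size)  # enough rings to cover the search radius
    candidates = [p for c in neighborhood(cell_of(lat, lng, size), k)
                  for p in index.get(c, [])]
    return sorted(pid for pid, plat, plng in candidates
                  if math.hypot(plat - lat, plng - lng) <= radius)
```

Only points in the nearby cells reach the exact distance check; everything else is pruned by a cheap cell lookup, which is the "coarse pre-filter" effect described above.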

Thank you,
Suryaa

On Sat, Aug 2, 2025 at 12:11 AM Wail Alkowaileet <[email protected]> wrote:

+1 for that!

Since this touches the spatial aspect, what do you think of supporting H3
<https://h3geo.org/> indexing in AsterixDB? I stumbled upon this article
<https://www.architecture-performance.fr/ap_blog/spatial-queries-in-duckdb-with-r-tree-and-h3-indexing/>
and found it interesting.

On Fri, Aug 1, 2025 at 9:28 PM Mike Carey <[email protected]> wrote:

Looks very good at this point IMO!

On 8/1/25 9:57 AM, Suryaa Charan Shivakumar wrote:
Hello AsterixDB Dev Community,

I hope you’re all doing well. I wanted to share a quick update on our
ongoing efforts to integrate Coordinate Reference System (CRS) support
into AsterixDB, tracked under JIRA epic *ASTERIXDB-3542*
<https://issues.apache.org/jira/browse/ASTERIXDB-3542>.

We’ve made some progress in refining how CRS metadata is represented
and managed within the system. More details are available in the APE,
APE 17: Spatial CRS Support in AsterixDB
<https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+17%3A++Coordinate+Reference+System+%28CRS%29+support+for+Spatial+Data+in+AsterixDB>.

The overarching objective remains the same: *To make AsterixDB
spatially aware in a real-world sense, enabling geospatial queries and
analytics that respect map projections, distance accuracy, and
cross-CRS interoperability.*
As always, we welcome your feedback and thoughts to ensure we’re
building this in a robust and community-aligned way.

Best regards,
Suryaa Charan

On Thu, Jan 9, 2025 at 5:06 PM Mike Carey <[email protected]> wrote:

+1 for this work - nice APE!  I added a comment with one question and
one suggestion on the APE wiki.

On 1/9/25 4:25 PM, Suryaa Charan Shivakumar wrote:
Hello AsterixDB Dev Community,

I hope this message finds you well. I am pleased to share a proposal
for adding *Coordinate Reference System (CRS) support* for spatial data
in AsterixDB, tracked under JIRA epic *ASTERIXDB-3542* and targeted for
the 9.10.0 release.

Key Motivations:

      1. Improved Spatial Data Handling: Enable accurate, CRS-aware
      spatial operations, including transformations and validations.
      Currently, spatial data in AsterixDB resides in a Euclidean space,
      which means spatial analysis using the provided functions may not
      yield precise results. Integrating CRS is critical to ensure
      accurate, real-world spatial operations by accounting for the
      Earth's curvature and coordinate transformations.
      2. Enhanced Interoperability: Ensure compatibility with external
      systems by supporting standard (e.g., EPSG) and custom CRS
      definitions.
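
To illustrate why a purely Euclidean treatment is imprecise, the sketch below compares a naive planar distance over raw longitude/latitude degrees with a great-circle (haversine) distance on a spherical Earth. The mean Earth radius of 6371.0088 km and the sample coordinates are assumptions for illustration; a full CRS-aware implementation would use an ellipsoidal model such as WGS 84 rather than a sphere.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def planar_degrees(lon1, lat1, lon2, lat2):
    """What a Euclidean treatment of lon/lat yields: a distance in 'degrees'."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance on a sphere, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude at the equator vs. at 60°N: the planar distance is
# identical, but the ground distance differs by roughly a factor of two.
for lat in (0.0, 60.0):
    print(lat, planar_degrees(0, lat, 1, lat), round(haversine_km(0, lat, 1, lat), 1))
```

The planar function cannot distinguish the two cases, which is exactly the kind of error CRS-aware operations are meant to remove.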

APE: https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+17%3A+Spatial+CRS

We invite your feedback on this proposal to ensure we address potential
impacts and refine the implementation plan.

Thank you for your collaboration.

Best regards,
Suryaa Charan



--

*Regards,*
Wail Alkowaileet
