Perhaps the heterogeneous index could simply be extended to work on geometries, including geometries with heterogeneous SRIDs? I'm imagining that, for geometries, we could treat the SRID the way we treat a type tag. Then, within each SRID, the geometries would be ordered using a space-filling curve, like what Young-Seok did a few years back (if I remember right). This seems more straightforward than trying to extend the technique used in the heterogeneous index to R-trees.
On Sun, Sep 7, 2025 at 8:03 AM Mike Carey <[email protected]> wrote:
>
> Sounds like a valuable project! We should look carefully at the
> detailed syntax for providing the CRS info when the time comes, but I
> like the direction. Also, we should figure out if/how this may interact
> with our new heterogeneous indexes (as the goal for them is to mostly
> replace the older form of indexes that require type knowledge).
>
> On 9/4/25 12:11 AM, Suryaa Charan Shivakumar wrote:
> > Hello all,
> >
> > I’d like to start a discussion on how we might support CRS aware indexing
> > for *geometry types* in AsterixDB as part of our geospatial work.
> >
> > *Current Situation*
> >
> > - Today we only have the geometry type (AGEOMETRY), with no CRS
> >   constraint.
> > - This makes it flexible, but correctness and performance issues arise
> >   when mixed-CRS data is indexed and eventually queried.
> >
> > One idea is to introduce CRS-constrained geometry types, so a field can
> > explicitly declare a CRS,
> >
> > ```
> > CREATE TYPE LocationType AS {
> >     coordinates: geometry(EPSG:4326), -- CRS-constrained geometry
> >     mixed_coordinates: geometry, -- Unconstrained
> >     address: string
> > };
> >
> > CREATE DATASET Locations(LocationType) PRIMARY KEY id;
> >
> > CREATE INDEX geo_idx ON Locations(coordinates) TYPE RTREE; -- CRS auto-inferred
> > ```
> >
> > This separation allows users to decide:
> >
> > - *Constrained geometry*: safe for indexing and cross-column operations.
> > - *Unconstrained geometry*: flexible for heterogeneous CRS use cases
> >
> > Benefits
> >
> > - *Clarity*: Schema explicitly encodes CRS.
> > - *Optionality*: Mixed CRS still possible where needed.
> > - *Efficiency*: Query planner can reason about CRS at compile time (no
> >   runtime lookups).
> > - *Safety*: Cross-column CRS compatibility can be validated early.
> > - *Distributed-friendly*: CRS info can travel with type metadata;
> >   workers validate independently.
> >
> > Flow diagram:
> > https://urldefense.com/v3/__https://www.mermaidchart.com/play?utm_source=mermaid_live_editor&utm_medium=toggle*pako:eNp9Uttu2kAQ_ZWRpUjpAz_Qh1YBQ5OUWwCpqkwetusx3sa7a-2uRSjk3zsenMWoVfxmzxmf2xwTaXNMPidFZfeyFC7AJt0aoOfumIogYC3RCKfs17etOQ9ubmDohJEl-g4Jg8EXGGbbZI57yGnLY4ABjFZrkNb44IQymEM41LhNnvtLj7Q0flU-KLPrbVa4E_JwWYjELQHL-mHdS6v5PPHNr50TddnO23H2D65jBRgy74h4UyxIFpPAXoUSdmg1BneAW1L-KSoFGPFOmj0Yj5SQdVBZkbPeiEkZMz6mFj071yLIEnKUlXBknj61EZ7B4xZ8-on-BJPsTkqsAwiTgw_WYdTxfA2f2xN8I90r_I2SVVCwxhfWabi1dVDWiKove8KS7rORQxEQfC2CEhUok-NrBN0z6IG80VealehUODsonNXgqWd9sfnA8O_Zgvi0-oMOGt8ZVqawIAJVrmtVUbBKY7eHJu-VGAv_qMkpnwCX-X98lPTIkqbZqET5AqrgXkB5EgSWCiZtETtl7Ox82o1RHN5VNbOY9bzNujGw3sAmBh0slFbbHRoy38t6dml0kS2dlUid81W9s1xdy5x1LN5fF_y6_LipJYOeuqaE942m5HsmIvKJkats0oSG7knx2XrQjQ_dWfKfeztUUPL2F7cRRQM__;Iw!!CzAuKJ42GuquVTTmVmPViYEvSg!JHDwJO7PkUiM8_s87Jmvr9jMGAznGuVDDMXkeJIu8a9hL_NBu4K5qP8FAwtLXcK9B4meV06R2KZhkQ$
> >
> > But it doesn't follow the loose typing preferred in the world of
> > semi-structured data. It does, however, ensure performance, valid query
> > results, and a complete index. Other ideas include the following:
> >
> > - Users should be warned about *lossy transformations* if we enforce
> >   converting everything to a single projection (e.g., 4326).
> > - Another option might be to support *multiple indexes* on the same
> >   dataset, each tied to a different CRS (with a fixed practical limit, say
> >   10), extending R-Tree physically to handle multiple CRS domains.
> > - Treat CRS almost like a *type domain*, similar to heterogeneous
> >   indexing, rather than a hard constraint at the type level.
> >
> > *Questions for the community:*
> >
> > - Should we enforce CRS constraints strictly at the type level, or
> >   consider index-level CRS flexibility (multiple CRSs per index)?
> > - How should we handle schema evolution and legacy datasets without CRS
> >   metadata?
> > - Are there better approaches to balance correctness, performance, and
> >   flexibility?
> >
> > Looking forward to everyone’s thoughts.
> >
> > Additional Context below -
> >
> > *What is a CRS?*
> >
> > A *Coordinate Reference System (CRS)* defines how numbers in a geometry
> > (like (x,y) pairs) map to real-world locations.
> >
> > - For example, in EPSG:4326 (WGS84), coordinates are in
> >   *longitude/latitude degrees*.
> > - In EPSG:3857 (Web Mercator), the same numbers represent *meters on a
> >   projected plane*.
> >
> > Without CRS information, two geometries may use different measurement
> > systems, and a “distance” or “intersection” operation between them would be
> > meaningless.
> >
> > *How AsterixDB Handles CRS Today (as per current patchset in review and
> > APE)*
> >
> > - We have a single type geometry with *no schema-level CRS constraint*.
> > - Each geometry object in WKB (Well-Known Binary) carries only a
> >   *reference ID* (e.g., 4326), not the full definition.
> > - CRS definitions themselves (EPSG codes, PROJ strings, etc.) are loaded
> >   into memory (Apache Derby) with an API call.
> > - When a user calls ST_Transform, we look up the definition using the
> >   stored ID and perform transformations with *Apache SIS*.
> > - This design means CRS enforcement is *runtime-only*: validation or
> >   transformation happens at query execution, not at schema declaration.
> >
> > *Why Indexes Must Be in the Same CRS*
> >
> > Spatial indexes (R-Trees) compare bounding boxes of geometries. If
> > geometries use different CRSs:
> >
> > - One geometry might be in *degrees*, another in *meters*, another in
> >   *feet*.
> > - Mixing them breaks the index, since it assumes all values are in a
> >   single coordinate system.
> >
> > *Analogy:* It’s like cataloging items in a warehouse. If half the
> > measurements are in inches and half in centimeters, the index will be
> > inconsistent, a “short” object in inches could look bigger than a “long”
> > object in centimeters.
> >
> > Thank you,
> > Suryaa
> >
