Re: [Discuss] Geospatial Support

Szehon Ho Mon, 30 Sep 2024 11:20:36 -0700

Hi all,

There have been several rounds of discussion on the PR:
https://github.com/apache/iceberg/pull/10981 and I think most of the main
points have been addressed.


If anyone is interested, please take a look.  If there are no other major
points, we plan to start a VOTE thread soon.

I know Jia and team are also volunteering to work on the prototype
immediately afterwards.

Thank you,
Szehon

On Tue, Aug 20, 2024 at 1:57 PM Szehon Ho <szehon.apa...@gmail.com> wrote:

> Hi all
>
> Please take a look at the proposed spec change to support the Geo type in V3
> at https://github.com/apache/iceberg/pull/10981, and comment or
> otherwise let me know your thoughts.
>
> Just as an FYI, it incorporates the feedback from our last meeting (with
> Snowflake and Wherobots engineers).
>
> Thanks,
> Szehon
>
> On Wed, Jun 26, 2024 at 7:29 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Hi
>>
>> It was great to meet in person with Snowflake engineers and we had a good
>> discussion on the paths forward.
>>
>> Meeting notes for the Snowflake-Iceberg sync:
>>
>>    - Iceberg's proposed Geometry type defaults to (edges=planar,
>>    crs=CRS84).
>>    - Snowflake has two types: Geography (spherical) and Geometry (planar,
>>    with customizable CRS).  The data layout/encoding is the same for both
>>    types.  Let's see how we can support each as an Iceberg type, especially
>>    with respect to Iceberg partition/file pruning.
>>    - Geography type support
>>       - Main concern is the need for a suitable partition transform for
>>       partition-level filtering; the candidate is Michael Entin's proposal
>>       <https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit>.
>>       - Secondary concern is file- and row-group (RG)-level filtering.
>>       Gang's Parquet proposal
>>       <https://github.com/apache/parquet-format/pull/240/files> allows
>>       storage of S2 / H3 IDs in Parquet stats, so we can also leverage that
>>       in Iceberg pruning code (the Google and Uber libraries are
>>       compatible).
>>    - Geometry type support
>>       - Main concern is that the partition transform needs to understand
>>       the CRS, but this can be solved by creating the XZ2 transform with a
>>       customizable min/max lat/long range (that's all it needs).
>>    - Should (CRS, edges) be stored as properties on the Geography type in
>>    Phase 1?
>>       - Should be fine to store, with only the defaults allowed in Phase 1.
>>       - Concern 1: if edges is stored, there will be requests to store
>>       other properties like (orientation, epoch).  Solution is to punt
>>       these follow-on properties until later.
>>       - Concern 2: if crs is stored, in what format?  PROJJSON vs SRID.
>>       Solution is to leave it as a string.
>>       - Concern 3: if crs is stored as a string, Iceberg cannot read it.
>>       This should be OK, as we only need this for the XZ2 transform, where
>>       the user already passes in the info from the CRS (it is up to the
>>       user to make sure these align).
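To make the "customizable min/max range" point concrete, here is a minimal Python sketch of an XZ2-style index, following the XZ-ordering idea of addressing each object by a single enlarged quadtree cell. It is a simplified illustration, not the proposed Iceberg transform: the function names, the resolution `g=12`, and passing the CRS range as a plain `bounds` tuple are assumptions. The range is used only to normalize a bounding box into the unit square, which is why the transform itself does not need to parse the CRS.

```python
import math
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

# Assumed default coordinate range for lon/lat data (OGC:CRS84, longitude first).
CRS84_BOUNDS: BBox = (-180.0, -90.0, 180.0, 90.0)


def normalize(bbox: BBox, bounds: BBox) -> BBox:
    """Scale a bbox into the unit square using the CRS's min/max range."""
    bx0, by0, bx1, by1 = bounds
    x0, y0, x1, y1 = bbox
    if not (bx0 <= x0 <= x1 <= bx1 and by0 <= y0 <= y1 <= by1):
        raise ValueError(f"bbox {bbox} is outside bounds {bounds}")
    w, h = bx1 - bx0, by1 - by0
    return ((x0 - bx0) / w, (y0 - by0) / h, (x1 - bx0) / w, (y1 - by0) / h)


def xz2_index(bbox: BBox, bounds: BBox = CRS84_BOUNDS, g: int = 12) -> int:
    """Map a bbox to one XZ2-style sequence code (hypothetical, max resolution g)."""
    x0, y0, x1, y1 = normalize(bbox, bounds)
    max_dim = max(x1 - x0, y1 - y0)
    if max_dim == 0:
        length = g  # a point fits in the finest cell
    else:
        # Deepest level whose cell width is still >= max_dim.
        l1 = math.floor(math.log(max_dim) / math.log(0.5))
        if l1 >= g:
            length = g
        else:
            w2 = 0.5 ** (l1 + 1)  # cell width one level deeper

            def fits(lo: float, hi: float) -> bool:
                # Does [lo, hi] fit in the cell containing lo, enlarged to 2x width?
                return hi <= math.floor(lo / w2) * w2 + 2 * w2

            length = l1 + 1 if fits(x0, x1) and fits(y0, y1) else l1
    # Descend the quadtree along the lower-left corner, accumulating the code.
    cx0, cy0, cx1, cy1 = 0.0, 0.0, 1.0, 1.0
    code = 0
    for i in range(length):
        xc, yc = (cx0 + cx1) / 2, (cy0 + cy1) / 2
        quad = (0 if x0 < xc else 1) + (0 if y0 < yc else 2)
        code += 1 + quad * (4 ** (g - i) - 1) // 3
        cx0, cx1 = (cx0, xc) if x0 < xc else (xc, cx1)
        cy0, cy1 = (cy0, yc) if y0 < yc else (yc, cy1)
    return code
```

Using a projected CRS would only change `bounds`, for example `xz2_index(box, bounds=(0.0, 0.0, 10.0, 10.0))`; as the notes say, keeping those bounds consistent with the declared CRS is up to the user.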
>>
>> Thanks
>> Szehon
>>
>> On Tue, Jun 18, 2024 at 12:23 PM Szehon Ho <szehon.apa...@gmail.com>
>> wrote:
>>
>>> Jia and I will sync with the Snowflake folks to see if we can have a
>>> solution, or roadmap to solution, in the proposal.
>>>
>>> Thanks JB for the interest!  By the way, I want to schedule a meeting to
>>> go over the proposal.  There seems to be good feedback from folks on the
>>> geo side (and even the Parquet community), but not many eyes/feedback from
>>> other folks/PMC in the Iceberg community.  This might be due to lack of
>>> familiarity or time to read through it all.  In fact, a lot of the
>>> advanced discussions like this one are about Phase 2 items, while Phase 1
>>> items are relatively straightforward, so I wanted to explain that.  As I
>>> know it's summer vacation for some folks, we can do this in a week or in
>>> early July; hope that works for everyone.
>>>
>>> Thanks,
>>> Szehon
>>>
>>> On Tue, Jun 18, 2024 at 1:54 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>>
>>>> Hi Jia
>>>>
>>>> Thanks for the update. I'm gonna re-read the whole thread and document
>>>> to have a better understanding.
>>>>
>>>> Thanks !
>>>> Regards
>>>> JB
>>>>
>>>> On Mon, Jun 17, 2024 at 7:44 PM Jia Yu <ji...@apache.org> wrote:
>>>>
>>>>> Hi Snowflake folks,
>>>>>
>>>>> Please let me know if you have other questions regarding the proposal.
>>>>> If so, Szehon and I can set up a Zoom call with you to clarify some
>>>>> details. We are in the Pacific time zone. If you are in Europe, maybe
>>>>> early morning Pacific Time works best for you?
>>>>>
>>>>> Thanks,
>>>>> Jia
>>>>>
>>>>> On Wed, Jun 5, 2024 at 6:28 PM Gang Wu <ust...@gmail.com> wrote:
>>>>>
>>>>>> > The min/max stats are discussed in the doc (Phase 2), depending on
>>>>>> > the non-trivial encoding.
>>>>>>
>>>>>> Just want to add that min/max stats filtering could be supported
>>>>>> natively by the file format. Adding a geometry type to the Parquet spec
>>>>>> is under discussion:
>>>>>> https://github.com/apache/parquet-format/pull/240
>>>>>>
>>>>>> Best,
>>>>>> Gang
>>>>>>
>>>>>> On Thu, Jun 6, 2024 at 5:53 AM Szehon Ho <szehon.apa...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Peter
>>>>>>>
>>>>>>> Yes, the document only concerns predicate pushdown on the geometry
>>>>>>> column.  Predicate pushdown takes two forms: 1) partition filters and
>>>>>>> 2) min/max stats.  The min/max stats are discussed in the doc (Phase 2),
>>>>>>> depending on the non-trivial encoding.
>>>>>>>
>>>>>>> The evaluators are always AND'ed together, so I don't see why
>>>>>>> partitioning by another key would not work on a table with a geo
>>>>>>> column.
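A minimal sketch of what AND'ing the evaluators means for a table whose geometry column is not the partition key (say, partitioned by date, as in Peter's question): a file is scanned only if the partition filter and the min/max bounding-box check both say it might match. `DataFile`, `plan_files`, and the field names are hypothetical, not Iceberg's Java or Python API, and the bounding-box check assumes planar coordinates.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class DataFile:
    path: str
    event_date: date  # hypothetical identity partition value
    geo_bounds: BBox  # min/max x/y stats of the geometry column in this file


def partition_may_match(f: DataFile, start: date, end: date) -> bool:
    """Partition evaluator: does the file's date fall in the queried range?"""
    return start <= f.event_date <= end


def stats_may_match(f: DataFile, window: BBox) -> bool:
    """Stats evaluator: geometries can only intersect the window if the boxes overlap."""
    fx0, fy0, fx1, fy1 = f.geo_bounds
    wx0, wy0, wx1, wy1 = window
    return fx0 <= wx1 and wx0 <= fx1 and fy0 <= wy1 and wy0 <= fy1


def plan_files(files: List[DataFile], start: date, end: date, window: BBox) -> List[str]:
    """Keep a file only if every evaluator says it may match (AND)."""
    return [f.path for f in files
            if partition_may_match(f, start, end) and stats_may_match(f, window)]
```

Neither evaluator depends on the other, so a date partition spec and geometry stats can prune independently.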
>>>>>>>
>>>>>>> On another note, Jia and I thought we could discuss the Snowflake geo
>>>>>>> types in a call to drill down on some details.  What time zone are you
>>>>>>> folks in, and what time works better?  I think Jia and I are both in
>>>>>>> the Pacific time zone.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Szehon
>>>>>>>
>>>>>>> On Wed, Jun 5, 2024 at 1:02 AM Peter Popov <
>>>>>>> peter.po...@snowflake.com> wrote:
>>>>>>>
>>>>>>>> Hi Szehon, hi Jia,
>>>>>>>>
>>>>>>>> Thank you for your replies. We now better understand the connection
>>>>>>>> between the metadata and partitioning in this proposal. Supporting
>>>>>>>> Mapping 1 is a great starting point, and we would like to work more
>>>>>>>> closely with you on bringing support for spherical edges and other
>>>>>>>> coordinate systems into Iceberg geometry.
>>>>>>>>
>>>>>>>> We have some follow-up questions regarding the partitioning (let us
>>>>>>>> know if it’s better to comment directly in the document): Does this
>>>>>>>> proposal imply that XZ2 partitioning is always required? In the
>>>>>>>> current proposal, do you see a possibility for predicate pushdown to
>>>>>>>> rely on x/y min/max column metadata instead of a partition key? We see
>>>>>>>> use cases where a table with a geo column is partitioned by a
>>>>>>>> different key (e.g. date) or a combination of keys. It would be great
>>>>>>>> to support such use cases from the very beginning.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On Thu, May 30, 2024 at 8:07 AM Jia Yu <ji...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Dmytro,
>>>>>>>>>
>>>>>>>>> Thanks for your email. To add to Szehon's answer,
>>>>>>>>>
>>>>>>>>> 1. How to represent the Snowflake Geometry and Geography types in
>>>>>>>>> Iceberg, given the Geo Iceberg Phase 1 design?
>>>>>>>>>
>>>>>>>>> Answer:
>>>>>>>>> Mapping 1 (possible): Snowflake Geometry + SRID:4326 -> Iceberg
>>>>>>>>> Geometry + CRS84 + edges: Planar
>>>>>>>>> Mapping 2 (impossible in Phase 1): Snowflake Geography -> Iceberg
>>>>>>>>> Geometry + CRS84 + edges: Spherical
>>>>>>>>> Mapping 3 (impossible in Phase 1): Snowflake Geometry + SRID:ABCDE
>>>>>>>>> -> Iceberg Geometry + SRID:ABCDE + edges: Planar
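Read as a lookup under the Phase 1 rule that only the default properties are allowed, the three mappings reduce to a single supported case. A hypothetical sketch; the function name and string labels are illustrative, not spec values (OGC:CRS84 and planar are the defaults named elsewhere in this thread):

```python
from typing import Optional, Tuple

# Phase 1 defaults for the Iceberg Geometry type: (crs, edges).
PHASE1_DEFAULT: Tuple[str, str] = ("OGC:CRS84", "planar")


def to_iceberg_geometry(sf_type: str, srid: Optional[int] = None) -> Optional[Tuple[str, str]]:
    """Return Iceberg (crs, edges) for a Snowflake type, or None if Phase 1 can't express it."""
    if sf_type.upper() == "GEOMETRY" and srid == 4326:
        return PHASE1_DEFAULT  # Mapping 1
    # Mapping 2 (GEOGRAPHY needs spherical edges) and Mapping 3 (GEOMETRY with
    # another SRID needs a non-default CRS) both fall outside the Phase 1 defaults.
    return None
```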
>>>>>>>>>
>>>>>>>>> As Szehon mentioned, only Mapping 1 is possible because we need to
>>>>>>>>> support spatial query pushdown in Iceberg. This relies on the Iceberg
>>>>>>>>> partition transform, which requires a 1:1 mapping between a value
>>>>>>>>> (point/polygon/linestring) and a partition key. That is: given any
>>>>>>>>> precision level, a polygon must produce a single ID, and the covering
>>>>>>>>> indicated by this single ID must fully cover the extent of the
>>>>>>>>> polygon. Currently, only XZ2 can satisfy this requirement. If Michael
>>>>>>>>> Entin's theory can be proven correct, then we can support Mapping 2
>>>>>>>>> in Phase 2 of Geo Iceberg.
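To see why the "single ID that fully covers the polygon" requirement rules out a plain fixed-level grid (the "boundary object" problem), compare the cells touched by a small box straddling a cell boundary with the one enlarged cell the XZ approach relies on. A minimal sketch on coordinates already normalized to the unit square; both function names are hypothetical:

```python
import math
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # normalized to the unit square


def grid_cells(bbox: BBox, level: int) -> List[Tuple[int, int]]:
    """(col, row) of every fixed-level quadtree cell the bbox touches."""
    n = 2 ** level  # cells per axis at this level
    x0, y0, x1, y1 = bbox

    def span(lo: float, hi: float) -> range:
        # Clamp so that a coordinate of exactly 1.0 lands in the last cell.
        return range(min(int(lo * n), n - 1), min(int(hi * n), n - 1) + 1)

    return [(c, r) for c in span(x0, x1) for r in span(y0, y1)]


def enlarged_cell_covers(bbox: BBox, level: int) -> bool:
    """Does the bbox fit in its lower-left corner's cell, enlarged to twice the width?"""
    w = 0.5 ** level
    x0, y0, x1, y1 = bbox
    return (x1 <= math.floor(x0 / w) * w + 2 * w
            and y1 <= math.floor(y0 / w) * w + 2 * w)


box = (0.49, 0.49, 0.51, 0.51)       # small box straddling the center
print(len(grid_cells(box, 3)))       # 4: no single fixed-grid partition value
print(enlarged_cell_covers(box, 3))  # True: one enlarged cell covers it
```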
>>>>>>>>>
>>>>>>>>> Regarding Mapping 3, this requires Iceberg to be able to understand
>>>>>>>>> SRID / PROJJSON so that we know the min/max X/Y of the CRS (@Szehon,
>>>>>>>>> maybe Iceberg can ask the engine to provide this information?). See
>>>>>>>>> my answer 2.
>>>>>>>>>
>>>>>>>>> 2. Why choose PROJJSON instead of SRID?
>>>>>>>>>
>>>>>>>>> The PROJJSON idea was borrowed from GeoParquet because we'd like to
>>>>>>>>> enable conversion between Geo Iceberg and GeoParquet. However, I do
>>>>>>>>> understand that this is not a good idea for Iceberg, since not many
>>>>>>>>> libraries can parse PROJJSON.
>>>>>>>>>
>>>>>>>>> @Szehon Is there a way that we can support both SRID and PROJJSON
>>>>>>>>> in Geo Iceberg?
>>>>>>>>>
>>>>>>>>> It is also worth noting that, although there are many libraries that
>>>>>>>>> can parse an SRID and look it up in the EPSG database, the license of
>>>>>>>>> the EPSG database is NOT compatible with Apache Software Foundation
>>>>>>>>> licensing policy. That means Iceberg still cannot parse / understand
>>>>>>>>> SRIDs.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jia
>>>>>>>>>
>>>>>>>>> On Wed, May 29, 2024 at 11:08 AM Szehon Ho <
>>>>>>>>> szehon.apa...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Dmytro
>>>>>>>>>>
>>>>>>>>>> Thank you for looking through the proposal; we're excited to hear
>>>>>>>>>> from you!  I am not a 'geo expert' and will definitely need to pull
>>>>>>>>>> in Jia Yu for some of these points.
>>>>>>>>>>
>>>>>>>>>> Although most calculations are done in the query engine, the Iceberg
>>>>>>>>>> reference implementations (i.e., Java, Python) do have to support a
>>>>>>>>>> few calculations to handle filter pushdown:
>>>>>>>>>>
>>>>>>>>>>    1. pushdown of the proposed geospatial predicates ST_COVERS,
>>>>>>>>>>    ST_COVERED_BY, and ST_INTERSECTS
>>>>>>>>>>    2. evaluation of the proposed geospatial partition transform
>>>>>>>>>>    XZ2.  As you may have seen, this was chosen as it's the only
>>>>>>>>>>    standard one today that solves the 'boundary object' problem while
>>>>>>>>>>    still preserving a 1-to-1 mapping of row => partition value.
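For item 1, a minimal sketch of how the three predicates could be checked against per-file bounding-box stats. It assumes planar edges, a predicate of the form `PRED(column, query_geometry)`, and stats that give the union bounding box of every value in the file. A `False` result only means the file can be skipped; exact evaluation stays in the engine. Function names are hypothetical:

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), planar


def _intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _contains(outer: BBox, inner: BBox) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def may_match(predicate: str, file_bbox: BBox, query_bbox: BBox) -> bool:
    """Can any row in the file satisfy predicate(column, query)?

    file_bbox bounds every row's bbox, so each rule below is a necessary
    condition only; False means the whole file can be skipped.
    """
    if predicate in ("ST_INTERSECTS", "ST_COVERED_BY"):
        # A matching row must overlap (or lie inside) the query box.
        return _intersects(file_bbox, query_bbox)
    if predicate == "ST_COVERS":
        # A row covering the query needs a bbox that contains the query bbox.
        return _contains(file_bbox, query_bbox)
    raise ValueError(f"unsupported predicate: {predicate}")
```

With spherical edges or a different CRS these box rules would need revisiting, which is the open question raised later in this message.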
>>>>>>>>>>
>>>>>>>>>> This is the primary rationale for choosing the default values, as
>>>>>>>>>> these were implemented in the GeoLake and Havasu projects (the
>>>>>>>>>> Iceberg forks that sparked the proposal) based on the Geometry type
>>>>>>>>>> (edge=planar, crs=OGC:CRS84 / SRID=4326).
>>>>>>>>>>
>>>>>>>>>>> 2. As you mentioned [2] in the proposal, there are difficulties
>>>>>>>>>>> with supporting the full PROJJSON specification of the SRS. From our
>>>>>>>>>>> experience, most use cases do not require the full definition of
>>>>>>>>>>> the SRS; in fact, that definition is only needed when converting
>>>>>>>>>>> between coordinate systems. On the other hand, it’s often necessary
>>>>>>>>>>> to check whether two geometry columns have the same coordinate
>>>>>>>>>>> system, for example when joining two columns from different data
>>>>>>>>>>> providers.
>>>>>>>>>>>
>>>>>>>>>>> To address this, we would like to propose including the option to
>>>>>>>>>>> specify the SRS with only an SRID in Phase 1. The query engine may
>>>>>>>>>>> choose to treat it as an opaque identifier or look it up in the EPSG
>>>>>>>>>>> database if supported.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The way to specify the CRS definition is actually taken from
>>>>>>>>>> GeoParquet [1]; I think we are not bound to follow it if there are
>>>>>>>>>> better options.  I feel we might need to at least list the supported
>>>>>>>>>> configurations in the spec, though.  There is some conversation on
>>>>>>>>>> the doc about this [2].  Basically:
>>>>>>>>>>
>>>>>>>>>>    1. XZ2 assumes planar edges.  This is a property of the
>>>>>>>>>>    algorithm, based on the original paper.  A possible solution for
>>>>>>>>>>    spherical edges is proposed by Michael Entin here [3]; please feel
>>>>>>>>>>    free to evaluate it.
>>>>>>>>>>    2. XZ2 needs to know the coordinate range.  According to
>>>>>>>>>>    Jia's comments, this requires parsing the CRS.  Can it be done
>>>>>>>>>>    with an SRID alone?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> 1. In the first version of the specification, Phase 1 is described
>>>>>>>>>>> as focusing on the planar geometry model with a CRS fixed on 4326.
>>>>>>>>>>> In this model, Snowflake would not be able to map our Geography
>>>>>>>>>>> type, since it is based on the spherical Geography model. Given
>>>>>>>>>>> that Snowflake supports both edge types, we would like to better
>>>>>>>>>>> understand how to map them to the proposed Geometry type and its
>>>>>>>>>>> metadata.
>>>>>>>>>>>
>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the query
>>>>>>>>>>>    engine? Is it necessary for the system to adhere to the edge
>>>>>>>>>>>    model for geospatial functions, or can it use the model that it
>>>>>>>>>>>    supports or let the customer choose it? Will it affect the
>>>>>>>>>>>    bounding box or other row group metadata?
>>>>>>>>>>>    - Is there any reason why the flexible model has to be
>>>>>>>>>>>    postponed to further iterations? Would it be more extensible to
>>>>>>>>>>>    support a mutable edge type from Phase 1, but allow systems to
>>>>>>>>>>>    ignore it if they do not support the spherical computation model?
>>>>>>>>>>>
>>>>>>>>>> This may be answered by the previous paragraph regarding XZ2.
>>>>>>>>>>
>>>>>>>>>>    1. If we get XZ2 to work with a more variable CRS without
>>>>>>>>>>    requiring the full PROJJSON specification, it seems that is a
>>>>>>>>>>    path to supporting the Snowflake Geometry type?
>>>>>>>>>>    2. If we get another one-to-one partition function on
>>>>>>>>>>    spherical edges, like the one proposed by Michael, it seems that
>>>>>>>>>>    is a path to supporting the Snowflake Geography type?
>>>>>>>>>>
>>>>>>>>>> Does that sound correct?  As for why certain things are marked as
>>>>>>>>>> Phase 1: they were chosen so that we can all agree on an initial
>>>>>>>>>> design and iterate faster, and they are not set in stone.  For
>>>>>>>>>> example, path 1 may be possible to do quickly.
>>>>>>>>>>
>>>>>>>>>> Also, I am not sure about evaluating ST_COVERS, ST_COVERED_BY, and
>>>>>>>>>> ST_INTERSECTS (how easy it is to handle different CRSs + spherical
>>>>>>>>>> edges).  I will leave that to Jia.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Szehon
>>>>>>>>>>
>>>>>>>>>> [1]:
>>>>>>>>>> https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#column-metadata
>>>>>>>>>> [2]:
>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit?disco=AAABL-z6xXk
>>>>>>>>>> [3]:
>>>>>>>>>> https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 29, 2024 at 8:30 AM Dmytro Koval
>>>>>>>>>> <dmytro.ko...@snowflake.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear Szehon and Iceberg Community,
>>>>>>>>>>>
>>>>>>>>>>> This is Dmytro, Peter, Aihua, and Tyler from Snowflake. As part
>>>>>>>>>>> of our desire to be more active in the Iceberg community, we’ve been
>>>>>>>>>>> looking over this geospatial proposal. We’re excited geospatial is
>>>>>>>>>>> getting traction, as we see a lot of geo usage within Snowflake, and
>>>>>>>>>>> expect that usage to carry over to our Iceberg offerings soon. After
>>>>>>>>>>> reviewing the proposal, we have some questions we’d like to pose
>>>>>>>>>>> given our experience with geospatial support in Snowflake.
>>>>>>>>>>>
>>>>>>>>>>> We would like to clarify two aspects of the proposal: handling of
>>>>>>>>>>> the spherical model and the definition of the spatial reference
>>>>>>>>>>> system. Both have a big impact on interoperability with Snowflake
>>>>>>>>>>> and other query engines and geo processing systems.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Let us first share some context about geospatial types at
>>>>>>>>>>> Snowflake; geo experts will certainly be familiar with this context
>>>>>>>>>>> already, but for the sake of others we want to err on the side of
>>>>>>>>>>> being explicit and clear. Snowflake supports two geospatial types [1]:
>>>>>>>>>>> - Geography – uses a spherical approximation of the earth for all
>>>>>>>>>>> computations. It does not perfectly represent the earth, but it
>>>>>>>>>>> gives accurate results on WGS84 coordinates, as used by GPS, without
>>>>>>>>>>> any need to perform coordinate system reprojections. It is also
>>>>>>>>>>> quite fast for end-to-end computations. In general, it has less
>>>>>>>>>>> distortion than the 2D planar model.
>>>>>>>>>>> - Geometry – uses the planar Euclidean geometry model. Geometric
>>>>>>>>>>> computations are simpler, but require transforming the data between
>>>>>>>>>>> coordinate systems to minimize distortion. The Geometry data type
>>>>>>>>>>> allows setting a spatial reference system for each row using the
>>>>>>>>>>> SRID. Binary geospatial functions are only allowed on geometries
>>>>>>>>>>> with the same SRID. The only function that interprets the SRID is
>>>>>>>>>>> ST_TRANSFORM, which allows conversion between different SRSs.
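A generic illustration of why the two models differ (not Snowflake's implementation): treating lon/lat degrees as planar x/y reports the same distance for one degree of longitude everywhere, while a spherical model shows it shrinking toward the poles. The haversine formula and the mean-Earth-radius constant are standard; the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; the sphere is an approximation


def planar_distance_deg(p, q):
    """Euclidean distance treating (lon, lat) degrees as planar x/y."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_km(p, q):
    """Great-circle (haversine) distance between (lon, lat) points on a sphere."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude at the equator vs. at 60°N.
print(planar_distance_deg((0, 0), (1, 0)), planar_distance_deg((0, 60), (1, 60)))  # 1.0 1.0
print(f"{spherical_distance_km((0, 0), (1, 0)):.1f}")    # 111.2
print(f"{spherical_distance_km((0, 60), (1, 60)):.1f}")  # 55.6
```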
>>>>>>>>>>>
>>>>>>>>>>> Given the choice of two types and a set of operations on top of
>>>>>>>>>>> them, the majority of Snowflake users select the Geography type to
>>>>>>>>>>> represent their geospatial data.
>>>>>>>>>>>
>>>>>>>>>>> From our perspective, Iceberg users would benefit most from
>>>>>>>>>>> being given the flexibility to store and process data using the 
>>>>>>>>>>> model that
>>>>>>>>>>> better fits their needs and specific use cases.
>>>>>>>>>>>
>>>>>>>>>>> Therefore, we would like to ask some design clarifying
>>>>>>>>>>> questions, important for interoperability:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 1. In the first version of the specification, Phase 1 is described
>>>>>>>>>>> as focusing on the planar geometry model with a CRS fixed on 4326.
>>>>>>>>>>> In this model, Snowflake would not be able to map our Geography
>>>>>>>>>>> type, since it is based on the spherical Geography model. Given
>>>>>>>>>>> that Snowflake supports both edge types, we would like to better
>>>>>>>>>>> understand how to map them to the proposed Geometry type and its
>>>>>>>>>>> metadata.
>>>>>>>>>>>
>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the query
>>>>>>>>>>>    engine? Is it necessary for the system to adhere to the edge
>>>>>>>>>>>    model for geospatial functions, or can it use the model that it
>>>>>>>>>>>    supports or let the customer choose it? Will it affect the
>>>>>>>>>>>    bounding box or other row group metadata?
>>>>>>>>>>>    - Is there any reason why the flexible model has to be
>>>>>>>>>>>    postponed to further iterations? Would it be more extensible to
>>>>>>>>>>>    support a mutable edge type from Phase 1, but allow systems to
>>>>>>>>>>>    ignore it if they do not support the spherical computation model?
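To illustrate the bounding-box part of the first question: with spherical edges, an edge can reach a latitude beyond both of its endpoints, so a min/max box built from vertex coordinates (sufficient for planar edges) may not bound the geometry. A rough sampling sketch, not an exact or production method; the function names and the 1000-sample default are illustrative assumptions.

```python
import math
from typing import Tuple

LonLat = Tuple[float, float]  # (longitude, latitude) in degrees


def _to_xyz(lon: float, lat: float) -> Tuple[float, float, float]:
    """Unit vector for a lon/lat point on the sphere."""
    lon, lat = math.radians(lon), math.radians(lat)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def spherical_edge_max_lat(p: LonLat, q: LonLat, steps: int = 1000) -> float:
    """Approximate the max latitude on the great-circle arc p->q by sampling it."""
    a, b = _to_xyz(*p), _to_xyz(*q)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between the endpoints
    if omega < 1e-7:
        return max(p[1], q[1])  # (nearly) identical endpoints
    if math.pi - omega < 1e-9:
        raise ValueError("antipodal endpoints: the great-circle arc is ambiguous")
    best = -90.0
    for i in range(steps + 1):
        t = i / steps
        # Spherical linear interpolation (slerp); only the z component sets latitude.
        z = (math.sin((1 - t) * omega) * a[2] + math.sin(t * omega) * b[2]) / math.sin(omega)
        best = max(best, math.degrees(math.asin(max(-1.0, min(1.0, z)))))
    return best


def planar_edge_max_lat(p: LonLat, q: LonLat) -> float:
    """With planar edges, a segment never exceeds its endpoints' latitudes."""
    return max(p[1], q[1])
```

For two vertices at 45°N that are 90° of longitude apart, the planar bound is 45° while the great-circle edge reaches roughly 54.7°N, so the edge model can change what a correct bounding box looks like.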
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2. As you mentioned [2] in the proposal, there are difficulties
>>>>>>>>>>> with supporting the full PROJJSON specification of the SRS. From our
>>>>>>>>>>> experience, most use cases do not require the full definition of
>>>>>>>>>>> the SRS; in fact, that definition is only needed when converting
>>>>>>>>>>> between coordinate systems. On the other hand, it’s often necessary
>>>>>>>>>>> to check whether two geometry columns have the same coordinate
>>>>>>>>>>> system, for example when joining two columns from different data
>>>>>>>>>>> providers.
>>>>>>>>>>>
>>>>>>>>>>> To address this, we would like to propose including the option to
>>>>>>>>>>> specify the SRS with only an SRID in Phase 1. The query engine may
>>>>>>>>>>> choose to treat it as an opaque identifier or look it up in the EPSG
>>>>>>>>>>> database if supported.
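A minimal sketch of the "opaque identifier" option: the engine only compares SRID strings, for example before a spatial join, and never parses them. The `GeoColumn` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoColumn:
    name: str
    srid: str  # opaque identifier such as "EPSG:4326"; never parsed here


def same_srs(a: GeoColumn, b: GeoColumn) -> bool:
    """Opaque check: the SRS is considered equal only if the identifiers match exactly.

    Equivalent systems spelled differently are reported as different;
    recognizing them as equivalent would need a real lookup (e.g. EPSG).
    """
    return a.srid == b.srid


def check_join(a: GeoColumn, b: GeoColumn) -> None:
    """Refuse a spatial join between columns whose SRS identifiers differ."""
    if not same_srs(a, b):
        raise ValueError(f"SRS mismatch: {a.name}={a.srid!r}, {b.name}={b.srid!r}")
```

Under this check, `OGC:CRS84` and `EPSG:4326` compare as different even though both are based on WGS 84 (they differ in axis order); treating them as compatible would require the lookup the proposal leaves optional.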
>>>>>>>>>>>
>>>>>>>>>>> Thank you again for driving this effort forward. We look forward
>>>>>>>>>>> to hearing your thoughts.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://docs.snowflake.com/en/sql-reference/data-types-geospatial#understanding-the-differences-between-geography-and-geometry
>>>>>>>>>>>
>>>>>>>>>>> [2]
>>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit#heading=h.oruaqt3nxcaf
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2024/05/02 00:41:52 Szehon Ho wrote:
>>>>>>>>>>> > Hi everyone,
>>>>>>>>>>> >
>>>>>>>>>>> > We have created a formal proposal for adding Geospatial support to
>>>>>>>>>>> > Iceberg.
>>>>>>>>>>> >
>>>>>>>>>>> > Please read the following for details.
>>>>>>>>>>> >
>>>>>>>>>>> >    - GitHub Proposal: https://github.com/apache/iceberg/issues/10260
>>>>>>>>>>> >    - Proposal Doc:
>>>>>>>>>>> >    https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI
>>>>>>>>>>> >
>>>>>>>>>>> > Note that this proposal is built on extensive existing research and
>>>>>>>>>>> > POC implementations (Geolake, Havasu).  Special thanks to Jia Yu and
>>>>>>>>>>> > Kristin Cowalcijk from Wherobots/Geolake for extensive consultation
>>>>>>>>>>> > and help in writing this proposal, as well as support from Yuanyuan
>>>>>>>>>>> > Zhang from Geolake.
>>>>>>>>>>> >
>>>>>>>>>>> > We would love to get more feedback on this proposal from the wider
>>>>>>>>>>> > community and eventually discuss it in a community sync.
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks
>>>>>>>>>>> > Szehon
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>
