I have a couple of comments that I'd like to see addressed.

First, I think that the definition of the bounding box needs to be more
clear: the bounding box must include all points that lie on an object's
edges or within an object. If that isn't required then we can't use the
bounding box for filtering because there may be points outside the box that
are part of the object.
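
To make the filtering concern concrete, here is a minimal Python sketch. The names BBox, bbox_of, and may_intersect are hypothetical and do not come from the spec or the PR, and planar edges are assumed. A file can be skipped only when its bounding box is disjoint from the query box, which is safe only if the box covers every point of every object:

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class BBox:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bbox_of(vertices: Iterable[Point]) -> BBox:
    """Box covering all vertices of a planar object. With straight (planar)
    edges, covering the vertices also covers the edges and the interior."""
    xs, ys = zip(*vertices)
    return BBox(min(xs), min(ys), max(xs), max(ys))


def may_intersect(data: BBox, query: BBox) -> bool:
    """Return False only when the boxes are disjoint (file can be skipped)."""
    return not (data.xmax < query.xmin or query.xmax < data.xmin
                or data.ymax < query.ymin or query.ymax < data.ymin)


triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
full = bbox_of(triangle)
# Hypothetical box that misses the apex (2, 3) of the triangle.
too_small = BBox(0.0, 0.0, 4.0, 2.0)
# Query region around the apex; it really does touch the triangle.
query = BBox(1.5, 2.5, 2.5, 3.5)

print(may_intersect(full, query))       # True: file is kept
print(may_intersect(too_small, query))  # False: file would be wrongly skipped
```

The second result is the failure mode: a box that does not contain the whole object lets the filter drop a file that holds a matching row.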

Second, the encoding for lower and upper bound points needs to be a little
more specific about the binary format and how to handle the optional values.
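
As an illustration of the level of detail in question, the sketch below shows one possible way to pin down a bound encoding. The layout is an assumption made for this example, not what the PR specifies: little-endian IEEE 754 doubles, x and y required, z and m optional, and NaN standing in for z when only m is set. The function names encode_bound and decode_bound are hypothetical.

```python
import math
import struct
from typing import Optional, Tuple


def encode_bound(x: float, y: float,
                 z: Optional[float] = None,
                 m: Optional[float] = None) -> bytes:
    """Assumed layout: x, y[, z[, m]] as little-endian 8-byte doubles.
    If m is set but z is not, NaN is written in the z slot."""
    values = [x, y]
    if m is not None:
        values += [math.nan if z is None else z, m]
    elif z is not None:
        values.append(z)
    return struct.pack(f"<{len(values)}d", *values)


def decode_bound(buf: bytes) -> Tuple[float, float, Optional[float], Optional[float]]:
    """Inverse of encode_bound; the buffer length says which values exist."""
    if len(buf) not in (16, 24, 32):
        raise ValueError(f"unexpected bound length: {len(buf)}")
    values = struct.unpack(f"<{len(buf) // 8}d", buf)
    x, y = values[0], values[1]
    z = values[2] if len(values) >= 3 and not math.isnan(values[2]) else None
    m = values[3] if len(values) == 4 else None
    return x, y, z, m


print(len(encode_bound(1.0, 2.0)))                  # 16
print(decode_bound(encode_bound(1.0, 2.0, m=5.0)))  # (1.0, 2.0, None, 5.0)
```

Whatever layout is chosen, the spec would need to state the byte order, the value order, and how an absent z or m is represented, since a reader cannot infer these from the bytes alone.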

On Mon, Sep 30, 2024 at 1:30 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Thanks Szehon! My comments were addressed. I'm ready to vote.
>
> Yufei
>
>
> On Mon, Sep 30, 2024 at 11:47 AM Russell Spitzer <
> russell.spit...@gmail.com> wrote:
>
>> All my concerns are addressed, I'm ready to vote.
>>
>> On Mon, Sep 30, 2024 at 1:21 PM Szehon Ho <szehon.apa...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> There have been several rounds of discussion on the PR:
>>> https://github.com/apache/iceberg/pull/10981 and I think most of the
>>> main points have been addressed.
>>>
>>> If anyone is interested, please take a look.  If there are no other
>>> major points, we plan to start a VOTE thread soon.
>>>
>>> I know Jia and team are also volunteering to work on the prototype
>>> immediately afterwards.
>>>
>>> Thank you,
>>> Szehon
>>>
>>> On Tue, Aug 20, 2024 at 1:57 PM Szehon Ho <szehon.apa...@gmail.com>
>>> wrote:
>>>
>>>> Hi all
>>>>
>>>> Please take a look at the proposed spec change to support Geo type for
>>>> V3 in : https://github.com/apache/iceberg/pull/10981, and comment or
>>>> otherwise let me know your thoughts.
>>>>
>>>> Just as an FYI it incorporated the feedback from our last meeting (with
>>>> Snowflake and Wherobots engineers).
>>>>
>>>> Thanks,
>>>> Szehon
>>>>
>>>> On Wed, Jun 26, 2024 at 7:29 PM Szehon Ho <szehon.apa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> It was great to meet in person with Snowflake engineers and we had a
>>>>> good discussion on the paths forward.
>>>>>
>>>>> Meeting notes for Snowflake-Iceberg sync.
>>>>>
>>>>>    - Iceberg proposed Geometry type defaults to (edges=planar,
>>>>>    crs=CRS84).
>>>>>    - Snowflake has two types: Geography (spherical) and Geometry
>>>>>    (planar, with customizable CRS).  The data layout/encoding is the
>>>>>    same for both types.  Let's see how we can support each in the
>>>>>    Iceberg type, especially wrt Iceberg partition/file pruning.
>>>>>    - Geography type support
>>>>>       - Main concern is the need for a suitable partition transform for
>>>>>       partition-level filter; the candidate is Michael Entin's
>>>>>       proposal
>>>>>       <https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit>
>>>>>       .
>>>>>       - Secondary concern is file and RG-level filtering.  Gang's Parquet
>>>>>       proposal
>>>>>       <https://github.com/apache/parquet-format/pull/240/files> allows
>>>>>       storage of S2 / H3 IDs in Parquet stats, so we can also leverage
>>>>>       that in Iceberg pruning code (Google and Uber libraries are
>>>>>       compatible).
>>>>>    - Geometry type support
>>>>>       - Main concern is that the partition transform needs to understand
>>>>>       the CRS, but this can be solved by having the XZ2 transform created
>>>>>       with a customizable min/max lat/long range (that's all it needs).
>>>>>    - Should (CRS, edges) be stored properties on Geography type in
>>>>>    Phase 1?
>>>>>       - Should be fine to store, with only allowing defaults in Phase
>>>>>       1.
>>>>>       - Concern 1: If edges is stored, there will be asks to store
>>>>>       other properties like (orientation, epoch).  Solution is to punt
>>>>>       these follow-on properties for later.
>>>>>       - Concern 2: If crs is stored, what format?  PROJJSON vs SRID.
>>>>>       Solution is to leave it as a string.
>>>>>       - Concern 3: If crs is stored as a string, Iceberg cannot read
>>>>>       it.  This should be ok, as we only need this for the XZ2 transform,
>>>>>       where the user already passes in the info from the CRS (up to the
>>>>>       user to make sure these align).
>>>>>
>>>>> Thanks
>>>>> Szehon
>>>>>
>>>>> On Tue, Jun 18, 2024 at 12:23 PM Szehon Ho <szehon.apa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Jia and I will sync with the Snowflake folks to see if we can have a
>>>>>> solution, or roadmap to solution, in the proposal.
>>>>>>
>>>>>> Thanks JB for the interest!  By the way, I want to schedule a meeting
>>>>>> to go over the proposal. It seems there's good feedback from folks on
>>>>>> the geo side (and even the Parquet community), but not too many
>>>>>> eyes/feedback from other folks/PMC in the Iceberg community.  This might
>>>>>> be due to lack of familiarity/time to read through it all.  In fact, a
>>>>>> lot of the advanced discussions like this one are for Phase 2 items, and
>>>>>> Phase 1 items are relatively straightforward, so I wanted to explain
>>>>>> that.  As I know it's summer vacation for some folks, we can do this in
>>>>>> a week or early July; hope that sounds good to everyone.
>>>>>>
>>>>>> Thanks,
>>>>>> Szehon
>>>>>>
>>>>>> On Tue, Jun 18, 2024 at 1:54 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Jia
>>>>>>>
>>>>>>> Thanks for the update. I'm gonna re-read the whole thread and
>>>>>>> document to have a better understanding.
>>>>>>>
>>>>>>> Thanks !
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On Mon, Jun 17, 2024 at 7:44 PM Jia Yu <ji...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Snowflake folks,
>>>>>>>>
>>>>>>>> Please let me know if you have other questions regarding the
>>>>>>>> proposal. If any, Szehon and I can set up a zoom call with you guys to
>>>>>>>> clarify some details. We are in the Pacific time zone. If you are in
>>>>>>>> Europe, maybe early morning Pacific Time works best for you?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jia
>>>>>>>>
>>>>>>>> On Wed, Jun 5, 2024 at 6:28 PM Gang Wu <ust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> > The min/max stats are discussed in the doc (Phase 2), depending
>>>>>>>>> > on the non-trivial encoding.
>>>>>>>>>
>>>>>>>>> Just want to add that min/max stats filtering could be supported
>>>>>>>>> by file format natively. Adding geometry type to parquet spec
>>>>>>>>> is under discussion:
>>>>>>>>> https://github.com/apache/parquet-format/pull/240
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Gang
>>>>>>>>>
>>>>>>>>> On Thu, Jun 6, 2024 at 5:53 AM Szehon Ho <szehon.apa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Peter
>>>>>>>>>>
>>>>>>>>>> Yes, the document only concerns the predicate pushdown of the
>>>>>>>>>> geometric column.  Predicate pushdown takes two forms: 1) partition
>>>>>>>>>> filter and 2) min/max stats.  The min/max stats are discussed in the
>>>>>>>>>> doc (Phase 2), depending on the non-trivial encoding.
>>>>>>>>>>
>>>>>>>>>> The evaluators are always AND'ed together, so I don't see any
>>>>>>>>>> issue with partitioning by another key on a table with a geo
>>>>>>>>>> column.
>>>>>>>>>>
>>>>>>>>>> On another note, Jia and I thought that we may have a discussion
>>>>>>>>>> about Snowflake geo types in a call to drill down on some details?
>>>>>>>>>> What time zone are you folks in / what time works better?  I think
>>>>>>>>>> Jia and I are both in the Pacific time zone.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Szehon
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 5, 2024 at 1:02 AM Peter Popov <
>>>>>>>>>> peter.po...@snowflake.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Szehon, hi Jia,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your replies. We now better understand the
>>>>>>>>>>> connection between the metadata and partitioning in this proposal.
>>>>>>>>>>> Supporting Mapping 1 is a great starting point, and we would like
>>>>>>>>>>> to work closer with you on bringing the support for spherical edges
>>>>>>>>>>> and other coordinate systems into Iceberg geometry.
>>>>>>>>>>>
>>>>>>>>>>> We have some follow-up questions regarding the partitioning (let
>>>>>>>>>>> us know if it’s better to comment directly in the document): Does
>>>>>>>>>>> this proposal imply that XZ2 partitioning is always required? In the
>>>>>>>>>>> current proposal, do you see a possibility of predicate
>>>>>>>>>>> pushdown relying on x/y min/max column metadata instead of a
>>>>>>>>>>> partition key? We see use cases where a table with a geo column can
>>>>>>>>>>> be partitioned by a different key (e.g. date) or a combination of
>>>>>>>>>>> keys. It would be great to support such use cases from the very
>>>>>>>>>>> beginning.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 30, 2024 at 8:07 AM Jia Yu <ji...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Dmytro,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your email. To add to Szehon's answer,
>>>>>>>>>>>>
>>>>>>>>>>>> 1. How to represent Snowflake Geometry and Geography type in
>>>>>>>>>>>> Iceberg, given the Geo Iceberg Phase 1 design:
>>>>>>>>>>>>
>>>>>>>>>>>> Answer:
>>>>>>>>>>>> Mapping 1 (possible): Snowflake Geometry + SRID: 4326 ->
>>>>>>>>>>>> Iceberg Geometry + CRS84 + edges: Planar
>>>>>>>>>>>> Mapping 2 (impossible): Snowflake Geography -> Iceberg
>>>>>>>>>>>> Geometry + CRS84 + edges: Spherical
>>>>>>>>>>>> Mapping 3 (impossible): Snowflake Geometry + SRID:ABCDE ->
>>>>>>>>>>>> Iceberg Geometry + SRID:ABCDE + edges: Planar
>>>>>>>>>>>>
>>>>>>>>>>>> As Szehon mentioned, only Mapping 1 is possible because we need
>>>>>>>>>>>> to support spatial query push down in Iceberg. This function relies
>>>>>>>>>>>> on the Iceberg partition transform, which requires a 1:1 mapping
>>>>>>>>>>>> between a value (point/polygon/linestring) and a partition key. That
>>>>>>>>>>>> is: given any precision level, a polygon must produce a single ID,
>>>>>>>>>>>> and the covering indicated by this single ID must fully cover the
>>>>>>>>>>>> extent of the polygon. Currently, only XZ2 can satisfy this
>>>>>>>>>>>> requirement. If the theory from Michael Entin can be proven to be
>>>>>>>>>>>> correct, then we can support Mapping 2 in Phase 2 of Geo Iceberg.
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding Mapping 3, this requires Iceberg to be able to
>>>>>>>>>>>> understand SRID / PROJJSON such that we will know the min/max X/Y of
>>>>>>>>>>>> the CRS (@Szehon, maybe Iceberg can ask the engine to provide this
>>>>>>>>>>>> information?). See my answer 2.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Why choose PROJJSON instead of SRID?
>>>>>>>>>>>>
>>>>>>>>>>>> The PROJJSON idea was borrowed from GeoParquet because we'd
>>>>>>>>>>>> like to enable possible conversion between Geo Iceberg and
>>>>>>>>>>>> GeoParquet. However, I do understand that this is not a good idea
>>>>>>>>>>>> for Iceberg since not many libs can parse PROJJSON.
>>>>>>>>>>>>
>>>>>>>>>>>> @Szehon Is there a way that we can support both SRID and
>>>>>>>>>>>> PROJJSON in Geo Iceberg?
>>>>>>>>>>>>
>>>>>>>>>>>> It is also worth noting that, although there are many libs that
>>>>>>>>>>>> can parse SRID and perform look-up in the EPSG database, the license
>>>>>>>>>>>> of the EPSG database is NOT compatible with the Apache Software
>>>>>>>>>>>> Foundation. That means: Iceberg still cannot parse / understand SRID.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Jia
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 29, 2024 at 11:08 AM Szehon Ho <
>>>>>>>>>>>> szehon.apa...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Dmytro
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for looking through the proposal, and excited to hear
>>>>>>>>>>>>> from you guys!  I am not a 'geo expert' and I will definitely need
>>>>>>>>>>>>> to pull in Jia Yu for some of these points.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Although most calculations are done on the query engine,
>>>>>>>>>>>>> Iceberg reference implementations (i.e., Java, Python) do have to
>>>>>>>>>>>>> support a few calculations to handle filter push down:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    1. push down of the proposed Geospatial transforms
>>>>>>>>>>>>>    ST_COVERS, ST_COVERED_BY, and ST_INTERSECTS
>>>>>>>>>>>>>    2. evaluation of proposed Geospatial partition transform
>>>>>>>>>>>>>    XZ2.  As you may have seen, this was chosen as it's the only
>>>>>>>>>>>>>    standard one today that solves the 'boundary object' problem,
>>>>>>>>>>>>>    still preserving 1-to-1 mapping of row => partition value.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is the primary rationale for choosing the values, as
>>>>>>>>>>>>> these were implemented in the GeoLake and Havasu projects (Iceberg
>>>>>>>>>>>>> forks that sparked the proposal) based on Geometry type (edge=planar,
>>>>>>>>>>>>> crs=OGC:CRS84/ SRID=4326).
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. As you mentioned [2] in the proposal there are difficulties
>>>>>>>>>>>>>> with supporting the full PROJJSON specification of the SRS.
>>>>>>>>>>>>>> From our experience most of the use-cases do not require the full
>>>>>>>>>>>>>> definition of the SRS, in fact that definition is only needed when
>>>>>>>>>>>>>> converting between coordinate systems. On the other hand, it’s
>>>>>>>>>>>>>> often needed to check whether two geometry columns have the same
>>>>>>>>>>>>>> coordinate system, for example when joining two columns from
>>>>>>>>>>>>>> different data providers.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To address this we would like to propose including the option
>>>>>>>>>>>>>> to specify the SRS with only a SRID in phase 1. The query engine
>>>>>>>>>>>>>> may choose to treat it as an opaque identifier or make a look-up in
>>>>>>>>>>>>>> the EPSG database if supported.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The way to specify CRS definition is actually taken from
>>>>>>>>>>>>> GeoParquet [1]; I think we are not bound to follow it if there are
>>>>>>>>>>>>> better options.  I feel we might need to at least list out supported
>>>>>>>>>>>>> configurations in the spec, though.  There is some conversation on
>>>>>>>>>>>>> the doc here about this [2].  Basically:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    1. XZ2 assumes planar edges.  This is a feature of the
>>>>>>>>>>>>>    algorithm, based on the original paper.  A possible solution to
>>>>>>>>>>>>>    spherical edges is proposed by Michael Entin here: [3], please
>>>>>>>>>>>>>    feel free to evaluate.
>>>>>>>>>>>>>    2. XZ2 needs to know the coordinate range.  According to
>>>>>>>>>>>>>    Jia's comments, this needs parsing of the CRS.  Can it be done
>>>>>>>>>>>>>    with SRID alone?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. The first version of the specification (Phase 1) is described
>>>>>>>>>>>>>> as focused on the planar geometry model with the CRS fixed to 4326.
>>>>>>>>>>>>>> In this model, Snowflake would not be able to map our Geography type
>>>>>>>>>>>>>> since it is based on the spherical Geography model. Given that
>>>>>>>>>>>>>> Snowflake supports both edge types, we would like to better
>>>>>>>>>>>>>> understand how to map them to the proposed Geometry type and its
>>>>>>>>>>>>>> metadata.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the
>>>>>>>>>>>>>>    query engine? Is it necessary for the system to adhere to the
>>>>>>>>>>>>>>    edge model for geospatial functions, or can it use the model that
>>>>>>>>>>>>>>    it supports or let the customer choose it? Will it affect the
>>>>>>>>>>>>>>    bounding box or other row group metadata?
>>>>>>>>>>>>>>    - Is there any reason why the flexible model has to be
>>>>>>>>>>>>>>    postponed to further iterations? Would it be more extensible to
>>>>>>>>>>>>>>    support a mutable edge type from Phase 1, but allow systems to
>>>>>>>>>>>>>>    ignore it if they do not support the spherical computation model?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> It may be answered by the previous paragraph in regards to
>>>>>>>>>>>>> XZ2.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    1. If we get XZ2 to work with a more variable CRS without
>>>>>>>>>>>>>    requiring full PROJJSON specification, it seems it is a path to
>>>>>>>>>>>>>    support Snowflake Geometry type?
>>>>>>>>>>>>>    2. If we get another one-to-one partition function on
>>>>>>>>>>>>>    spherical edges, like the one proposed by Michael, it seems a path
>>>>>>>>>>>>>    to support Snowflake Geography type?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does that sound correct?  As for why certain things are marked
>>>>>>>>>>>>> as Phase 1, they are just chosen so we can all agree on an initial
>>>>>>>>>>>>> design and iterate faster, and are not set in stone; maybe path 1 is
>>>>>>>>>>>>> possible to do quickly, for example.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, I am not sure about handling evaluation of ST_COVERS,
>>>>>>>>>>>>> ST_COVERED_BY, and ST_INTERSECTS (how easy it is to handle different
>>>>>>>>>>>>> CRS + spherical edges).  I will leave it to Jia.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Szehon
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]:
>>>>>>>>>>>>> https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#column-metadata
>>>>>>>>>>>>> [2]:
>>>>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit?disco=AAABL-z6xXk
>>>>>>>>>>>>> [3]:
>>>>>>>>>>>>> https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 29, 2024 at 8:30 AM Dmytro Koval
>>>>>>>>>>>>> <dmytro.ko...@snowflake.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dear Szehon and Iceberg Community,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is Dmytro, Peter, Aihua, and Tyler from Snowflake. As
>>>>>>>>>>>>>> part of our desire to be more active in the Iceberg community, we’ve
>>>>>>>>>>>>>> been looking over this geospatial proposal. We’re excited geospatial
>>>>>>>>>>>>>> is getting traction, as we see a lot of geo usage within Snowflake,
>>>>>>>>>>>>>> and expect that usage to carry over to our Iceberg offerings soon.
>>>>>>>>>>>>>> After reviewing the proposal, we have some questions we’d like to
>>>>>>>>>>>>>> pose given our experience with geospatial support in Snowflake.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We would like to clarify two aspects of the proposal:
>>>>>>>>>>>>>> handling of the spherical model and definition of the spatial
>>>>>>>>>>>>>> reference system, both of which have a big impact on the
>>>>>>>>>>>>>> interoperability with Snowflake and other query engines and Geo
>>>>>>>>>>>>>> processing systems.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let us first share some context about geospatial types at
>>>>>>>>>>>>>> Snowflake; geo experts will certainly be familiar with this context
>>>>>>>>>>>>>> already, but for the sake of others we want to err on the side of
>>>>>>>>>>>>>> being explicit and clear. Snowflake supports two Geospatial types [1]:
>>>>>>>>>>>>>> - Geography – uses a spherical approximation of the earth
>>>>>>>>>>>>>> for all the computations. It does not perfectly represent the earth,
>>>>>>>>>>>>>> but allows getting accurate results on WGS84 coordinates, used by
>>>>>>>>>>>>>> GPS, without any need to perform coordinate system reprojections. It
>>>>>>>>>>>>>> is also quite fast for end-to-end computations. In general, it has
>>>>>>>>>>>>>> less distortion compared to the 2D planar model.
>>>>>>>>>>>>>> - Geometry – uses a planar Euclidean geometry model. Geometric
>>>>>>>>>>>>>> computations are simpler, but require transforming the data between
>>>>>>>>>>>>>> coordinate systems to minimize the distortion. The Geometry data
>>>>>>>>>>>>>> type allows setting a spatial reference system for each row using
>>>>>>>>>>>>>> the SRID. The binary geospatial functions are only allowed on
>>>>>>>>>>>>>> geometries with the same SRID. The only function that interprets
>>>>>>>>>>>>>> SRID is ST_TRANSFORM, which allows conversion between different SRSs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [Figures: Geography, Geometry]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Given the choice of two types and a set of operations on top
>>>>>>>>>>>>>> of them, the majority of Snowflake users select the Geography type
>>>>>>>>>>>>>> to represent their geospatial data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From our perspective, Iceberg users would benefit most from
>>>>>>>>>>>>>> being given the flexibility to store and process data using the
>>>>>>>>>>>>>> model that better fits their needs and specific use cases.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Therefore, we would like to ask some clarifying design
>>>>>>>>>>>>>> questions, important for interoperability:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. The first version of the specification (Phase 1) is described
>>>>>>>>>>>>>> as focused on the planar geometry model with the CRS fixed to 4326.
>>>>>>>>>>>>>> In this model, Snowflake would not be able to map our Geography type
>>>>>>>>>>>>>> since it is based on the spherical Geography model. Given that
>>>>>>>>>>>>>> Snowflake supports both edge types, we would like to better
>>>>>>>>>>>>>> understand how to map them to the proposed Geometry type and its
>>>>>>>>>>>>>> metadata.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the
>>>>>>>>>>>>>>    query engine? Is it necessary for the system to adhere to the
>>>>>>>>>>>>>>    edge model for geospatial functions, or can it use the model that
>>>>>>>>>>>>>>    it supports or let the customer choose it? Will it affect the
>>>>>>>>>>>>>>    bounding box or other row group metadata?
>>>>>>>>>>>>>>    - Is there any reason why the flexible model has to be
>>>>>>>>>>>>>>    postponed to further iterations? Would it be more extensible to
>>>>>>>>>>>>>>    support a mutable edge type from Phase 1, but allow systems to
>>>>>>>>>>>>>>    ignore it if they do not support the spherical computation model?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. As you mentioned [2] in the proposal there are difficulties
>>>>>>>>>>>>>> with supporting the full PROJJSON specification of the SRS.
>>>>>>>>>>>>>> From our experience most of the use-cases do not require the full
>>>>>>>>>>>>>> definition of the SRS, in fact that definition is only needed when
>>>>>>>>>>>>>> converting between coordinate systems. On the other hand, it’s
>>>>>>>>>>>>>> often needed to check whether two geometry columns have the same
>>>>>>>>>>>>>> coordinate system, for example when joining two columns from
>>>>>>>>>>>>>> different data providers.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To address this we would like to propose including the option
>>>>>>>>>>>>>> to specify the SRS with only a SRID in phase 1. The query engine
>>>>>>>>>>>>>> may choose to treat it as an opaque identifier or make a look-up in
>>>>>>>>>>>>>> the EPSG database if supported.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you again for driving this effort forward. We look
>>>>>>>>>>>>>> forward to hearing your thoughts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> https://docs.snowflake.com/en/sql-reference/data-types-geospatial#understanding-the-differences-between-geography-and-geometry
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit#heading=h.oruaqt3nxcaf
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2024/05/02 00:41:52 Szehon Ho wrote:
>>>>>>>>>>>>>> > Hi everyone,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > We have created a formal proposal for adding Geospatial
>>>>>>>>>>>>>> > support to Iceberg.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Please read the following for details.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >    - Github Proposal :
>>>>>>>>>>>>>> >    https://github.com/apache/iceberg/issues/10260
>>>>>>>>>>>>>> >    - Proposal Doc:
>>>>>>>>>>>>>> >    https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Note that this proposal is built on existing extensive
>>>>>>>>>>>>>> > research and POC implementations (Geolake, Havasu).  Special thanks
>>>>>>>>>>>>>> > to Jia Yu and Kristin Cowalcijk from Wherobots/Geolake for extensive
>>>>>>>>>>>>>> > consultation and help in writing this proposal, as well as support
>>>>>>>>>>>>>> > from Yuanyuan Zhang from Geolake.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > We would love to get more feedback for this proposal from
>>>>>>>>>>>>>> > the wider community and eventually discuss this in a community sync.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks
>>>>>>>>>>>>>> > Szehon
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
