Thanks for the detailed response 🙂 I think Ryan's point in the referenced issue is important - having a set of transforms would be important in order to have consistent support across engines.
Partition transforms would indeed have to do most of the heavy lifting in order to simplify the query plans. At the very least, each table partition should record the bounding box of the geospatial data it contains, but storing a bounding box for every geospatial value could also make sense from a performance perspective.

Thomas Li Fredriksen
Lead Solution Architect
p +47 452 21 055
–––––
www.hubocean.earth

________________________________
From: Walaa Eldin Moustafa <wa.moust...@gmail.com>
Sent: Thursday, October 27, 2022 19:46
To: dev@iceberg.apache.org <dev@iceberg.apache.org>
Subject: Re: Geospatial/geometry support

Types, as in "POINT", etc? No, the point was to just express them as complex types to avoid adding them to the Iceberg spec and the engines (because even if they were added to the Iceberg spec, engines would likely not treat them as first-class citizens anyway), i.e., their POINT/geometry semantics are invisible to Iceberg and are interpretable only by the application.

On Thu, Oct 27, 2022 at 10:08 AM Ryan Blue <b...@tabular.io> wrote:

Walaa,

How are those types defined? Would we need to have definitions in the Iceberg spec?

Ryan

On Thu, Oct 27, 2022 at 9:47 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

Thanks Ryan! To expand a bit more:

For representation, I was thinking that geometry types could be expressed as complex types (e.g., POINTs as Structs), so they are compatible with all engines without having to introduce user-defined types in both Iceberg and compute engines.

For the partitioning:
(1) Custom partition functions could directly operate on complex types (e.g., structs representing POINTs).
In this case the partitioning function is like geometry_hash(struct_col); or
(2) The partitioning spec could be extended to allow "generated columns" to be sources of partition functions, so a "generated" WKB column can be the intermediate representation between complex geometry types and partition functions that accept primitive types. In this case, the partitioning function is like hashBytes(wkb(struct_col)).

Thanks,
Walaa.

On Thu, Oct 27, 2022 at 8:46 AM Ryan Blue <b...@tabular.io> wrote:

Thomas, thanks for taking the time to put this together! I've always wanted geospatial support in the format, but thought that it would be best to have an expert design and build it with us so we don't get it wrong.

I think Walaa is right about the approach. We want to use partition transforms to do the heavy lifting of finding the right files for a query. That means that we'd need some clear but generic definition of geospatial objects in the data, along with more specific attributes. At a high level, I think that's probably done by storing each object using a standard envelope definition (bbox?) that we can use in partition transforms, and then a WKB column for the actual object.

What do you think?

Ryan

On Thu, Oct 27, 2022 at 4:03 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>
> Hi Thomas,
>
> It sounds like what you are trying to achieve is to provide a custom partition
> function? There is some discussion here
> https://github.com/apache/iceberg/issues/1482.
> I guess supporting geometry through this framework makes more sense since it
> does not require extending the Iceberg type system, yet is general enough to
> support other applications.
>
> Thanks,
> Walaa.
>
> On Thu, Oct 27, 2022 at 12:33 AM Thomas Fredriksen
> <thomas.fredriksen@oceandata.earth> wrote:
>>
>> Hello everyone,
>>
>> I am working with big geospatial data and trying to handle very large tables
>> in object storage. Iceberg appears to be the ideal solution but unfortunately
>> does not appear to support geometry columns.
>>
>> The way that Iceberg is structured, it appears to be a good fit with the
>> GeoParquet standard
>> (https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md),
>> so I created a pull request where I attempt to add this support:
>> https://github.com/apache/iceberg/pull/6062
>>
>> The PR deviates from GeoParquet in the CRS field of the column metadata.
>> GeoParquet requires the CRS to be defined as a PROJJSON JSON object, while
>> the PR simply asks the user to specify an EPSG ID, where EPSG:4326 (WGS84 -
>> latitude/longitude) is considered the default.
>>
>> I would love feedback on the PR and welcome the discussion on whether
>> geospatial/geometry belongs in the Iceberg standard.
>>
>> Thomas Li Fredriksen
>> Lead Solution Architect
>>
>> p +47 452 21 055
>>
>> –––––
>>
>> www.hubocean.earth

--
Ryan Blue
Tabular
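
To make the ideas in this thread a bit more concrete, here is a rough Python sketch of the "struct to WKB to hash" partitioning option and of bounding-box file pruning. It is only an illustration under stated assumptions: the names Point, BBox, wkb, hash_bytes and prune_files are hypothetical and are not part of Iceberg or of the PR, the file names and coordinates are made up, and SHA-256 is used purely as a stand-in for whatever hash a real partition transform would define.

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A POINT expressed as a plain struct of two doubles (hypothetical layout)."""
    x: float
    y: float


@dataclass(frozen=True)
class BBox:
    """Axis-aligned envelope from (min_x, min_y) to (max_x, max_y)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "BBox") -> bool:
        # Two envelopes overlap when they overlap on both axes.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)


def wkb(point: Point) -> bytes:
    """Encode a 2D point as little-endian WKB: byte order, geometry type 1, x, y."""
    return struct.pack("<BIdd", 1, 1, point.x, point.y)


def hash_bytes(data: bytes, num_buckets: int) -> int:
    """Stand-in bucket hash. SHA-256 is an assumption, not Iceberg's hash."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def prune_files(file_bboxes: dict[str, BBox], query: BBox) -> list[str]:
    """Keep only the files whose recorded envelope overlaps the query envelope."""
    return [name for name, bbox in file_bboxes.items() if bbox.intersects(query)]


# Option (2): struct -> generated WKB bytes -> partition bucket.
oslo = Point(10.75, 59.91)
bucket = hash_bytes(wkb(oslo), num_buckets=16)

# Envelope-based planning: skip files that cannot contain matching rows.
files = {
    "north-sea.parquet": BBox(-4.0, 51.0, 12.0, 62.0),
    "pacific.parquet": BBox(-180.0, -60.0, -70.0, 60.0),
}
candidates = prune_files(files, BBox(10.0, 59.0, 11.0, 60.0))
```

In this sketch the hash bucket only helps with equality-style lookups, while the per-file envelope is what lets a planner discard files for a spatial range query, which is why the thread suggests storing a bbox alongside the WKB column.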