Types, as in "POINT", etc.? No, the point was to express them as complex types to avoid adding them to the Iceberg spec and to the engines (even if they were added to the Iceberg spec, engines would likely not support them as first-class citizens anyway). In other words, their POINT/geometry semantics are invisible to Iceberg and are interpreted only by the application.
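To make "invisible to Iceberg, interpreted by the application" concrete, here is a minimal Python sketch. It assumes hypothetical struct layouts for POINT and LINESTRING (nothing in this thread fixes the field names). To the table format and the engines these are ordinary nested structs and lists of doubles; only application code, here the illustrative helpers `point_from_row` and `linestring_length`, gives them geometric meaning.

```python
import math
from typing import Any, Dict, List, Tuple

# Hypothetical layouts; to Iceberg and the engines these are plain nested types:
#   POINT      -> struct<x: double, y: double>
#   LINESTRING -> struct<points: list<struct<x: double, y: double>>>
Row = Dict[str, Any]


def point_from_row(row: Row) -> Tuple[float, float]:
    """Application-side interpretation of a POINT struct as an (x, y) pair."""
    return (float(row["x"]), float(row["y"]))


def linestring_length(row: Row) -> float:
    """Planar length of a LINESTRING struct, computed by the application only.

    Assumes projected (Cartesian) coordinates; lat/lon input would need a
    geodesic formula instead.
    """
    pts: List[Tuple[float, float]] = [point_from_row(p) for p in row["points"]]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


if __name__ == "__main__":
    line = {"points": [{"x": 0.0, "y": 0.0}, {"x": 3.0, "y": 4.0}, {"x": 3.0, "y": 10.0}]}
    print(linestring_length(line))  # 11.0
```

The storage layer never needs to know that `points` is a line: it only stores and scans the struct, and any geometry logic stays in the application.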
On Thu, Oct 27, 2022 at 10:08 AM Ryan Blue <b...@tabular.io> wrote:
> Walaa,
>
> How are those types defined? Would we need to have definitions in the
> Iceberg spec?
>
> Ryan
>
>
> On Thu, Oct 27, 2022 at 9:47 AM Walaa Eldin Moustafa <
> wa.moust...@gmail.com> wrote:
>
>> Thanks Ryan! To expand a bit more:
>>
>> For representation, I was thinking that geometry types could be expressed
>> as complex types (e.g., POINTs as structs), so they are compatible with all
>> engines without having to introduce user-defined types in both Iceberg and
>> the compute engines.
>>
>> For partitioning:
>> (1) Custom partition functions could operate directly on complex types
>> (e.g., structs representing POINTs). In this case, the partition function
>> would look like geometry_hash(struct_col); or
>> (2) The partition spec could be extended to allow "generated columns" to
>> be sources of partition functions, so a "generated" WKB column can be the
>> intermediate representation between complex geometry types and partition
>> functions that accept primitive types. In this case, the partition
>> function would look like hashBytes(wkb(struct_col)).
>>
>> Thanks,
>> Walaa.
>>
>> On Thu, Oct 27, 2022 at 8:46 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Thomas, thanks for taking the time to put this together!
>>>
>>> I've always wanted geospatial support in the format, but thought that
>>> it would be best to have an expert design and build it with us so we
>>> don't get it wrong.
>>>
>>> I think Walaa is right about the approach. We want to use partition
>>> transforms to do the heavy lifting of finding the right files for a
>>> query. That means we'd need a clear but generic definition of
>>> geospatial objects in the data, along with more specific attributes.
>>> At a high level, I think that's probably done by storing each object
>>> using a standard envelope definition (bbox?) that we can use in
>>> partition transforms, plus a WKB column for the actual object.
>>>
>>> What do you think?
>>>
>>> Ryan
>>>
>>> On Thu, Oct 27, 2022 at 4:03 AM Walaa Eldin Moustafa
>>> <wa.moust...@gmail.com> wrote:
>>>>
>>>> Hi Thomas,
>>>>
>>>> It sounds like what you are trying to achieve is a custom partition
>>>> function? There is some discussion about that here:
>>>> https://github.com/apache/iceberg/issues/1482. I guess supporting
>>>> geometry through this framework makes more sense, since it does not
>>>> require extending the Iceberg type system, yet is general enough to
>>>> support other applications.
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>> On Thu, Oct 27, 2022 at 12:33 AM Thomas Fredriksen
>>>> <thomas.fredriksen@oceandata.earth> wrote:
>>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I am working with big geospatial data and trying to handle very large
>>>>> tables in object storage. Iceberg appears to be the ideal solution, but
>>>>> unfortunately it does not appear to support geometry columns.
>>>>>
>>>>> The way Iceberg is structured, it appears to be a good fit with the
>>>>> GeoParquet standard
>>>>> (https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md),
>>>>> so I created a pull request that attempts to add this support:
>>>>> https://github.com/apache/iceberg/pull/6062
>>>>>
>>>>> The PR deviates from GeoParquet in the CRS field of the column
>>>>> metadata. GeoParquet requires the CRS to be defined as a PROJJSON
>>>>> object, while the PR simply asks the user to specify an EPSG ID, with
>>>>> EPSG:4326 (WGS84, latitude/longitude) as the default.
>>>>>
>>>>> I would love feedback on the PR and welcome discussion on whether
>>>>> geospatial/geometry support belongs in the Iceberg standard.
>>>>>
>>>>> Thomas Li Fredriksen
>>>>> Lead Solution Architect
>>>>>
>>>>> p +47 452 21 055
>>>>>
>>>>> –––––
>>>>>
>>>>> www.hubocean.earth
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>
>
> --
> Ryan Blue
> Tabular
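Walaa's two partitioning options differ mainly in where the geometry gets serialized. The following is a minimal Python sketch, assuming a hypothetical POINT struct `struct<x: double, y: double>` stored in a hypothetical column named `location`. Option (1) is a custom function that reads the struct directly; option (2) derives a generated WKB column and hands it to an ordinary transform over bytes. The names `geometry_hash`, `wkb`, and `hash_bytes` mirror the pseudocode in the thread and are illustrative, not Iceberg APIs. `zlib.crc32` is a stand-in for the 32-bit Murmur3 hash that Iceberg's bucket transform actually uses.

```python
import struct
import zlib
from typing import Dict

# Hypothetical POINT representation: struct<x: double, y: double>.
Point = Dict[str, float]

WKB_LITTLE_ENDIAN = 1  # WKB byte-order flag: 1 = little-endian (NDR)
WKB_POINT = 1          # WKB geometry type code for a 2D Point


def hash_bytes(data: bytes, num_buckets: int) -> int:
    """Generic bucket transform over a binary value.

    Stand-in only: CRC-32 replaces the 32-bit Murmur3 hash that Iceberg's
    bucket transform uses; the sign-bit mask mirrors Iceberg's bucketing.
    """
    return (zlib.crc32(data) & 0x7FFFFFFF) % num_buckets


# Option (1): a custom partition function that understands the struct itself.
def geometry_hash(p: Point, num_buckets: int) -> int:
    # Serializes the struct fields internally as two little-endian float64s.
    return hash_bytes(struct.pack("<dd", p["x"], p["y"]), num_buckets)


# Option (2): a generated WKB column feeds a transform over a primitive type.
def wkb(p: Point) -> bytes:
    """Encode a POINT struct as 2D WKB: byte order, uint32 type, x, y."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, p["x"], p["y"])


def partition_from_generated_column(row: Dict[str, Point], num_buckets: int) -> int:
    generated_wkb = wkb(row["location"])  # value of the hypothetical generated column
    return hash_bytes(generated_wkb, num_buckets)


if __name__ == "__main__":
    row = {"location": {"x": 10.75, "y": 59.91}}
    print(len(wkb(row["location"])))  # 21 bytes: 1 + 4 + 8 + 8
    print(geometry_hash(row["location"], 16))
    print(partition_from_generated_column(row, 16))
```

Option (2) keeps the partition transform generic, since it only ever sees bytes, at the cost of defining the generated column. In either form, hash buckets spread nearby points across different buckets: they balance data but do not, on their own, let a spatial window query skip files.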
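Ryan's suggestion of a standard envelope per object that partition transforms can use, plus a WKB column for the object itself, can be sketched as follows. This is a minimal Python sketch with several assumptions. The envelope is an axis-aligned bbox. The partition transform is a hypothetical fixed-size grid keyed on the bbox center. Each partition keeps the union of its envelopes, in the same spirit as per-file lower/upper column bounds. `BBox`, `envelope`, `grid_cell`, `partition_stats`, and `prune` are illustrative names, not Iceberg APIs, and the WKB column is left out because pruning needs only the envelope.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]


@dataclass(frozen=True)
class BBox:
    """Axis-aligned envelope stored alongside each object's WKB column."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        return not (self.xmax < other.xmin or other.xmax < self.xmin
                    or self.ymax < other.ymin or other.ymax < self.ymin)

    def union(self, other: "BBox") -> "BBox":
        return BBox(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))


def envelope(coords: Iterable[Tuple[float, float]]) -> BBox:
    """Compute the bbox of an object from its (x, y) vertices."""
    xs, ys = zip(*coords)
    return BBox(min(xs), min(ys), max(xs), max(ys))


def grid_cell(b: BBox, cell_size: float) -> Cell:
    """Hypothetical partition transform: the grid cell holding the bbox center."""
    cx = (b.xmin + b.xmax) / 2
    cy = (b.ymin + b.ymax) / 2
    return (math.floor(cx / cell_size), math.floor(cy / cell_size))


def partition_stats(bboxes: Iterable[BBox], cell_size: float) -> Dict[Cell, BBox]:
    """Group envelopes by cell, keeping each partition's union envelope.

    An object may extend past its cell, so pruning must use this union
    rather than the cell bounds alone.
    """
    stats: Dict[Cell, BBox] = {}
    for b in bboxes:
        key = grid_cell(b, cell_size)
        stats[key] = stats[key].union(b) if key in stats else b
    return stats


def prune(stats: Dict[Cell, BBox], query: BBox) -> List[Cell]:
    """Return the partitions whose union envelope overlaps the query window."""
    return sorted(cell for cell, b in stats.items() if b.intersects(query))


if __name__ == "__main__":
    objects = [
        [(1.0, 1.0), (2.0, 2.0)],      # center (1.5, 1.5)   -> cell (0, 0)
        [(15.0, 15.0), (16.0, 18.0)],  # center (15.5, 16.5) -> cell (1, 1)
        [(9.0, 9.0), (12.0, 11.0)],    # center (10.5, 10.0) -> cell (1, 1)
    ]
    stats = partition_stats((envelope(o) for o in objects), cell_size=10.0)
    # The window lies inside cell (0, 0), yet only partition (1, 1) overlaps it,
    # because the third object extends down to (9, 9).
    print(prune(stats, BBox(8.0, 8.0, 9.5, 9.5)))  # [(1, 1)]
```

Assigning each object to a single cell keeps the transform simple and deterministic; the union envelope is what keeps pruning correct for objects that cross cell boundaries.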