> On Mar 6, 2023, at 3:37 PM, Bertil Chapuis <[email protected]> wrote:
>
> Thank you for your answers and for the pointers.
>
>> PS Regarding which specification we choose to implement. The four principles
>> you outline sound good to me. It’s always better to follow the standard. If
>> leading implementations (e.g. PostGIS and H2GIS) diverge from the standard,
>> we can make a note, and possibly support them as secondary implementations.
>>
>> Sometimes there are compatible extensions. E.g. the OpenGIS function has two
>> arguments, and the PostGIS version accepts two or three arguments. If
>> someone calls Calcite's implementation of the function with three arguments,
>> it is reasonable that they get the PostGIS behavior.
>
> My feeling is that spatial is a rather small niche and that developers do not
> necessarily seek compatibility with Postgis or H2GIS. Personally, I almost
> always have to refer to the documentation of the RDBMS when writing queries
> that use spatial functions and I have few expectations regarding their
> behaviour beyond what is stated in the documentation. In this regard, I feel
> that a good standalone documentation is preferable to a documentation with
> references to other implementations that may differ in subtle ways.
It is a truism that the majority of SQL is generated by programs, not people.
Those programs have already been written for industry-standard dialects (e.g.
Oracle, Postgres) and therefore the best semantics is to be consistent with
those standards.
For geospatial SQL, I suspect that a somewhat higher percentage of queries are
written by people. Still, the principle of least surprise dictates that we
should comply with the de facto standard unless there are compelling reasons
not to.
If we depart from the standard for particular functions, let’s do it
consciously, explicitly, and rarely.
Calcite does not currently have detailed documentation. We have benefited
hugely from products that do, such as Postgres and H2GIS, because our users can
safely assume that our semantics is consistent with the other products.
Documenting our functions exhaustively - say, moving from one page for all
functions to one page per function - would be a huge endeavor and I can’t see
it happening with the current community. The best we could reasonably achieve
would be separate pages for a very small number of exceptional functions.
>
>>> Similar issues have come up with non-GIS functions. For example, the
>>> DATEDIFF function [1]. Snowflake and MSSQL have ‘DATEDIFF(timeUnit,
>>> datetime, datetime2)’, whereas MySQL has ‘DATEDIFF(date, date2)’. We
>>> document which specification we implement, and potentially we could
>>> implement both specifications, as either functions with different names, or
>>> by selectively enabling the behavior that the user wants.
>>>
>>> The most important thing is to document which specification we are
>>> implementing (when there is ambiguity). Then if someone would prefer the
>>> other specification, they can contribute an implementation, and we can
>>> selectively enable it.
>
> Do you think we should eventually add a @LibraryOperator for spatial type
> functions (i.e. OpenGIS, Postgis, H2GIS)? There is still a TODO (prior to the
> changes I made) in the SpatialTypeFunctions class asking the following
> question: Should we create aliases for functions in upper-case [1]? My
> current understanding of this question is that a possibility may be to have a
> method named “buffer” in the class and an alias named “ST_Buffer” registered
> somewhere else. Appart from resolving lots of sonar code smells related to
> method names, this approach would allow to more easily introduce new methods
> that don’t have ST_ counterparts in other RDBMS.
It would be good to move all spatial functions out of the “core” module at some
point. If we did that, it would probably make sense to declare the geospatial
functions in a class with similar structure to SqlLibraryOperators so that they
can be selectively enabled.
At this point I don’t see much benefit for adding separate libraries for
OpenGIS, Postgis, H2GIS. But that might change. As far as I can tell (correct
me if I am wrong) they are mostly the same functions, with slightly different
specifications. If so, a ‘compliance’ flag might be more appropriate.
I wouldn’t worry about code smells. Something done for a good reason isn’t a
code smell. It is a false negative from a robot.
> After merging CALCITE-5367 [2], I think we will have a relatively decent
> support for spatial types in core. At this stage, I think it would be good to
> see if the organisation of the code can be improved before investigating
> support for more advanced spatial functions (e.g. 3D, ST_AsMVT,
> ST_AsFlatGeobuf, etc.).
Sounds good. Identify problems and propose solutions. Algebraic approaches -
e.g. decomposing functions into smaller composable units, like how AVG
decomposes into SUM and COUNT - are well worth considering.
Julian