westonpace commented on PR #35513:
URL: https://github.com/apache/arrow/pull/35513#issuecomment-1544181944
ibis-substrait and Arrow have to agree on function names (and by name I mean
both the URI and the name).
substrait-io/substrait is the main repository for Substrait and defines a
standard set of functions that are likely to be interesting to all producers
and consumers. I do think first/last belong in that list; however, there may
be some discussion to figure out exactly how to classify them (no existing
aggregate function today depends on order, so are these aggregate functions or
some kind of special new function?).
So far, to my understanding, functions have fallen into two categories:
* Well-defined functions
These functions are defined in substrait-io/substrait. They have
"official" names that are part of the Substrait spec and should be
long-standing. In this case agreement is easy: official names should always be
preferred. To use an official name, Arrow must have a mapping from the
official name to the Arrow function name.
* Arrow-specific functions
These functions are available in Arrow but not yet defined in
substrait-io/substrait. Since there is no official name, we instead use a
special URI `urn:arrow:substrait_simple_extension_function` with the Arrow
function name. One current limitation of Arrow-specific functions is that we
cannot specify options. However, one benefit is that no mapping is required:
the function name in the Substrait plan should always match the Arrow function
name exactly.
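To make the two categories concrete, here is a minimal sketch of how a consumer might resolve a Substrait (URI, name) pair to an Arrow function name. It is not Arrow's actual implementation. `OFFICIAL_URI` is a hypothetical placeholder for an official Substrait extension file, and the entries in `OFFICIAL_TO_ARROW` and the helper `resolve_arrow_function` are illustrative assumptions; only the Arrow-specific URI comes from the comment above.

```python
from typing import Dict, Tuple

# URI named in the comment for Arrow-specific functions.
ARROW_SIMPLE_EXT_URI = "urn:arrow:substrait_simple_extension_function"

# Hypothetical placeholder for the URI of an official Substrait extension file.
OFFICIAL_URI = "urn:example:official_substrait_functions"

# Illustrative mapping (assumed entries, not Arrow's real table):
# (official URI, official name) -> Arrow function name.
OFFICIAL_TO_ARROW: Dict[Tuple[str, str], str] = {
    (OFFICIAL_URI, "add"): "add_checked",
    (OFFICIAL_URI, "sum"): "sum",
}


def resolve_arrow_function(uri: str, name: str) -> str:
    """Hypothetical helper: map a Substrait (URI, name) pair to an Arrow name."""
    if uri == ARROW_SIMPLE_EXT_URI:
        # Arrow-specific functions: no mapping, the name is used verbatim.
        return name
    try:
        # Well-defined functions: an explicit mapping entry is required.
        return OFFICIAL_TO_ARROW[(uri, name)]
    except KeyError:
        raise KeyError(f"no Arrow mapping for {name!r} under {uri!r}") from None
```

Under these assumptions, `resolve_arrow_function(ARROW_SIMPLE_EXT_URI, "first")` returns `"first"` with no table lookup, while asking for `"first"` under `OFFICIAL_URI` fails until someone adds a mapping entry. That second case is the situation the comment objects to when the official function does not exist yet.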
I'm confused by this PR. It adds an Arrow mapping, but there is no
official function (yet). It will "work" if both ibis-substrait and Arrow act
as if first/last were already in Substrait, but I feel this is going to confuse
people. Anyone looking at the Arrow repo, for example, might think that there
really is an official first/last function. I think it's best to use the
Arrow-specific URI until the official function is adopted.
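As a rough illustration of the suggested approach, the sketch below builds the extension declarations a producer such as ibis-substrait might emit so that first/last are bound to the Arrow-specific URI. The dictionary layout is modeled on the protobuf JSON form of Substrait's plan extensions (`extensionUris`, `extensions`, `extensionFunction`). Those field names should be checked against the Substrait version in use. The helper `declare_arrow_functions` is hypothetical, and the names `first` and `last` are assumed to match the Arrow function names exactly.

```python
import json
from typing import Dict, List

# Arrow-specific URI named in the comment.
ARROW_SIMPLE_EXT_URI = "urn:arrow:substrait_simple_extension_function"


def declare_arrow_functions(names: List[str]) -> Dict[str, list]:
    """Hypothetical helper: bind each function name to the Arrow-specific URI."""
    uri_anchor = 1  # anchor that every function declaration below refers to
    return {
        "extensionUris": [
            {"extensionUriAnchor": uri_anchor, "uri": ARROW_SIMPLE_EXT_URI}
        ],
        "extensions": [
            {
                "extensionFunction": {
                    "extensionUriReference": uri_anchor,
                    "functionAnchor": anchor,
                    # Must match the Arrow function name exactly (no mapping).
                    "name": name,
                }
            }
            for anchor, name in enumerate(names)
        ],
    }


print(json.dumps(declare_arrow_functions(["first", "last"]), indent=2))
```

Once first/last are adopted in substrait-io/substrait, only the URI (and, if the official names differ, the names) would change. This is why keeping them under the Arrow-specific URI until then avoids implying that an official function exists.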