[ 
https://issues.apache.org/jira/browse/ARROW-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530378#comment-17530378
 ] 

Yaron Gvili edited comment on ARROW-15582 at 5/1/22 7:07 AM:
-------------------------------------------------------------

This is an interesting discussion for me, as I ran into this issue myself. In my 
use case, which only required a specific set of functions, I was able to manage 
by hard-coding a couple of simple Substrait-to-Arrow function-name mappings and 
by adding a [special case for 
cast|https://github.com/apache/arrow/pull/13032]. I'm not sure the special case 
fits the above API: the return type, which affects the cast operation, seems to 
be missing from the API. At least for me, a short-term solution that provides 
simple (perhaps configurable) mappings would be useful.
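
To make the idea of simple, configurable mappings concrete, here is a minimal 
sketch in plain Python. It is not the Arrow or Substrait API: ArrowCall, 
simple_mapping, translate_cast, DEFAULT_MAPPINGS and to_arrow are hypothetical 
names invented for illustration, and looking up cast by name in the same table 
is an assumption made only to keep the example small. The point it shows is 
that a cast translator needs the return type as an extra input, while plain 
name mappings do not.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class ArrowCall:
    """Hypothetical, simplified stand-in for a call to an Arrow compute function."""
    function: str
    arguments: Tuple
    options: Tuple = ()


# A translator receives the Substrait arguments and the declared return type.
Translator = Callable[[Tuple, Optional[str]], ArrowCall]


def simple_mapping(arrow_name: str) -> Translator:
    """Build a translator that only renames the function."""
    def translate(arguments: Tuple, return_type: Optional[str]) -> ArrowCall:
        return ArrowCall(arrow_name, arguments)
    return translate


def translate_cast(arguments: Tuple, return_type: Optional[str]) -> ArrowCall:
    """Special case: the target type of the cast comes from the return type."""
    if return_type is None:
        raise ValueError("cast cannot be translated without a return type")
    # "to_type" is used here as an illustrative option name.
    return ArrowCall("cast", arguments, (("to_type", return_type),))


# Configurable table: callers can copy it and add or override entries.
DEFAULT_MAPPINGS: Dict[str, Translator] = {
    "add": simple_mapping("add"),
    "cast": translate_cast,
}


def to_arrow(name, arguments, return_type=None, mappings=DEFAULT_MAPPINGS):
    """Translate one Substrait function call using the given mapping table."""
    if name not in mappings:
        raise KeyError(f"no Arrow mapping registered for {name!r}")
    return mappings[name](tuple(arguments), return_type)
```

A caller who prefers checked arithmetic could pass 
mappings={**DEFAULT_MAPPINGS, "add": simple_mapping("add_checked")} without 
touching the defaults.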

In the context of the general discussion, Substrait also has a ternary function, 
"clip", that does not currently appear in the list. Some possible solutions for 
it are:
 # Map "clip(x, a, b)" to an Arrow expression like 
"min_element_wise(max_element_wise(x, a), b)". This solution would work with 
the above Substrait-to-Arrow API but would require some kind of 
expression-matching in the reverse direction.
 # Add an Arrow "clip" function. AFAIK, Arrow has good support for unary and 
binary scalar kernels but not for ternary ones, so this solution would require 
adding this support first.
 # Translate "clip(x, a, b)" to "clip_upper(clip_lower(x, a), b)" in Substrait 
and then add simple mappings from "clip_upper" and "clip_lower" in Substrait to 
"min_element_wise" and "max_element_wise" in Arrow, respectively. This solution 
has an impact on the Substrait DSL specification.
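
The first option can be sketched in plain Python as follows. Expressions are 
modelled as nested tuples, which is an assumption made for illustration and not 
how Arrow or Substrait represent them; call, clip_to_arrow, arrow_to_clip and 
evaluate are hypothetical helpers. The sketch shows the forward rewrite, the 
pattern match that the reverse direction would need, and a scalar evaluator to 
check that the rewritten expression clamps as expected. Null handling is 
ignored.

```python
def call(function, *arguments):
    """Build an expression node as a (function, arg, ...) tuple."""
    return (function, *arguments)


def clip_to_arrow(x, lower, upper):
    """Forward direction: clip(x, a, b) -> min_element_wise(max_element_wise(x, a), b)."""
    return call("min_element_wise", call("max_element_wise", x, lower), upper)


def _is_binary_call(expr, function):
    return isinstance(expr, tuple) and len(expr) == 3 and expr[0] == function


def arrow_to_clip(expr):
    """Reverse direction: recognise the nested min/max pattern, else return None."""
    if _is_binary_call(expr, "min_element_wise"):
        inner, upper = expr[1], expr[2]
        if _is_binary_call(inner, "max_element_wise"):
            return call("clip", inner[1], inner[2], upper)
    return None


def evaluate(expr, row):
    """Evaluate an expression for one row; strings are column names."""
    if isinstance(expr, str):
        return row[expr]
    if not isinstance(expr, tuple):
        return expr  # literal value
    function, *arguments = expr
    values = [evaluate(argument, row) for argument in arguments]
    if function == "max_element_wise":
        return max(values)
    if function == "min_element_wise":
        return min(values)
    if function == "clip":
        x, lower, upper = values
        return min(max(x, lower), upper)
    raise KeyError(f"unknown function {function!r}")
```

Note that the reverse match only succeeds for this exact nesting; an equivalent 
expression written as max_element_wise(min_element_wise(x, b), a) would need 
its own pattern, which is the cost of this option.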



> [C++] Add support for registering tricky functions with the Substrait 
> consumer (or add a bunch of substrait meta functions)
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-15582
>                 URL: https://issues.apache.org/jira/browse/ARROW-15582
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Sanjiban Sengupta
>            Priority: Major
>              Labels: substrait
>
> Sometimes one Substrait function will map to multiple Arrow functions.  For 
> example, the Substrait {{add}} function might be referring to Arrow's {{add}} 
> or {{add_checked}}.  We need to figure out how to register this correctly 
> (e.g. one possible approach would be a {{substrait_add}} meta function).
> Other times a Substrait function will encode something Arrow considers an 
> "option" as a function argument.  For example, the Arrow {{is_in}} function 
> is unary with an option for the lookup set.  The Substrait function is 
> binary, but its second argument must be constant and be the lookup set.  
> Neither should be confused with a truly binary {{is_in}} function that takes 
> a different set at every row.
> It's possible there is no work to do here other than adding a bunch of 
> substrait_ meta functions in Arrow.  In that case all the work will be done 
> in other JIRAs.  Or, it is possible that there is some kind of extension we 
> can make to the function registry that bypasses the need for the meta 
> functions.  I'm leaving this JIRA open so future contributors can consider 
> this second option.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
