[GitHub] [arrow] westonpace commented on issue #35843: [C++] Acero produces unexpected output type for add function on decimals

via GitHub Thu, 08 Jun 2023 16:03:10 -0700


westonpace commented on issue #35843:
URL: https://github.com/apache/arrow/issues/35843#issuecomment-1583567633


   Good catch.
   
   This promotion is done in two parts, a cast, and the function itself.  For 
your example we should first be implicitly casting both inputs to `decimal(10, 
2)` and then we should apply the function (which is where the precision gets 
increased).  So we have:
   
   ```
   decimal(10, 1) --> CAST decimal(10, 2)----ADD---decimal(12, 2)
   decimal(10, 2) -------> decimal(10, 2) --/
   ```
   
   When calling functions outside Acero (e.g. in pyarrow this would be what 
happens on `pyarrow.compute.add`) we call a method named `DispatchBest` on the 
function object to choose the most appropriate kernel with each invocation.  
The default behavior is to call `DispatchExact` and fail otherwise.  However, 
some functions (e.g. most arithmetic functions) will override `DispatchBest` 
and attempt to do some implicit casting.  This is where the cast is inserted.
   
   Inside Acero, we don't want to have to go through the logic to pick the 
correct kernel again and again on every single batch.  So, instead, we "bind 
the expression" to the input schema.  Part of this binding process is picking 
the correct function kernel to execute.  However, it seems the logic here is a 
little different:
   
   ```
     // First try and bind exactly
     Result<const Kernel*> maybe_exact_match = 
call.function->DispatchExact(types);
     if (maybe_exact_match.ok()) {
       call.kernel = *maybe_exact_match;
     } else {
       if (!insert_implicit_casts) {
         return maybe_exact_match.status();
       }
       // If exact binding fails, and we are allowed to cast, then prefer 
casting literals
       // first.  Since DispatchBest generally prefers up-casting the best way 
to do this is
       // first down-cast the literals as much as possible
       types = GetTypesWithSmallestLiteralRepresentation(call.arguments);
       ARROW_ASSIGN_OR_RAISE(call.kernel, call.function->DispatchBest(&types));
   ```
   
   In other words, we start with `DispatchExact` and only call `DispatchBest` 
if we can't find an exact match.
   
   The problem you are encountering is that `DispatchExact` is recording an 
exact match.  To understand why that happens we have to look at how we decide 
if kernels are a match.  To do that we register an input matcher for each of 
the arguments.  The input matcher might be "This only matches if the type is 
exactly X" but that doesn't work so well for parameterized types (we'd have to 
register a kernel for every single instance of decimal128, etc.)
   
   So, for parameterized types, we typically register a "type matcher" of some 
kind.  It appears that the type matcher for decimal arithmetic is "this input 
matches if the type is a decimal type, regardless of precision or scale".  So 
that's how we end up thinking this is an exact match.
   
   A workaround for now would be to do the upcasting explicitly, yourself, but 
that certainly isn't convenient.
   
   Probably, what would be ideal, is if the type matcher was "match if this is 
a decimal type and the other type is exactly the same" but the type matcher 
interface does not currently see all of the types (just the one type it is 
considering).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] westonpace commented on issue #35843: [C++] Acero produces unexpected output type for add function on decimals

Reply via email to