[GitHub] [arrow] vibhatha commented on pull request #14320: ARROW-16212: [C++][Python] Register Multiple Kernels for a UDF

GitBox Mon, 02 Jan 2023 19:46:12 -0800


vibhatha commented on PR #14320:
URL: https://github.com/apache/arrow/pull/14320#issuecomment-1369365871


   > A few nits.
   > 
   > My main concern is that, rather than have `input_types` be a dictionary, I 
think it would be best to have the argument names as a separate field. 
Otherwise it just forces the user to grapple with questions like "why is it a 
dictionary? What am I supposed to use for the key? Can different rows have 
different keys?" when that could be more intuitive.
   > 
   > ```
   > def test_multi_kernel_registration():
   >     def unary_function(ctx, x):
   >         return pc.cast(pc.call_function("multiply", [x, 2],
   >                                         memory_pool=ctx.memory_pool), 
x.type)
   >     func_name = "y=x*2"
   >     unary_doc = {"summary": "multiply by two function",
   >                  "description": "test multiply function"}
   >     input_types = [
   >         pa.int8(),
   >         pa.int16(),
   >         pa.int32(),
   >         pa.int64(),
   >         pa.float32(),
   >         pa.float64()
   >     ]
   > 
   >     input_names = ["array"]
   > 
   >     output_types = [
   >         pa.int8(),
   >         pa.int16(),
   >         pa.int32(),
   >         pa.int64(),
   >         pa.float32(),
   >         pa.float64()
   >     ]
   >     pc.register_scalar_function(unary_function,
   >                                 func_name,
   >                                 unary_doc,
   >                                 input_types,
   >                                 input_names,
   >                                 output_types)
   > ```
   
   I see your point, but I have a counter argument for this (just my opinion).
   
   If we keep a separate list, it also makes it unclear which name goes for 
which type. Having them together makes it readable. We had a case for 
[aggreagtes](https://github.com/apache/arrow/pull/13150), where having them 
separate made it confusing. I am not saying it is the exact same case, but 
close enough. I would rather propose, documenting what we do clearly with a few 
examples in the docstring.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] vibhatha commented on pull request #14320: ARROW-16212: [C++][Python] Register Multiple Kernels for a UDF

Reply via email to