vibhatha commented on PR #14320:
URL: https://github.com/apache/arrow/pull/14320#issuecomment-1369365871
> A few nits.
>
> My main concern is that, rather than have `input_types` be a dictionary, I
think it would be best to have the argument names as a separate field.
Otherwise it just forces the user to grapple with questions like "why is it a
dictionary? What am I supposed to use for the key? Can different rows have
different keys?" when that could be more intuitive.
>
> ```
> def test_multi_kernel_registration():
> def unary_function(ctx, x):
> return pc.cast(pc.call_function("multiply", [x, 2],
> memory_pool=ctx.memory_pool),
x.type)
> func_name = "y=x*2"
> unary_doc = {"summary": "multiply by two function",
> "description": "test multiply function"}
> input_types = [
> pa.int8(),
> pa.int16(),
> pa.int32(),
> pa.int64(),
> pa.float32(),
> pa.float64()
> ]
>
> input_names = ["array"]
>
> output_types = [
> pa.int8(),
> pa.int16(),
> pa.int32(),
> pa.int64(),
> pa.float32(),
> pa.float64()
> ]
> pc.register_scalar_function(unary_function,
> func_name,
> unary_doc,
> input_types,
> input_names,
> output_types)
> ```
I see your point, but I have a counter argument for this (just my opinion).
If we keep a separate list, it also makes it unclear which name goes for
which type. Having them together makes it readable. We had a case for
[aggreagtes](https://github.com/apache/arrow/pull/13150), where having them
separate made it confusing. I am not saying it is the exact same case, but
close enough. I would rather propose, documenting what we do clearly with a few
examples in the docstring.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]