vibhatha commented on PR #12590:
URL: https://github.com/apache/arrow/pull/12590#issuecomment-1090196626

   >  * I get a nice error message if the custom function does not return a correct object (eg a numpy array instead of pyarrow array), but if it returns an array of the wrong data type (not matching what I specified when registering), I get a segfault:
   > 
   > ```
   > In [2]: def add_one(array):
   >    ...:     return pa.array(array.to_pandas() + 1.0)
   > 
   > In [3]: pc.register_function("add_one", 1, {'summary': 'blabla', 'description': '..', 'arg_names': ["arr"]}, [pc.InputType.array(pa.int64())], pa.int64(), add_one)
   > 
   > In [4]: pc.call_function("add_one", [pa.array([1, 2, 3])])
   > ../src/arrow/compute/function.cc:258:  Check failed: _s.ok() Operation failed: executor->CheckResultType(out, name_.c_str())
   > Bad status: Type error: kernel type result mismatch for function 'add_one': declared as int64, actual is double
   > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.800(+0x12bec1a)[0x7f56df46ac1a]
   > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.800(+0x12beb98)[0x7f56df46ab98]
   > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.800(+0x12bebba)[0x7f56df46abba]
   > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.800(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7f56df46af19]
   > /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.800(_ZNK5arrow7compute8Function7ExecuteERKSt6vectorINS_5DatumESaIS3_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0xe54)[0x7f56df887632]
   > /home/joris/scipy/repos/arrow/python/pyarrow/_compute.cpython-38-x86_64-linux-gnu.so(+0x8073f)[0x7f56db36273f]
   > ...
   > Aborted (core dumped)
   > ```
   > 
   > (when fixing this, it would also be good to add a test case for it)
   
   I think what happens is that calling `call_function` in Python ends up executing this check:
   
   https://github.com/apache/arrow/blob/cf53e3cdcb4b2d68bc95f85b9ebcc0a212eb6ed1/cpp/src/arrow/compute/function.cc#L258
   
   When the check fails, the process aborts on the C++ side instead of raising a Python exception, so the error is not handled properly in Python. It seems this behavior is only exposed now because UDFs let user code return a result whose type differs from the declared output type. What do you think?
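
One possible stopgap on the Python side would be to validate the result type before it is handed back to the C++ executor, so a mismatch surfaces as a Python exception rather than an abort. Below is a minimal sketch of that idea, not the fix proposed in this PR. `checked_udf` is a hypothetical helper that does not exist in pyarrow, and it assumes only that the returned object exposes a `.type` attribute, as pyarrow arrays do.

```python
from functools import wraps


def checked_udf(declared_type):
    """Wrap a UDF so that a result of the wrong type raises TypeError.

    Hypothetical helper, not part of pyarrow. It only assumes the result
    exposes a `.type` attribute comparable to `declared_type`.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            result = func(*args)
            actual = getattr(result, "type", None)
            if actual is None:
                # Result is not array-like at all (no `.type` attribute).
                raise TypeError(
                    f"{func.__name__} returned {type(result).__name__}, "
                    "expected an Arrow array"
                )
            if actual != declared_type:
                # Same information as the C++ check, but raised in Python.
                raise TypeError(
                    f"{func.__name__}: declared as {declared_type}, "
                    f"actual is {actual}"
                )
            return result
        return wrapper
    return decorator
```

With pyarrow, `checked_udf(pa.int64())(add_one)` could then be registered in place of `add_one`. This assumes that exceptions raised inside the callback are already propagated back to the caller, which I have not verified for this case. The check at `function.cc#L258` would still need to be handled properly for callers that skip the wrapper.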
   
   cc @jorisvandenbossche 

