timsaucer commented on PR #1347:
URL:
https://github.com/apache/datafusion-python/pull/1347#issuecomment-3849797441
Sorry it's taken me a while to get around to this PR. It feels like we are
doing two different things
1. telling users that they need to return pyarrow scalars as the return of
evaluate
2. detect when it is a list and then we convert it to a python value and
back into a pyarrow scalar
It feels like this isn't the best option. I think we want to avoid doing any
kind of `to_pylist()` calls.
I think a more general solution would be something like
1. Determine if they have passed in a pyarrow scalar value. If so, use it.
2. If they have not passed in a pyarrow scalar value, use
`py_obj_to_scalar_value` to convert to a scalar value
3. Update `py_obj_to_scalar_value` to detect pyarrow arrays and convert them
to `ScalarValue::List`
For the last part we could do something like
```
if let Ok(_) = obj.getattr(py, "__arrow_c_array__") {
let obj = obj.bind(py);
let array_data = ArrayData::from_pyarrow_bound(obj)?;
let array = Arc::new(GenericListArray::<i32>::try_from(array_data)?);
return Ok(ScalarValue::List(array))
}
```
Additionally, if we're going to go down this route I think we would want to
treat both the `state()` and `evaluate()` since both of them should be
returning scalars.
An advantage of the point described above is that I think it adds more
flexibility to the users because their python functions can just return python
integers and such without having to convert them to pyarrow scalars.
What do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]