jorisvandenbossche commented on issue #26199:
URL: https://github.com/apache/arrow/issues/26199#issuecomment-1481065636
I am not sure we can do anything about this, since it is an inherent
limitation of numpy: it has no missing value support for integers, so we
have to use float64 to represent integer data that contains nulls. The same
happens for primitive, non-nested arrays as well:
```python
>>> import pyarrow as pa
>>> data = [None, 9007199254740993]
>>> arr = pa.array(data, type=pa.uint64())
>>> ndarray = arr.to_numpy(zero_copy_only=False)
>>> arr
<pyarrow.lib.UInt64Array object at 0x7fa6efe29400>
[
  null,
  9007199254740993
]
>>> ndarray
array([ nan, 9.00719925e+15])
```
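To make the precision loss concrete, here is a minimal sketch using only plain Python floats, which are IEEE 754 doubles just like numpy's float64. The variable names are only for illustration. The value from the example is 2**53 + 1, the first positive integer that a double cannot represent exactly:

```python
big = 2**53 + 1            # 9007199254740993, the value from the example
as_float = float(big)      # same representation as a numpy float64
roundtripped = int(as_float)

# The nearest representable double is 2**53, so the value is off by one.
lost = big - roundtripped
print(big, roundtripped, lost)
```

So even if the nulls were somehow handled, casting the float64 data back to uint64 could not recover the original integer.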
One difference, though, is that in the primitive case, trying to recreate the
Arrow array from the numpy result raises an error instead of silently
roundtripping a different value:
```python
>>> restored = pa.array(ndarray, type=arr.type)
...
ArrowInvalid: Float value nan was truncated converting to uint64
```
That's related to the fact that we don't do safe casting when converting
nested data, which is discussed in
https://github.com/apache/arrow/issues/31857.
I will add this issue as an extra example there, and then we can close this one.