jmdeschenes edited a comment on pull request #10565:
URL: https://github.com/apache/arrow/pull/10565#issuecomment-999017847
Hello,
There is an issue with the approach:
In `array.pxi`:
```cython
cdef class ExtensionArray(Array):
    """
    Concrete class for Arrow extension arrays.
    """

    @property
    def storage(self):
        cdef:
            CExtensionArray* ext_array = <CExtensionArray*>(self.ap)

        return pyarrow_wrap_array(ext_array.storage())

    #
    ## LINES SKIPPED
    #

    def to_numpy(self, **kwargs):
        """
        Convert extension array to a numpy ndarray.

        See Also
        --------
        Array.to_numpy
        """
        return self.storage.to_numpy(**kwargs)
```
In `table.pxi` (`ChunkedArray.to_numpy`):
```cython
def to_numpy(self):
    """
    Return a NumPy copy of this array (experimental).

    Returns
    -------
    array : numpy.ndarray
    """
    cdef:
        PyObject* out
        PandasOptions c_options
        object values

    if self.type.id == _Type_EXTENSION:
        storage_array = chunked_array(
            [chunk.storage for chunk in self.iterchunks()],
            type=self.type.storage_type
        )
    # ... (rest of method skipped)
```
Both of these strip the extension type before the data reaches the C++ code, so
the C++ code never knows that it is dealing with an extension array.
If this approach is kept, `fixed_size_list` arrays would need to convert into a
proper 2D NumPy array. That could have several benefits, and at first it could
be done only for primitive value types.
@jorisvandenbossche Would converting `fixed_size_list` to a 2D NumPy array be acceptable?
Alternatively, the C++ code could be made to handle the extension type itself.
@sjperkins Are you still working on this PR? Is there something I can help
you with?