[ 
https://issues.apache.org/jira/browse/ARROW-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789952#comment-16789952
 ] 

yu peng edited comment on ARROW-4350 at 3/11/19 8:50 PM:
---------------------------------------------------------

{code:python}
In [1]: import numpy as np 
In [2]: import pyarrow as pa 
In [3]: arr = np.empty(2, dtype=object) 
In [4]: arr[0] = np.array([1, 2]) 
In [5]: arr[1] = np.array([2, 3]) 
In [6]: pa.array(arr).to_numpy() 
--------------------------------------------------------------------------- 
NotImplementedError Traceback (most recent call last) 
<ipython-input-6-4940e4471348> in <module>() 
----> 1 pa.array(arr).to_numpy() 
/Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/array.pxi in pyarrow.lib.Array.to_numpy()
694 'for arrays without nulls.') 
695 if not is_primitive(self.type.id) or self.type.id == _Type_BOOL: 
--> 696 raise NotImplementedError('NumPy array view is only supported ' 
697 'for primitive types.') 
698 buflist = self.buffers() 
NotImplementedError: NumPy array view is only supported for primitive types.
{code}
I'm not sure whether we want to support `dtype=np.object`, since we can't even 
convert such arrays back to a NumPy array.


> [python] pyarrow table convert to pandas dataframe add extra information
> ------------------------------------------------------------------------
>
>                 Key: ARROW-4350
>                 URL: https://issues.apache.org/jira/browse/ARROW-4350
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1, 0.12.0
>            Reporter: yu peng
>            Priority: Major
>             Fix For: 0.13.0
>
>
> {code:python}
> In [19]: df = pd.DataFrame({'a': [[[1], [2]], [[2], [3]]], 'b': [1, 2]})
> In [20]: df.iloc[0].to_dict()
> Out[20]: {'a': [[1], [2]], 'b': 1}
> In [21]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()
> Out[21]: {'a': array([array([1]), array([2])], dtype=object), 'b': 1}
> In [24]: np.array(df.iloc[0].to_dict()['a']).shape
> Out[24]: (2, 1)
> In [25]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()['a'].shape
> Out[25]: (2,)
> {code}
> The extra array type added by the round trip does not behave as expected. 
>  
> More importantly, the following round trip fails:
>  
> {code:python}
> In [108]: df = pd.DataFrame({'a': [[[1, 2],[2, 3]], [[1,2], [2, 3]]], 'b': [[1, 2],[2, 3]]})
> In [109]: df
> Out[109]:
>                   a       b
> 0  [[1, 2], [2, 3]]  [1, 2]
> 1  [[1, 2], [2, 3]]  [2, 3]
> In [110]: pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
> ---------------------------------------------------------------------------
> ArrowTypeError Traceback (most recent call last)
> <ipython-input-110-4a09836f807e> in <module>()
> ----> 1 pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
> /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
> 1215 <pyarrow.lib.Table object at 0x7f05d1fb1b40>
> 1216 """
> -> 1217 names, arrays, metadata = pdcompat.dataframe_to_arrays(
> 1218 df,
> 1219 schema=schema,
> /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
> 379 arrays = [convert_column(c, t)
> 380 for c, t in zip(columns_to_convert,
> --> 381 convert_types)]
> 382 else:
> 383 from concurrent import futures
> /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in convert_column(col, ty)
> 374 e.args += ("Conversion failed for column {0!s} with type {1!s}"
> 375 .format(col.name, col.dtype),)
> --> 376 raise e
> 377
> 378 if nthreads == 1:
> ArrowTypeError: ('only size-1 arrays can be converted to Python scalars', 'Conversion failed for column a with type object')
> {code}
>  
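
As a rough sketch of a possible user-side workaround for the shape mismatch reported above, the nested object arrays that come back from `to_pandas()` could be normalized into plain lists before further use. The helper name `to_nested_list` is hypothetical and not part of pyarrow or pandas; the example uses only NumPy and mimics the cell value shown in `Out[21]`.

```python
import numpy as np

def to_nested_list(value):
    """Recursively turn ndarrays (including object arrays of arrays) into lists."""
    if isinstance(value, np.ndarray):
        if value.ndim == 0:
            # 0-d arrays cannot be iterated; unwrap the single element.
            return to_nested_list(value.item())
        return [to_nested_list(item) for item in value]
    if isinstance(value, (list, tuple)):
        return [to_nested_list(item) for item in value]
    if isinstance(value, np.generic):
        # NumPy scalar (e.g. np.int64) to the matching Python scalar.
        return value.item()
    return value

# Shape seen after the round trip in the report: an object array of 1-D arrays.
cell = np.empty(2, dtype=object)
cell[0] = np.array([1])
cell[1] = np.array([2])

restored = to_nested_list(cell)
```

Applied to a whole column, something like `df['a'].map(to_nested_list)` before calling `pa.Table.from_pandas` again might sidestep the second failure, but that is an assumption that has not been checked against the versions listed in this issue.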



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
