[ https://issues.apache.org/jira/browse/ARROW-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-4350:
-----------------------------------------
Description:
Nested NumPy arrays cannot be converted to a list-of-lists type array:
{code:python}
arr = np.empty(2, dtype=object)
arr[:] = [np.array([1, 2]), np.array([2, 3])]
pa.array([arr, arr])
{code}
results in
{code}
ArrowTypeError: only size-1 arrays can be converted to Python scalars
{code}
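The message is NumPy's own: it is the {{TypeError}} NumPy raises when an array with more than one element is coerced to a Python scalar. This suggests (an inference from the message, not confirmed in this issue) that the converter treats each inner {{ndarray}} as a scalar value rather than as a list. A NumPy-only sketch that triggers the same kind of error; the exact wording may differ between NumPy versions:

```python
import numpy as np

# One of the inner arrays from the reproduction above.
inner = np.array([1, 2])

try:
    # Coercing a multi-element array to a Python scalar is not allowed.
    int(inner)
except TypeError as exc:
    print(f"TypeError: {exc}")
```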
Starting from lists of lists works fine:
{code:python}
lists = [[1, 2], [2, 3]]
pa.array([lists, lists]).type
{code}
{code:none}
ListType(list<item: list<item: int64>>)
{code}
Specifying the type explicitly as {{pa.array([arr, arr], type=pa.list_(pa.list_(pa.int64())))}} does not help.
Because of this, a round trip does not work either: converting a list-of-lists array to pandas gives back a NumPy array of arrays, which cannot be converted back:
{code:python}
In [2]: lists = [[1, 2], [2, 3]]
...: a = pa.array([lists, lists])
In [3]: a.to_pandas()
Out[3]:
array([array([array([1, 2]), array([2, 3])], dtype=object),
       array([array([1, 2]), array([2, 3])], dtype=object)], dtype=object)
In [4]: pa.array(a.to_pandas())
---------------------------------------------------------------------------
ArrowTypeError Traceback (most recent call last)
<ipython-input-4-9fee6dc9d0b8> in <module>
----> 1 pa.array(a.to_pandas())
~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()
~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowTypeError: only size-1 arrays can be converted to Python scalars
{code}
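A possible workaround (not proposed in this issue; {{to_nested_lists}} is a hypothetical helper, not part of pyarrow) is to replace every NumPy array with a plain Python list before handing the data to {{pa.array}}, since lists of lists already convert fine. A minimal NumPy-only sketch:

```python
import numpy as np


def to_nested_lists(value):
    """Recursively turn NumPy arrays (including object arrays whose
    elements are arrays) into plain Python lists.

    Hypothetical helper: a workaround sketch, not a pyarrow API.
    """
    if isinstance(value, np.ndarray):
        if value.ndim == 0:
            # 0-d array: unwrap it and convert whatever it holds.
            return to_nested_lists(value.item())
        # Iterating a 1-D object array yields the inner arrays;
        # iterating an n-D numeric array yields its rows.
        return [to_nested_lists(v) for v in value]
    if isinstance(value, (list, tuple)):
        return [to_nested_lists(v) for v in value]
    if isinstance(value, np.generic):
        # NumPy scalar (e.g. np.int64) -> plain Python int/float.
        return value.item()
    return value


# Rebuild the nested input from the description.
arr = np.empty(2, dtype=object)
arr[0] = np.array([1, 2])
arr[1] = np.array([2, 3])

print(to_nested_lists([arr, arr]))
# [[[1, 2], [2, 3]], [[1, 2], [2, 3]]]
```

The output is the same list-of-lists shape that {{pa.array}} accepts in the working example, so it could be passed to {{pa.array(...)}} directly; applying the helper to the result of {{a.to_pandas()}} should likewise make the round trip convertible, at the cost of a pure-Python pass over the data.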
----
Original report:
{code:java}
In [19]: df = pd.DataFrame({'a': [[[1], [2]], [[2], [3]]], 'b': [1, 2]})
In [20]: df.iloc[0].to_dict()
Out[20]: {'a': [[1], [2]], 'b': 1}
In [21]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()
Out[21]: {'a': array([array([1]), array([2])], dtype=object), 'b': 1}
In [24]: np.array(df.iloc[0].to_dict()['a']).shape
Out[24]: (2, 1)
In [25]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()['a'].shape
Out[25]: (2,)
{code}
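The round trip does not preserve the value: the inner lists come back as a NumPy object array of arrays, with shape (2,) instead of the (2, 1) obtained from the original nested list.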
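The shape difference follows from how NumPy stores the two values: {{np.array}} turns the original nested list into a regular 2-D integer array, while the round-tripped value is a 1-D object array whose elements are 1-D arrays. A NumPy-only sketch that reproduces both shapes and, assuming all inner arrays have the same length, recovers the 2-D layout with {{np.stack}}:

```python
import numpy as np

# The original cell value: a plain nested list.
original = [[1], [2]]

# What the round trip produces: a 1-D object array holding 1-D int arrays.
round_tripped = np.empty(2, dtype=object)
round_tripped[0] = np.array([1])
round_tripped[1] = np.array([2])

print(np.array(original).shape)  # (2, 1)
print(round_tripped.shape)       # (2,)

# Stacking the inner arrays rebuilds a regular 2-D array
# (only valid when every inner array has the same shape).
restored = np.stack(round_tripped)
print(restored.shape)            # (2, 1)
```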
More importantly, the following fails:
{code:java}
In [108]: df = pd.DataFrame({'a': [[[1, 2],[2, 3]], [[1,2], [2, 3]]], 'b': [[1, 2],[2, 3]]})
In [109]: df
Out[109]:
                  a       b
0  [[1, 2], [2, 3]]  [1, 2]
1  [[1, 2], [2, 3]]  [2, 3]
In [110]: pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
---------------------------------------------------------------------------
ArrowTypeError Traceback (most recent call last)
<ipython-input-110-4a09836f807e> in <module>()
----> 1 pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
/Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
1215 <pyarrow.lib.Table object at 0x7f05d1fb1b40>
1216 """
-> 1217 names, arrays, metadata = pdcompat.dataframe_to_arrays(
1218 df,
1219 schema=schema,
/Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
379 arrays = [convert_column(c, t)
380 for c, t in zip(columns_to_convert,
--> 381 convert_types)]
382 else:
383 from concurrent import futures
/Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in convert_column(col, ty)
374 e.args += ("Conversion failed for column {0!s} with type {1!s}"
375 .format(col.name, col.dtype),)
--> 376 raise e
377
378 if nthreads == 1:
ArrowTypeError: ('only size-1 arrays can be converted to Python scalars', 'Conversion failed for column a with type object')
{code}
was:
{code:java}
In [19]: df = pd.DataFrame({'a': [[[1], [2]], [[2], [3]]], 'b': [1, 2]})
In [20]: df.iloc[0].to_dict()
Out[20]: {'a': [[1], [2]], 'b': 1}
In [21]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()
Out[21]: {'a': array([array([1]), array([2])], dtype=object), 'b': 1}
In [24]: np.array(df.iloc[0].to_dict()['a']).shape
Out[24]: (2, 1)
In [25]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()['a'].shape
Out[25]: (2,)
{code}
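The round trip does not preserve the value: the inner lists come back as a NumPy object array of arrays, with shape (2,) instead of the (2, 1) obtained from the original nested list.
More importantly, the following fails: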
{code:java}
In [108]: df = pd.DataFrame({'a': [[[1, 2],[2, 3]], [[1,2], [2, 3]]], 'b': [[1, 2],[2, 3]]})
In [109]: df
Out[109]:
                  a       b
0  [[1, 2], [2, 3]]  [1, 2]
1  [[1, 2], [2, 3]]  [2, 3]
In [110]: pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
---------------------------------------------------------------------------
ArrowTypeError Traceback (most recent call last)
<ipython-input-110-4a09836f807e> in <module>()
----> 1 pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
/Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
1215 <pyarrow.lib.Table object at 0x7f05d1fb1b40>
1216 """
-> 1217 names, arrays, metadata = pdcompat.dataframe_to_arrays(
1218 df,
1219 schema=schema,
/Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
379 arrays = [convert_column(c, t)
380 for c, t in zip(columns_to_convert,
--> 381 convert_types)]
382 else:
383 from concurrent import futures
/Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in convert_column(col, ty)
374 e.args += ("Conversion failed for column {0!s} with type {1!s}"
375 .format(col.name, col.dtype),)
--> 376 raise e
377
378 if nthreads == 1:
ArrowTypeError: ('only size-1 arrays can be converted to Python scalars', 'Conversion failed for column a with type object')
{code}
> [Python] nested numpy arrays
> ----------------------------
>
> Key: ARROW-4350
> URL: https://issues.apache.org/jira/browse/ARROW-4350
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.11.1, 0.12.0
> Reporter: yu peng
> Priority: Major
> Fix For: 0.14.0
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)