[ https://issues.apache.org/jira/browse/ARROW-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838389#comment-16838389 ]
Joris Van den Bossche commented on ARROW-5286:
----------------------------------------------
Actually, converting from dicts (which does not require specifying the schema)
hits the same limitation: it works in {{pa.array(..)}} but not in
{{pa.Table.from_pandas(..)}}:
{code:java}
In [14]: df = pd.DataFrame({'dicts': [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]})
In [15]: pa.array(df['dicts'])
Out[15]:
<pyarrow.lib.StructArray object at 0x7fb837d869a8>
-- is_valid: all not null
-- child 0 type: int64
[
1,
3
]
-- child 1 type: int64
[
2,
4
]
In [16]: pa.Table.from_pandas(df)
...
NotImplementedError: struct<a: int64, b: int64>
{code}
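Until {{Table.from_pandas}} supports struct columns, one possible workaround is to convert each column with {{pa.array}} (which already infers the struct type from dicts, as shown above) and assemble the Table with {{pa.Table.from_arrays}}. This is only a sketch, assuming pandas and pyarrow are imported as {{pd}} and {{pa}} as in the session above. Unlike {{from_pandas}}, it attaches no pandas metadata (index, original dtypes) to the schema.

{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'dicts': [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]})

# Convert each column separately; pa.array infers struct<a: int64, b: int64>
# from the dict keys, the same path that works in the session above.
arrays = [pa.array(df[name]) for name in df.columns]

# Assemble the Table directly instead of going through from_pandas.
# Note: no pandas metadata (index, original dtypes) is attached this way.
table = pa.Table.from_arrays(arrays, names=list(df.columns))

print(table.schema)
{code}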
> [Python] support Structs in Table.from_pandas given a known schema
> ------------------------------------------------------------------
>
> Key: ARROW-5286
> URL: https://issues.apache.org/jira/browse/ARROW-5286
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Assignee: Joris Van den Bossche
> Priority: Major
> Fix For: 0.14.0
>
>
> ARROW-2073 implemented creating a StructArray from an array of tuples (in
> addition to creating one from dicts).
> This works in {{pyarrow.array}} when the struct type is specified:
> {code}
> In [2]: df = pd.DataFrame({'tuples': [(1, 2), (3, 4)]})
>
> In [3]: struct_type = pa.struct([('a', pa.int64()), ('b', pa.int64())])
>
> In [4]: pa.array(df['tuples'], type=struct_type)
> Out[4]:
> <pyarrow.lib.StructArray object at 0x7f1b02ff6818>
> -- is_valid: all not null
> -- child 0 type: int64
> [
> 1,
> 3
> ]
> -- child 1 type: int64
> [
> 2,
> 4
> ]
> {code}
> But it does not yet work when converting a DataFrame to a Table with the
> type specified in a schema:
> {code}
> In [5]: pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
>
> ---------------------------------------------------------------------------
> KeyError Traceback (most recent call last)
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
> 68 try:
> ---> 69 return logical_type_map[arrow_type.id]
> 70 except KeyError:
> KeyError: 24
>
> During handling of the above exception, another exception occurred:
>
> NotImplementedError Traceback (most recent call last)
> <ipython-input-5-c18748f9b954> in <module>
> ----> 1 pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
>
> ~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
>
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
> 483 metadata = construct_metadata(df, column_names, index_columns,
> 484 index_descriptors, preserve_index,
> --> 485 types)
> 486 return all_names, arrays, metadata
> 487
>
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in construct_metadata(df, column_names, index_levels, index_descriptors, preserve_index, types)
> 207 metadata = get_column_metadata(df[col_name], name=sanitized_name,
> 208 arrow_type=arrow_type,
> --> 209 field_name=sanitized_name)
> 210 column_metadata.append(metadata)
> 211
>
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_column_metadata(column, name, arrow_type, field_name)
> 149 dict
> 150 """
> --> 151 logical_type = get_logical_type(arrow_type)
> 152
> 153 string_dtype, extra_metadata = get_extension_dtype_info(column)
>
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
> 77 elif isinstance(arrow_type, pa.lib.Decimal128Type):
> 78 return 'decimal'
> ---> 79 raise NotImplementedError(str(arrow_type))
> 80
> 81
>
> NotImplementedError: struct<a: int64, b: int64>
> {code}
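When a schema is given, a possible workaround is to convert each column with {{pa.array}}, passing the type from the matching schema field (the tuple-to-struct path shown to work in the description above), and then build the Table with {{pa.Table.from_arrays}}. This is a sketch only, not the fix tracked by this issue; it assumes the DataFrame column names match the schema field names, and no pandas metadata is attached to the result.

{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'tuples': [(1, 2), (3, 4)]})
struct_type = pa.struct([('a', pa.int64()), ('b', pa.int64())])
schema = pa.schema([('tuples', struct_type)])

# Convert each column using the type of the schema field with the same name.
# Tuples are mapped positionally onto the struct fields:
# element 0 -> field 'a', element 1 -> field 'b'.
arrays = [pa.array(df[field.name], type=field.type) for field in schema]

# Build the Table with the requested schema, bypassing from_pandas.
table = pa.Table.from_arrays(arrays, schema=schema)

print(table.schema)
{code}

Once {{Table.from_pandas}} handles struct types (this issue targets 0.14.0), the direct call in the description should make this workaround unnecessary.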
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)