[
https://issues.apache.org/jira/browse/ARROW-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662308#comment-17662308
]
Rok Mihevc commented on ARROW-5286:
-----------------------------------
This issue has been migrated to [issue
#21754|https://github.com/apache/arrow/issues/21754] on GitHub. Please see the
[migration documentation|https://github.com/apache/arrow/issues/14542] for
further details.
> [Python] support Structs in Table.from_pandas given a known schema
> ------------------------------------------------------------------
>
> Key: ARROW-5286
> URL: https://issues.apache.org/jira/browse/ARROW-5286
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Assignee: Joris Van den Bossche
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> ARROW-2073 implemented creating a StructArray from an array of tuples (in
> addition to an array of dicts).
> This works in {{pyarrow.array}} (specifying the proper type):
> {code}
> In [2]: df = pd.DataFrame({'tuples': [(1, 2), (3, 4)]})
>
> In [3]: struct_type = pa.struct([('a', pa.int64()), ('b', pa.int64())])
>
> In [4]: pa.array(df['tuples'], type=struct_type)
> Out[4]:
> <pyarrow.lib.StructArray object at 0x7f1b02ff6818>
> -- is_valid: all not null
> -- child 0 type: int64
> [
> 1,
> 3
> ]
> -- child 1 type: int64
> [
> 2,
> 4
> ]
> {code}
> But this does not yet work when converting a DataFrame to a Table while
> specifying the type in a schema:
> {code}
> In [5]: pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
>
> ---------------------------------------------------------------------------
> KeyError Traceback (most recent call last)
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in
> get_logical_type(arrow_type)
> 68 try:
> ---> 69 return logical_type_map[arrow_type.id]
> 70 except KeyError:
> KeyError: 24
> During handling of the above exception, another exception occurred:
> NotImplementedError Traceback (most recent call last)
> <ipython-input-5-c18748f9b954> in <module>
> ----> 1 pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
> ~/scipy/repos/arrow/python/pyarrow/table.pxi in
> pyarrow.lib.Table.from_pandas()
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in
> dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
> 483 metadata = construct_metadata(df, column_names, index_columns,
> 484 index_descriptors, preserve_index,
> --> 485 types)
> 486 return all_names, arrays, metadata
> 487
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in construct_metadata(df,
> column_names, index_levels, index_descriptors, preserve_index, types)
> 207 metadata = get_column_metadata(df[col_name],
> name=sanitized_name,
> 208 arrow_type=arrow_type,
> --> 209 field_name=sanitized_name)
> 210 column_metadata.append(metadata)
> 211
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in
> get_column_metadata(column, name, arrow_type, field_name)
> 149 dict
> 150 """
> --> 151 logical_type = get_logical_type(arrow_type)
> 152
> 153 string_dtype, extra_metadata = get_extension_dtype_info(column)
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in
> get_logical_type(arrow_type)
> 77 elif isinstance(arrow_type, pa.lib.Decimal128Type):
> 78 return 'decimal'
> ---> 79 raise NotImplementedError(str(arrow_type))
> 80
> 81
> NotImplementedError: struct<a: int64, b: int64>
> {code}