Joris Van den Bossche created ARROW-5286:
--------------------------------------------
Summary: [Python] support Structs in Table.from_pandas given a known schema
Key: ARROW-5286
URL: https://issues.apache.org/jira/browse/ARROW-5286
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Joris Van den Bossche
ARROW-2073 implemented creating a StructArray from an array of tuples (in
addition to an array of dicts).
This works in {{pyarrow.array}} when the proper type is specified:
{code}
In [2]: df = pd.DataFrame({'tuples': [(1, 2), (3, 4)]})
In [3]: struct_type = pa.struct([('a', pa.int64()), ('b', pa.int64())])
In [4]: pa.array(df['tuples'], type=struct_type)
Out[4]:
<pyarrow.lib.StructArray object at 0x7f1b02ff6818>
-- is_valid: all not null
-- child 0 type: int64
[
1,
3
]
-- child 1 type: int64
[
2,
4
]
{code}
However, the tuple-to-struct conversion does not yet work in
{{Table.from_pandas}} when the struct type is specified in a schema:
{code}
In [5]: pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
68 try:
---> 69 return logical_type_map[arrow_type.id]
70 except KeyError:
KeyError: 24
During handling of the above exception, another exception occurred:
NotImplementedError Traceback (most recent call last)
<ipython-input-5-c18748f9b954> in <module>
----> 1 pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
483 metadata = construct_metadata(df, column_names, index_columns,
484 index_descriptors, preserve_index,
--> 485 types)
486 return all_names, arrays, metadata
487
~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in construct_metadata(df, column_names, index_levels, index_descriptors, preserve_index, types)
207 metadata = get_column_metadata(df[col_name], name=sanitized_name,
208 arrow_type=arrow_type,
--> 209 field_name=sanitized_name)
210 column_metadata.append(metadata)
211
~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_column_metadata(column, name, arrow_type, field_name)
149 dict
150 """
--> 151 logical_type = get_logical_type(arrow_type)
152
153 string_dtype, extra_metadata = get_extension_dtype_info(column)
~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
77 elif isinstance(arrow_type, pa.lib.Decimal128Type):
78 return 'decimal'
---> 79 raise NotImplementedError(str(arrow_type))
80
81
NotImplementedError: struct<a: int64, b: int64>
{code}