[
https://issues.apache.org/jira/browse/ARROW-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joris Van den Bossche updated ARROW-5655:
-----------------------------------------
Fix Version/s: 1.0.0
> [Python] Table.from_pydict/from_arrays not using types in specified schema
> correctly
> -------------------------------------------------------------------------------------
>
> Key: ARROW-5655
> URL: https://issues.apache.org/jira/browse/ARROW-5655
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
> Fix For: 1.0.0
>
>
> Example with {{from_pydict}} (from
> https://github.com/apache/arrow/pull/4601#issuecomment-503676534):
> {code:python}
> In [14]: import pyarrow as pa
> In [15]: table = pa.Table.from_pydict(
>     ...:     {'a': [1, 2, 3], 'b': [3, 4, 5]},
>     ...:     schema=pa.schema([('a', pa.int64()), ('c', pa.int32())]))
> In [16]: table
> Out[16]:
> pyarrow.Table
> a: int64
> c: int32
> In [17]: table.to_pandas()
> Out[17]:
>    a  c
> 0  1  3
> 1  2  0
> 2  3  4
> {code}
> Note that the specified schema 1) uses different column names and 2) has a
> non-default type (int32 instead of int64), which leads to corrupted values.
> This is partly because {{Table.from_pydict}} does not use the type information
> in the schema when converting the dictionary items to pyarrow arrays. In
> addition, {{Table.from_arrays}} does not correctly cast the arrays to another
> type when the schema specifies one.
> An additional question for {{Table.from_pydict}} is whether it should actually
> map the 'b' key from the dictionary to column 'c' as defined in the
> schema (this behaviour depends on the order of the dictionary, which is not
> guaranteed below Python 3.6).