[ https://issues.apache.org/jira/browse/ARROW-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942841#comment-16942841 ]
Joris Van den Bossche commented on ARROW-5655:
----------------------------------------------

[~kszucs] I think this might already be fixed in the meantime. Wes and I did some work related to schema handling last month.

> [Python] Table.from_pydict/from_arrays not using types in specified schema correctly
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-5655
>                 URL: https://issues.apache.org/jira/browse/ARROW-5655
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Krisztian Szucs
>            Priority: Major
>             Fix For: 1.0.0
>
>
> Example with {{from_pydict}} (from https://github.com/apache/arrow/pull/4601#issuecomment-503676534):
> {code:python}
> In [15]: table = pa.Table.from_pydict(
>     ...:     {'a': [1, 2, 3], 'b': [3, 4, 5]},
>     ...:     schema=pa.schema([('a', pa.int64()), ('c', pa.int32())]))
> In [16]: table
> Out[16]:
> pyarrow.Table
> a: int64
> c: int32
> In [17]: table.to_pandas()
> Out[17]:
>    a  c
> 0  1  3
> 1  2  0
> 2  3  4
> {code}
> Note that the specified schema 1) has different column names and 2) has a non-default type (int32 instead of int64), which leads to corrupted values.
> This is partly due to {{Table.from_pydict}} not using the type information in the schema to convert the dictionary items to pyarrow arrays. But {{Table.from_arrays}} also does not correctly cast the arrays to another type when the schema specifies one.
> An additional question for {{Table.from_pydict}} is whether it should actually override the 'b' key from the dictionary as column 'c', as defined in the schema (this behaviour depends on the order of the dictionary, which is not guaranteed below Python 3.6).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)