[ https://issues.apache.org/jira/browse/ARROW-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou updated ARROW-6158:
----------------------------------
    Fix Version/s: 1.0.0

> [Python] possible to create StructArray with type that conflicts with child array's types
> -----------------------------------------------------------------------------------------
>
>                 Key: ARROW-6158
>                 URL: https://issues.apache.org/jira/browse/ARROW-6158
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 1.0.0
>
>
> Using the Python interface as an example, this creates a {{StructArray}} where the field types don't match the child array types:
> {code}
> import pyarrow as pa
>
> a = pa.array([1, 2, 3], type=pa.int64())
> b = pa.array(['a', 'b', 'c'], type=pa.string())
> # Declared field types deliberately differ from the child array types
> inconsistent_fields = [pa.field('a', pa.int32()), pa.field('b', pa.float64())]
> a = pa.StructArray.from_arrays([a, b], fields=inconsistent_fields)
> {code}
> The above runs without error. I didn't find anything that fails (e.g. conversion to pandas, slicing), and validation also passes, but the resulting type carries the inconsistent child types:
> {code}
> In [2]: a
> Out[2]:
> <pyarrow.lib.StructArray object at 0x7f450af52eb8>
> -- is_valid: all not null
> -- child 0 type: int64
>   [
>     1,
>     2,
>     3
>   ]
> -- child 1 type: string
>   [
>     "a",
>     "b",
>     "c"
>   ]
> In [3]: a.type
> Out[3]: StructType(struct<a: int32, b: double>)
> In [4]: a.to_pandas()
> Out[4]:
> array([{'a': 1, 'b': 'a'}, {'a': 2, 'b': 'b'}, {'a': 3, 'b': 'c'}],
>       dtype=object)
> In [5]: a.validate()
> {code}
> Shouldn't this be disallowed somehow? It could be checked in the Python {{from_arrays}} method, but maybe also in {{StructArray::Make}}, which already checks the number of fields against the number of arrays and that the arrays have a consistent length.
> Similarly to the discussion in ARROW-6132, I would also expect {{ValidateArray}} to catch this.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
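
The kind of check the issue asks for on the Python side could look roughly like the sketch below. Both `check_child_types` and `checked_struct_from_arrays` are hypothetical helper names, not pyarrow functions. The sketch assumes only that each child array and each field exposes a `.type` attribute (and each field a `.name`), as pyarrow arrays and fields do. It is not the fix that was applied for this issue.

```python
def check_child_types(arrays, fields):
    """Raise if a child array's type differs from its declared field type.

    Hypothetical helper, not part of pyarrow. It relies only on each array
    and field having a ``.type`` attribute and each field a ``.name``.
    """
    if len(arrays) != len(fields):
        raise ValueError(
            f"got {len(arrays)} arrays but {len(fields)} fields"
        )
    for i, (arr, field) in enumerate(zip(arrays, fields)):
        # Compare with == so that the type's own equality method is used.
        if not (arr.type == field.type):
            raise TypeError(
                f"child {i} ({field.name!r}) has type {arr.type}, "
                f"but the field declares {field.type}"
            )


def checked_struct_from_arrays(arrays, fields):
    """Validate the child types first, then build the StructArray."""
    # Imported lazily so that check_child_types has no hard dependency.
    import pyarrow as pa

    check_child_types(arrays, fields)
    return pa.StructArray.from_arrays(arrays, fields=fields)
```

With the arrays and `inconsistent_fields` from the report, `checked_struct_from_arrays([a, b], inconsistent_fields)` would raise a `TypeError` on child 0 (int64 vs. int32) instead of returning a struct whose declared type disagrees with its children.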