[ https://issues.apache.org/jira/browse/ARROW-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835730#comment-16835730 ]
Joris Van den Bossche commented on ARROW-5287:
----------------------------------------------
Yes, I understand the "ambiguous" reason, but on the other hand, StructArray is
not really an option as the default, since the struct field names would need to
be known for that.
Doing it automatically would make it possible to save such DataFrames to
Parquet out of the box (see ARROW-4814), although you can of course always
specify the schema manually.
In general, it would be nice to have an error message that points people
towards specifying a list or struct type when the data contains tuples. But I
assume this is not that easy, as the error message looks like a generic one in
which the value and type are filled in.
> [Python] automatic type inference for arrays of tuples
> ------------------------------------------------------
>
> Key: ARROW-5287
> URL: https://issues.apache.org/jira/browse/ARROW-5287
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
>
> Arrays of tuples can be converted to either ListArray or StructArray if you
> specify the type explicitly:
> {code}
> In [6]: pa.array([(1, 2), (3, 4, 5)], type=pa.list_(pa.int64()))
> Out[6]:
> <pyarrow.lib.ListArray object at 0x7f1b01a4d408>
> [
> [
> 1,
> 2
> ],
> [
> 3,
> 4,
> 5
> ]
> ]
> In [7]: pa.array([(1, 2), (3, 4)], type=pa.struct([('a', pa.int64()), ('b', pa.int64())]))
> Out[7]:
> <pyarrow.lib.StructArray object at 0x7f1b01a51b88>
> -- is_valid: all not null
> -- child 0 type: int64
> [
> 1,
> 3
> ]
> -- child 1 type: int64
> [
> 2,
> 4
> ]
> {code}
> But not when no type is specified:
> {code}
> In [8]: pa.array([(1, 2), (3, 4)])
>
> ---------------------------------------------------------------------------
> ArrowInvalid Traceback (most recent call last)
> <ipython-input-8-ab2d80c7486d> in <module>
> ----> 1 pa.array([(1, 2), (3, 4)])
> ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
> ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
> ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Could not convert (1, 2) with type tuple: did not recognize Python value type when inferring an Arrow data type
> {code}
> Do we want to do automatic type inference for tuples as well (defaulting to
> the ListArray case, just as arrays of Python lists are supported)?
> Or was there a specific reason not to support this by default?