[jira] [Commented] (ARROW-7855) TypeError on mixed array values

Rob DiCiuccio (Jira) Fri, 14 Feb 2020 09:06:53 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037114#comment-17037114
 ]


Rob DiCiuccio commented on ARROW-7855:
--------------------------------------

[~wesm] We're ultimately creating a PyArrow Table from SQL query (DB-API) 
results. In some cases (with databases such as Presto, Postgres, etc.) there 
may be nested data structures such as the one above in a column. We're using 
PyArrow to infer column data types, as DB-API drivers vary greatly in the 
metadata that's returned, and numpy/pandas lack proper support for nullable 
data types. We're also using Arrow as a means to serialize data to disk and 
rehydrate to a new Table instance.

We've been working around some of the issues with nested data types by 
serializing to JSON strings, which is not ideal, but is a functional 
workaround. Here's some additional context on how we're handling this: 
https://github.com/apache/incubator-superset/pull/9139
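
A rough sketch of that workaround, for context. The helper names `stringify_nested` and `column_to_array` are hypothetical and not taken from the Superset PR, and `default=str` is an assumption for values such as datetimes or decimals that `json.dumps` cannot encode by itself:

```python
import json


def stringify_nested(values):
    """Return a copy of `values` with dicts, lists and tuples replaced by JSON strings."""
    return [
        json.dumps(v, default=str) if isinstance(v, (dict, list, tuple)) else v
        for v in values
    ]


def column_to_array(values):
    """Let PyArrow infer the column type; fall back to JSON strings if that fails."""
    import pyarrow as pa  # imported lazily so stringify_nested works without pyarrow

    try:
        return pa.array(values)
    except (TypeError, pa.ArrowException):
        # Mixed nested values such as [123456, 'foo'] cannot be inferred,
        # so store each nested value as a JSON string instead.
        return pa.array(stringify_nested(values))
```

The fallback catches both the built-in `TypeError` and `pa.ArrowException`, since which one is raised appears to depend on the PyArrow version. It only helps when the remaining scalar values in the column are strings or nulls; a column mixing, say, integers with nested values would still fail on the second `pa.array` call.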

Any suggestions on how to better handle nested data types of unknown shape 
are appreciated.

> TypeError on mixed array values
> -------------------------------
>
>                 Key: ARROW-7855
>                 URL: https://issues.apache.org/jira/browse/ARROW-7855
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0, 0.15.1
>            Reporter: Rob DiCiuccio
>            Priority: Major
>
> The following data structure passed to `pa.array` raises a generic 
> `TypeError`:
> {code:java}
> import pyarrow as pa
> pa.array([{'TestKey': [123456, 'foo']}])
> {code}
> {code:java}
> Traceback (most recent call last):
>  File "pyarrow_list_test.py", line 30, in <module>
>  pa_array = pa.array([{'TestKey': [123456, 'foo']}])
>  File "pyarrow/array.pxi", line 269, in pyarrow.lib.array
>  File "pyarrow/array.pxi", line 38, in pyarrow.lib._sequence_to_array
> TypeError: an integer is required (got type str)
> {code}
> I understand there may be a way to overcome this by setting the `type` value 
> as an argument to `pa.array`, but the use case here is storing results of a 
> SQL query where the structure/type of the column is unknown.
> If Arrow is ultimately unable to handle this data structure without a 
> predefined `type` passed to `pa.array`, can the exception at least use the 
> PyArrow namespace (e.g. `pa.lib.ArrowTypeError` or 
> `pa.lib.ArrowNotImplementedError`)?
> Any other workaround suggestions welcome.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
