[ 
https://issues.apache.org/jira/browse/ARROW-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529521#comment-16529521
 ] 

Uwe L. Korn commented on ARROW-2774:
------------------------------------

Users would definitely like to see this, but I would also like it to be 
hidden behind a flag. Dealing with unsanitized input is a typical {{pandas}} 
use case in exploratory data analysis, but once you use this as part of a 
production pipeline, you would rather have it raise an error.
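
To make the flag idea concrete, here is a minimal pure-Python sketch. It is not 
pyarrow code: the function {{infer_type}} and the flag name {{allow_unions}} are 
hypothetical, and types are reported as plain strings. By default mixed input 
raises, and a union is inferred only when the caller opts in.

```python
from typing import Any, List, Sequence


def infer_type(values: Sequence[Any], allow_unions: bool = False) -> str:
    """Infer a type name for `values`.

    `allow_unions` is a hypothetical flag, not an existing pyarrow parameter.
    """
    seen: List[str] = []  # distinct Python type names, in first-seen order
    for value in values:
        if value is None:
            continue  # nulls are compatible with any type
        name = type(value).__name__
        if name not in seen:
            seen.append(name)

    if not seen:
        return "null"
    if len(seen) == 1:
        return seen[0]
    if allow_unions:
        # Exploratory mode: accept heterogeneous input as a union.
        return "union<" + ", ".join(seen) + ">"
    # Strict mode (the default): a production pipeline should fail loudly.
    raise TypeError(
        "mixed types (" + ", ".join(seen) + "); "
        "pass allow_unions=True to infer a union"
    )
```

With this shape, {{infer_type([5, 'str', False])}} raises a {{TypeError}}, while 
passing {{allow_unions=True}} returns a union description. Keeping strict as the 
default means nobody gets a union by accident.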

> [Python] Generate Unions when inferring types
> ---------------------------------------------
>
>                 Key: ARROW-2774
>                 URL: https://issues.apache.org/jira/browse/ARROW-2774
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>
> It would be useful to be able to generate unions during type inference:
> {code}
> In [11]: pa.array([{'a': 1, 'b': 'string'}, {'b': 2}])
> ---------------------------------------------------------------------------
> ArrowTypeError                            Traceback (most recent call last)
> <ipython-input-11-c554b698271b> in <module>()
> ----> 1 pa.array([{'a': 1, 'b': 'string'}, {'b': 2}])
> ~/code/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
>     179         if mask is not None:
>     180             raise ValueError("Masks only supported with ndarray-like inputs")
> --> 181         return _sequence_to_array(obj, size, type, pool)
>     182 
>     183 
> ~/code/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
>      24         if size is None:
>      25             with nogil:
> ---> 26                 check_status(ConvertPySequence(sequence, pool, &out))
>      27         else:
>      28             c_size = size
> ~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
>      89             raise ArrowNotImplementedError(message)
>      90         elif status.IsTypeError():
> ---> 91             raise ArrowTypeError(message)
>      92         elif status.IsCapacityError():
>      93             raise ArrowCapacityError(message)
> ArrowTypeError: ../src/arrow/python/builtin_convert.cc:794 code: AppendPySequence(seq, size, real_type, builder.get())
> ../src/arrow/python/iterators.h:60 code: func(value)
> ../src/arrow/python/builtin_convert.cc:619 code: value_converters_[i]->AppendSingle(valueobj ? valueobj : Py_None)
> ../src/arrow/python/builtin_convert.cc:414 code: internal::CIntFromPython(obj, &value)
> ../src/arrow/python/helpers.cc:259 code: CheckPyError()
> an integer is required (got type str)
> In [12]: pa.array([5, 'str', False])
> ---------------------------------------------------------------------------
> ArrowTypeError                            Traceback (most recent call last)
> <ipython-input-12-9e3343f08351> in <module>()
> ----> 1 pa.array([5, 'str', False])
> ~/code/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
>     179         if mask is not None:
>     180             raise ValueError("Masks only supported with ndarray-like inputs")
> --> 181         return _sequence_to_array(obj, size, type, pool)
>     182 
>     183 
> ~/code/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
>      24         if size is None:
>      25             with nogil:
> ---> 26                 check_status(ConvertPySequence(sequence, pool, &out))
>      27         else:
>      28             c_size = size
> ~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
>      89             raise ArrowNotImplementedError(message)
>      90         elif status.IsTypeError():
> ---> 91             raise ArrowTypeError(message)
>      92         elif status.IsCapacityError():
>      93             raise ArrowCapacityError(message)
> ArrowTypeError: ../src/arrow/python/builtin_convert.cc:794 code: AppendPySequence(seq, size, real_type, builder.get())
> ../src/arrow/python/iterators.h:60 code: func(value)
> ../src/arrow/python/builtin_convert.cc:414 code: internal::CIntFromPython(obj, &value)
> ../src/arrow/python/helpers.cc:259 code: CheckPyError()
> an integer is required (got type str)
> {code}
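
For reference, a rough sketch of the shape an inferred union could take for an 
input such as {{[5, 'str', False]}}. In Arrow's dense union layout, each slot 
stores a small type id and an offset into the child array for that type. The 
pure-Python helper below, {{encode_dense_union}}, is a hypothetical name used 
only for illustration. It mimics that layout with plain lists and is not how 
pyarrow implements it. It also does not treat {{None}} specially.

```python
from typing import Any, Dict, List, Sequence, Tuple


def encode_dense_union(
    values: Sequence[Any],
) -> Tuple[List[int], List[int], Dict[str, List[Any]]]:
    """Split mixed values into dense-union-style type ids, offsets and children."""
    type_ids: List[int] = []  # per slot: which child holds the value
    offsets: List[int] = []   # per slot: position inside that child
    children: Dict[str, List[Any]] = {}  # one list per distinct Python type

    for value in values:
        name = type(value).__name__
        child = children.setdefault(name, [])
        # Dicts keep insertion order, so the key position is a stable type id.
        type_ids.append(list(children).index(name))
        offsets.append(len(child))
        child.append(value)

    return type_ids, offsets, children
```

Each distinct Python type becomes one child, so {{5}} and {{False}} end up in 
separate children even though {{bool}} is a subclass of {{int}} in Python.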



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
