[https://issues.apache.org/jira/browse/ARROW-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705391#comment-16705391]
Wes McKinney commented on ARROW-3890:
-------------------------------------
This is actually an issue converting NumPy binary arrays. Here is the trace
with {{-DARROW_EXTRA_ERROR_CONTEXT=on}}:
{code}
> raise ArrowInvalid(message)
E ArrowInvalid: ../src/arrow/python/numpy_to_arrow.cc:795 code: converter.Convert()
E ../src/arrow/python/numpy_to_arrow.cc:660 code: AppendUTF32(data, itemsize_, byteorder, &builder)
E ../src/arrow/python/numpy_to_arrow.cc:620 code: CheckPyError()
E 'utf32' codec can't decode bytes in position 0-3: code point not in range(0x110000)
{code}
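The 'utf32' error in the trace above can be reproduced with the standard library alone. The following is a minimal sketch, assuming (as the `AppendUTF32(data, itemsize_, byteorder, &builder)` frame suggests) that the raw bytes of a fixed-width NumPy bytes array ended up in the UTF-32 decoding path. NumPy's unicode (`U`) dtype stores 4 bytes per character, while a bytes (`S`) array, which is what `np.array(["toto", "tata"])` produces on Python 2.7, stores 1 byte per character. Read as UTF-32, the four bytes of `b"toto"` form a single code unit, 0x6F746F74 in little-endian order, which is far above the Unicode maximum of 0x10FFFF. The helper `decode_utf32_item` is hypothetical and exists only for illustration; it is not a pyarrow or NumPy API.

```python
def decode_utf32_item(raw: bytes, little_endian: bool = True) -> str:
    """Decode one fixed-width NumPy 'U' item (UTF-32, NUL-padded) to str.

    Hypothetical helper for illustration only; not part of pyarrow or NumPy.
    """
    codec = "utf-32-le" if little_endian else "utf-32-be"
    # NumPy pads short 'U' items with NUL code points; strip them after decoding.
    return raw.decode(codec).rstrip("\x00")


# What a NumPy unicode item holds for "toto": 4 characters x 4 bytes each.
utf32_item = "toto".encode("utf-32-le")
print(len(utf32_item), decode_utf32_item(utf32_item))  # 16 toto

# What a NumPy 'S4' (bytes) item holds for "toto": 4 bytes in total.
bytes_item = b"toto"
try:
    decode_utf32_item(bytes_item)
except UnicodeDecodeError as exc:
    # The 4 ASCII bytes are read as one 32-bit code unit (0x6F746F74),
    # which is out of range; start=0, end=4 is "position 0-3" in the message.
    print(exc.start, exc.end, exc.reason)
```

Decoding the same four bytes as big-endian UTF-32 fails the same way (0x746F746F is also out of range), so the byte order is not what triggers the error; the item width is.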
> [Python] Creating Array with explicit string type fails on Python 2.7
> ---------------------------------------------------------------------
>
> Key: ARROW-3890
> URL: https://issues.apache.org/jira/browse/ARROW-3890
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.11.1
> Reporter: jacques
> Assignee: Wes McKinney
> Priority: Major
> Labels: parquet
> Fix For: 0.12.0
>
>
> PyArrow string arrays can no longer be created from NumPy string arrays
> in pyarrow>=0.8.0 (this includes pyarrow==0.11.1).
> Here is a quick repro:
> {code:python}
> import numpy as np
> import pyarrow as pa
> vec = np.array(["toto", "tata"])  # on Python 2.7 this has dtype 'S4' (fixed-width bytes)
> pa.array(vec, pa.string())
> {code}
> Running this, I get the following:
> {code:python}
> ---------------------------------------------------------------------------
> ArrowInvalid Traceback (most recent call last)
> <ipython-input-4-e753fb3a8193> in <module>()
> ----> 1 pa.array(vec, pa.string())
> /usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in pyarrow.lib.array()
> /usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in pyarrow.lib._ndarray_to_array()
> /usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in pyarrow.lib.check_status()
> ArrowInvalid: 'utf32' codec can't decode bytes in position 0-3: code point not in range(0x110000)
> {code}
> However, this snippet worked fine with pyarrow==0.7.1.
> Has the behavior for strings changed in pyarrow since 0.7.1?
> Is there a workaround for this?
> Jacques
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)