[ https://issues.apache.org/jira/browse/ARROW-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-1863:
------------------------------
External issue URL: https://github.com/apache/arrow/issues/17856
> [Python] PyObjectStringify could render bytes-like output for more types of
> objects
> -----------------------------------------------------------------------------------
>
> Key: ARROW-1863
> URL: https://issues.apache.org/jira/browse/ARROW-1863
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Reporter: Xianjin YE
> Assignee: Phillip Cloud
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
> PyObjectStringify doesn't correctly handle objects that are neither str
> (unicode) nor bytes: it sets bytes to NULL and size to -1. It should use
> PyObject_Repr (or PyObject_Str) to get a string representation of the
> PyObject instead.
> {code:cpp}
> struct ARROW_EXPORT PyObjectStringify {
>   OwnedRef tmp_obj;
>   const char* bytes;
>   Py_ssize_t size;
>   explicit PyObjectStringify(PyObject* obj) {
>     PyObject* bytes_obj;
>     if (PyUnicode_Check(obj)) {
>       bytes_obj = PyUnicode_AsUTF8String(obj);
>       tmp_obj.reset(bytes_obj);
>       bytes = PyBytes_AsString(bytes_obj);
>       size = PyBytes_GET_SIZE(bytes_obj);
>     } else if (PyBytes_Check(obj)) {
>       bytes = PyBytes_AsString(obj);
>       size = PyBytes_GET_SIZE(obj);
>     } else {
>       bytes = NULLPTR;
>       size = -1;
>     }
>   }
> };
> {code}
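PyObject_Repr and PyObject_Str are the C-API counterparts of Python's built-in repr() and str(). The short check below compares what each would render for two values taken from the reproduction case; `render_both` is a hypothetical helper for illustration, not part of pyarrow. For the dict the two agree, while for a string value repr() adds quotes and str() does not.

```python
# Python-level equivalents of the C-API calls named in the issue:
# PyObject_Repr(o) corresponds to repr(o), PyObject_Str(o) to str(o).
def render_both(obj):
    """Hypothetical helper: return (repr(obj), str(obj)) side by side."""
    return repr(obj), str(obj)


# For the dict from the reproduction, repr() and str() give the same text.
print(render_both({"a": 1}))   # ("{'a': 1}", "{'a': 1}")

# For a string value they differ: repr() wraps it in quotes.
print(render_both("0"))        # ("'0'", '0')
```

Either call would give an error message something readable to show for the offending value; repr() has the advantage of making string values visibly distinct from other objects.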
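> It should be changed to:
> {code:cpp}
> struct ARROW_EXPORT PyObjectStringify {
>   OwnedRef tmp_obj;
>   const char* bytes;
>   Py_ssize_t size;
>   explicit PyObjectStringify(PyObject* obj) {
>     PyObject* bytes_obj;
>     if (PyUnicode_Check(obj)) {
>       bytes_obj = PyUnicode_AsUTF8String(obj);
>       tmp_obj.reset(bytes_obj);
>       bytes = PyBytes_AsString(bytes_obj);
>       size = PyBytes_GET_SIZE(bytes_obj);
>     } else if (PyBytes_Check(obj)) {
>       bytes = PyBytes_AsString(obj);
>       size = PyBytes_GET_SIZE(obj);
>     } else {
>       // Fall back to repr(obj). On Python 3, PyObject_Repr returns a str
>       // (unicode) object rather than bytes, so it must be encoded to UTF-8
>       // before PyBytes_AsString can read it. On Python 2 it is already bytes.
>       PyObject* repr = PyObject_Repr(obj);
>       if (repr != NULLPTR && PyUnicode_Check(repr)) {
>         bytes_obj = PyUnicode_AsUTF8String(repr);
>         Py_DECREF(repr);
>       } else {
>         bytes_obj = repr;
>       }
>       tmp_obj.reset(bytes_obj);
>       if (bytes_obj == NULLPTR) {
>         // repr() or the UTF-8 encoding raised; the Python error stays set
>         bytes = NULLPTR;
>         size = -1;
>       } else {
>         bytes = PyBytes_AsString(bytes_obj);
>         size = PyBytes_GET_SIZE(bytes_obj);
>       }
>     }
>   }
> };
> {code}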
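As a quick sanity check, the control flow of the proposed change can be modelled in pure Python. `stringify` is a hypothetical helper, not part of pyarrow; it mirrors the three branches and returns the `(bytes, size)` pair that the C++ struct stores in its fields. The fallback branch encodes the result of repr() because, on Python 3, it is text rather than bytes.

```python
from typing import Tuple


def stringify(obj: object) -> Tuple[bytes, int]:
    """Hypothetical pure-Python model of the proposed PyObjectStringify."""
    if isinstance(obj, str):
        # PyUnicode_Check branch: encode the text to UTF-8
        data = obj.encode("utf-8")
    elif isinstance(obj, bytes):
        # PyBytes_Check branch: use the buffer as-is
        data = obj
    else:
        # Fallback branch: repr() returns str on Python 3, so encode it
        data = repr(obj).encode("utf-8")
    return data, len(data)


for value in ["-10", b"-5", {"a": 1}]:
    print(stringify(value))
# (b'-10', 3)
# (b'-5', 2)
# (b"{'a': 1}", 8)
```

In the model, every input yields a real buffer and a non-negative size, whereas the original code hands callers a NULL pointer and a size of -1 for the dict.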
> How does this affect pyarrow? Minimal reproduction case:
> {code:python}
> import pyarrow
> data = ['-10', '-5', {'a': 1}, '0', '5', '10']
> arr = pyarrow.array(data, type=pyarrow.string())
> # Result when run in IPython:
> # [1]    64491 segmentation fault  ipython
> {code}
> This case was found by my colleague; I will ask him to send a PR here.
> cc [~wesmckinn]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)