[ https://issues.apache.org/jira/browse/ARROW-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276544#comment-16276544 ]
ASF GitHub Bot commented on ARROW-1863:
---------------------------------------
xhochy closed pull request #1385: ARROW-1863: [Python] PyObjectStringify could render bytes-like output for more types of objects
URL: https://github.com/apache/arrow/pull/1385
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
diff --git a/cpp/src/arrow/python/builtin_convert.cc b/cpp/src/arrow/python/builtin_convert.cc
index fa0098bdf..c716c47d2 100644
--- a/cpp/src/arrow/python/builtin_convert.cc
+++ b/cpp/src/arrow/python/builtin_convert.cc
@@ -667,7 +667,8 @@ class UTF8Converter : public TypedConverterVisitor<StringBuilder, UTF8Converter>
       RETURN_IF_PYERROR();
       bytes_obj = obj;
     } else if (!PyUnicode_Check(obj)) {
-      PyObjectStringify stringified(obj);
+      OwnedRef repr(PyObject_Repr(obj));
+      PyObjectStringify stringified(repr.obj());
       std::stringstream ss;
       ss << "Non bytes/unicode value encountered: " << stringified.bytes;
       return Status::Invalid(ss.str());
diff --git a/python/pyarrow/tests/test_convert_builtin.py b/python/pyarrow/tests/test_convert_builtin.py
index 4c3d9e563..d7760da2f 100644
--- a/python/pyarrow/tests/test_convert_builtin.py
+++ b/python/pyarrow/tests/test_convert_builtin.py
@@ -312,6 +312,13 @@ def test_mixed_types_fails(self):
         with self.assertRaises(pa.ArrowException):
             pa.array(data)
 
+    def test_mixed_types_with_specified_type_fails(self):
+        data = ['-10', '-5', {'a': 1}, '0', '5', '10']
+
+        type = pa.string()
+        with self.assertRaises(pa.ArrowInvalid):
+            pa.array(data, type=type)
+
     def test_decimal(self):
         data = [decimal.Decimal('1234.183'), decimal.Decimal('8094.234')]
         type = pa.decimal128(precision=7, scale=3)
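For illustration only, here is a rough pure-Python model of the patched error path in UTF8Converter: bytes and str values are accepted, and any other value is reported through its repr() instead of being stringified directly (which, before the patch, yielded a null pointer). The names convert_utf8 and ArrowInvalidSketch are hypothetical and are not part of pyarrow; the real check happens in C++.

```python
class ArrowInvalidSketch(ValueError):
    """Hypothetical stand-in for pa.ArrowInvalid (not a pyarrow class)."""


def convert_utf8(values):
    """Rough model of UTF8Converter after the patch: UTF-8 bytes per value."""
    out = []
    for obj in values:
        if isinstance(obj, bytes):
            out.append(obj)  # bytes are taken as-is
        elif isinstance(obj, str):
            out.append(obj.encode("utf-8"))  # unicode is encoded to UTF-8
        else:
            # Patched branch: describe the value via repr(), which always
            # returns text, so the error message has something to print.
            raise ArrowInvalidSketch(
                "Non bytes/unicode value encountered: " + repr(obj))
    return out
```

With the data from the new test, the model stops at the dict element and the message reads "Non bytes/unicode value encountered: {'a': 1}". The test itself only checks that pa.ArrowInvalid is raised.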
----------------------------------------------------------------
> [Python] PyObjectStringify could render bytes-like output for more types of objects
> -----------------------------------------------------------------------------------
>
> Key: ARROW-1863
> URL: https://issues.apache.org/jira/browse/ARROW-1863
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Reporter: Xianjin YE
> Assignee: Phillip Cloud
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
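> PyObjectStringify doesn't correctly handle objects that are neither bytes nor unicode (UTF-8) strings:
> for any other type it sets bytes to NULLPTR, and a caller that streams bytes into an error message
> then passes a null pointer to operator<<, which is undefined behavior and crashes in practice.
> It should use PyObject_Repr (or PyObject_Str) to get a string representation of the PyObject.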
> {code:java}
> struct ARROW_EXPORT PyObjectStringify {
> OwnedRef tmp_obj;
> const char* bytes;
> Py_ssize_t size;
> explicit PyObjectStringify(PyObject* obj) {
> PyObject* bytes_obj;
> if (PyUnicode_Check(obj)) {
> bytes_obj = PyUnicode_AsUTF8String(obj);
> tmp_obj.reset(bytes_obj);
> bytes = PyBytes_AsString(bytes_obj);
> size = PyBytes_GET_SIZE(bytes_obj);
> } else if (PyBytes_Check(obj)) {
> bytes = PyBytes_AsString(obj);
> size = PyBytes_GET_SIZE(obj);
> } else {
> bytes = NULLPTR;
> size = -1;
> }
> }
> };
> {code}
> should be changed to
> {code:java}
> struct ARROW_EXPORT PyObjectStringify {
> OwnedRef tmp_obj;
> const char* bytes;
> Py_ssize_t size;
> explicit PyObjectStringify(PyObject* obj) {
> PyObject* bytes_obj;
> if (PyUnicode_Check(obj)) {
> bytes_obj = PyUnicode_AsUTF8String(obj);
> tmp_obj.reset(bytes_obj);
> bytes = PyBytes_AsString(bytes_obj);
> size = PyBytes_GET_SIZE(bytes_obj);
> } else if (PyBytes_Check(obj)) {
> bytes = PyBytes_AsString(obj);
> size = PyBytes_GET_SIZE(obj);
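> } else {
> // PyObject_Repr returns bytes on Python 2 but unicode on Python 3,
> // so encode unicode to UTF-8 before reading the raw buffer
> PyObject* repr = PyObject_Repr(obj);
> if (PyUnicode_Check(repr)) {
> bytes_obj = PyUnicode_AsUTF8String(repr);
> Py_DECREF(repr);
> } else {
> bytes_obj = repr;
> }
> tmp_obj.reset(bytes_obj);
> bytes = PyBytes_AsString(bytes_obj);
> size = PyBytes_GET_SIZE(bytes_obj);
> }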
> }
> };
> {code}
> How does this affect pyarrow? Minimal reproduction case:
> {code:java}
> import pyarrow
> data = ['-10', '-5', {'a': 1}, '0', '5', '10']
> arr = pyarrow.array(data, type=pyarrow.string())
> # [1]    64491 segmentation fault  ipython
> {code}
> This case was found by my colleague; I will ask him to send a PR here.
> cc [~wesmckinn]
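As a rough illustration of the repr() fallback discussed in the issue, the following pure-Python model mirrors PyObjectStringify's three branches and returns a (bytes, size) pair. The function name stringify is hypothetical. One point the model makes explicit, assuming Python 3 semantics: repr() returns str rather than bytes, so its result has to pass through the UTF-8 branch before it can be treated as raw bytes. The merged patch achieves the same effect by calling PyObject_Repr first and handing the result to the existing PyObjectStringify.

```python
def stringify(obj):
    """Hypothetical model of PyObjectStringify: returns (bytes, size)."""
    if isinstance(obj, str):
        data = obj.encode("utf-8")  # unicode branch: encode to UTF-8
    elif isinstance(obj, bytes):
        data = obj  # bytes branch: use the buffer as-is
    else:
        # Fallback from the issue: describe the object via repr(). On
        # Python 3 repr() yields str, so route it through the str branch.
        data = stringify(repr(obj))[0]
    return data, len(data)
```

For example, stringify({'a': 1}) gives (b"{'a': 1}", 8), whereas the pre-fix C++ code left bytes as NULLPTR and size as -1 for that input.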
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)