jorisvandenbossche commented on issue #33765: URL: https://github.com/apache/arrow/issues/33765#issuecomment-1406325427
Summary of my findings up to now. The **warning** comes from https://github.com/python/cpython/issues/95324 (PR: https://github.com/python/cpython/pull/95325). This warning was added to Python 3.11 in debug mode to help people catch bugs when an object doesn't call `PyObject_GC_UnTrack` during deallocation. This is only relevant for objects that support garbage collection (have the `Py_TPFLAGS_HAVE_GC` flag set). For such types, the destructor should call [PyObject_GC_UnTrack()](https://docs.python.org/3/c-api/gcsupport.html#c.PyObject_GC_UnTrack) before clearing any member fields. Now, for our user facing classes like Array, this dealloc is generated by cython, and cython makes any class that has a PyObject member in their own PyObject struct as "needs to participate in GC" (adds the `Py_TPFLAGS_HAVE_GC`). Our `Array` class is an example of that, because it has a python object `_name` attribute. The `tp_dealloc` slot that cython generates for `Array` looks like: ```c++ static void __pyx_tp_dealloc_7pyarrow_3lib_Array(PyObject *o) { struct __pyx_obj_7pyarrow_3lib_Array *p = (struct __pyx_obj_7pyarrow_3lib_Array *)o; #if CYTHON_USE_TP_FINALIZE if (unlikely(PyType_HasFeature(Py_TYPE(o), Py_TPFLAGS_HAVE_FINALIZE) && Py_TYPE(o)->tp_finalize) && !_PyGC_FINALIZED(o)) { if (PyObject_CallFinalizerFromDealloc(o)) return; } #endif PyObject_GC_UnTrack(o); __Pyx_call_destructor(p->sp_array); Py_CLEAR(p->type); Py_CLEAR(p->_name); #if CYTHON_USE_TYPE_SLOTS if (PyType_IS_GC(Py_TYPE(o)->tp_base)) #endif PyObject_GC_Track(o); __pyx_tp_dealloc_7pyarrow_3lib__Weakrefable(o); } ``` So cython correctly includes a `PyObject_GC_UnTrack` ([generating code](https://github.com/cython/cython/blob/b2cad768499cf5763251fe757491e9fc24a1f583/Cython/Compiler/ModuleNode.py#L1703-L1707)). It does again track GC before in the end calling the dealloc of its `_Weakrefable` base class. But this base class itself is _not_ a GC enabled class (it does not have GC tp_flag set, because cython makes an explicit exception for `__weakref__` on the rule of python object attributes triggering GC for that class). I am not sure what then causes the warning to be raised. Should cython not track GC again before calling dealloc of Weakrefable? Or is that a "bug" in the warning upstream in cpython that it should not raise the warning in the dealloc of the base class without GC of a class with GC? --- The **crash** seems to be related in the sense that code that is not expecting an exception is now raising one. For that code snippet, I get the traceback: ``` >>> import pyarrow as pa >>> table = pa.table({0: [1]}) Fatal Python error: _Py_CheckFunctionResult: a function returned a result with an exception set Python runtime state: initialized Traceback (most recent call last): File "pyarrow/table.pxi", line 5255, in pyarrow.lib._from_pydict File "pyarrow/table.pxi", line 3657, in pyarrow.lib.Table.from_arrays File "pyarrow/table.pxi", line 1406, in pyarrow.lib._sanitize_arrays File "pyarrow/table.pxi", line 1394, in pyarrow.lib._schema_from_arrays File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string TypeError: expected bytes, int found The above exception was the direct cause of the following exception: SystemError: <class 'ResourceWarning'> returned a result with an exception set ``` My understanding is that our code raises an error (because we can't convert the `0` key to a string column name, "expected bytes, int found"). Before pyarrow/cython actually raises this exception, some variables are cleaned up (including the array for the column), triggering the new code to check for tracked GC objects, but then when cpython wants to raise the warning about it, this fails because exception already has been set by pyarrow/cython (but not sure if cpython should clear/reset the exceptions when raising the warning?). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
