jorisvandenbossche commented on issue #33765:
URL: https://github.com/apache/arrow/issues/33765#issuecomment-1406325427

   Summary of my findings up to now.
   
   The **warning** comes from https://github.com/python/cpython/issues/95324 
(PR: https://github.com/python/cpython/pull/95325). This warning was added to 
Python 3.11 in debug mode to help people catch bugs when an object doesn't call 
`PyObject_GC_UnTrack` during deallocation.
   
   This is only relevant for objects that support garbage collection (have the 
`Py_TPFLAGS_HAVE_GC` flag set). For such types, the destructor should call 
[PyObject_GC_UnTrack()](https://docs.python.org/3/c-api/gcsupport.html#c.PyObject_GC_UnTrack)
 before clearing any member fields. 
   Now, for our user facing classes like Array, this dealloc is generated by 
cython, and cython makes any class that has a PyObject member in their own 
PyObject struct as "needs to participate in GC" (adds the 
`Py_TPFLAGS_HAVE_GC`). Our `Array` class is an example of that, because it has 
a python object `_name` attribute. The `tp_dealloc` slot that cython generates 
for `Array` looks like:
   
   ```c++
   static void __pyx_tp_dealloc_7pyarrow_3lib_Array(PyObject *o) {
     struct __pyx_obj_7pyarrow_3lib_Array *p = (struct 
__pyx_obj_7pyarrow_3lib_Array *)o;
     #if CYTHON_USE_TP_FINALIZE
     if (unlikely(PyType_HasFeature(Py_TYPE(o), Py_TPFLAGS_HAVE_FINALIZE) && 
Py_TYPE(o)->tp_finalize) && !_PyGC_FINALIZED(o)) {
       if (PyObject_CallFinalizerFromDealloc(o)) return;
     }
     #endif
     PyObject_GC_UnTrack(o);
     __Pyx_call_destructor(p->sp_array);
     Py_CLEAR(p->type);
     Py_CLEAR(p->_name);
     #if CYTHON_USE_TYPE_SLOTS
     if (PyType_IS_GC(Py_TYPE(o)->tp_base))
     #endif
     PyObject_GC_Track(o);
     __pyx_tp_dealloc_7pyarrow_3lib__Weakrefable(o);
   }
   ```
   
   So cython correctly includes a `PyObject_GC_UnTrack` ([generating 
code](https://github.com/cython/cython/blob/b2cad768499cf5763251fe757491e9fc24a1f583/Cython/Compiler/ModuleNode.py#L1703-L1707)).
 It does again track GC before in the end calling the dealloc of its 
`_Weakrefable` base class. But this base class itself is _not_ a GC enabled 
class (it does not have GC tp_flag set, because cython makes an explicit 
exception for `__weakref__` on the rule of python object attributes triggering 
GC for that class). 
   
   I am not sure what then causes the warning to be raised. Should cython not 
track GC again before calling dealloc of Weakrefable? Or is that a "bug" in the 
warning upstream in cpython that it should not raise the warning in the dealloc 
of the base class without GC of a class with GC?
   
   ---
   
   The **crash** seems to be related in the sense that code that is not 
expecting an exception is now raising one. For that code snippet, I get the 
traceback:
   
   ```
   >>> import pyarrow as pa
   >>> table = pa.table({0: [1]})
   Fatal Python error: _Py_CheckFunctionResult: a function returned a result 
with an exception set
   Python runtime state: initialized
   Traceback (most recent call last):
     File "pyarrow/table.pxi", line 5255, in pyarrow.lib._from_pydict
     File "pyarrow/table.pxi", line 3657, in pyarrow.lib.Table.from_arrays
     File "pyarrow/table.pxi", line 1406, in pyarrow.lib._sanitize_arrays
     File "pyarrow/table.pxi", line 1394, in pyarrow.lib._schema_from_arrays
     File "stringsource", line 15, in 
string.from_py.__pyx_convert_string_from_py_std__in_string
   TypeError: expected bytes, int found
   
   The above exception was the direct cause of the following exception:
   
   SystemError: <class 'ResourceWarning'> returned a result with an exception 
set
   ```
   
   My understanding is that our code raises an error (because we can't convert 
the `0` key to a string column name, "expected bytes, int found"). Before 
pyarrow/cython actually raises this exception, some variables are cleaned up 
(including the array for the column), triggering the new code to check for 
tracked GC objects, but then when cpython wants to raise the warning about it, 
this fails because exception already has been set by pyarrow/cython (but not 
sure if cpython should clear/reset the exceptions when raising the warning?).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to