tadeja commented on PR #50045:
URL: https://github.com/apache/arrow/pull/50045#issuecomment-4634069131
Root cause + deterministic repro
Scanned the suite with Opus and found that the "polluter" is
`parquet/test_dataset.py`. Its dataset/filter tests leave small cyclic buffers
(64–768 B) that are reclaimed only by gc.collect() and accumulate in the xdist
worker. Occasionally a remnant survives auto-GC into
`test_table_uses_memory_pool`. There, gc.collect() frees it, so the final total
drops below the baseline taken at the start of the test, causing the
AssertionError in CI. Since pa.total_allocated_bytes() is process-global, the
fix is to take the baseline *after* a gc.collect(), same as
`test_array_uses_memory_pool` and #44793.
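The mechanism can be sketched without pyarrow. This is a minimal illustration, not the actual test code: `TrackedBuffer`, `total_allocated_bytes`, `leave_cyclic_garbage`, `check` and `demo` are hypothetical names, and the module-level counter is only a stand-in for a process-global figure like pa.total_allocated_bytes(). It assumes CPython, where an object outside a cycle is freed as soon as its last reference goes away.

```python
import gc

# Stand-in for a process-global counter (assumption, not pyarrow's pool).
_allocated = 0


class TrackedBuffer:
    """Hypothetical buffer that adjusts the global counter on create/free."""

    def __init__(self, size):
        global _allocated
        self.size = size
        self.owner = None
        _allocated += size

    def __del__(self):
        global _allocated
        _allocated -= self.size


def total_allocated_bytes():
    return _allocated


def leave_cyclic_garbage(size):
    """Mimic an earlier test that leaves a buffer in a reference cycle."""
    buf = TrackedBuffer(size)
    holder = {"buf": buf}
    buf.owner = holder  # cycle: buf -> holder -> buf, only the GC can free it


def check(collect_first):
    """Return (baseline, final) the way a memory-pool test would compare them."""
    if collect_first:
        gc.collect()  # the fix: clear leftover garbage before the baseline
    baseline = total_allocated_bytes()
    buf = TrackedBuffer(1024)
    del buf       # no cycle here, so the refcount frees it immediately
    gc.collect()  # also frees any garbage left by earlier tests
    return baseline, total_allocated_bytes()


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # same trick as the repro: make the leftover deterministic
    try:
        gc.collect()
        leave_cyclic_garbage(4736)
        without_fix = check(collect_first=False)
        leave_cyclic_garbage(4736)
        with_fix = check(collect_first=True)
    finally:
        if was_enabled:
            gc.enable()
    return without_fix, with_fix


if __name__ == "__main__":
    print(demo())
```

Without the extra collect, the baseline includes the leftover bytes and the final total does not, so an equality assertion between the two fails. With the collect first, both values agree.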
Repro (disabling auto-GC only removes the rarity; the mechanism is unchanged)
File nogc.py:
```python
import gc
def pytest_configure(config):
    # Turn off automatic cyclic GC for the whole session.
    gc.disable()
```
```bash
python -m pytest -n0 -q -p nogc \
pyarrow/tests/parquet/test_dataset.py \
pyarrow/tests/test_pandas.py::test_table_uses_memory_pool
```
--
On main (no fix):
```
FAILED ../pyarrow/tests/test_pandas.py::test_table_uses_memory_pool - assert 0 == 4736
1 failed, 56 passed, 6 skipped, 1 xfailed in 0.75s
```
With this PR's fix (gc.collect() also called before taking the baseline), the
baseline is 0 and the tests pass:
`57 passed, 6 skipped, 1 xfailed in 0.66s`