tadeja commented on PR #50045:
URL: https://github.com/apache/arrow/pull/50045#issuecomment-4634069131

   Root cause + deterministic repro
   
   Scanned the suite with Opus and found that the "polluter" is
   `parquet/test_dataset.py`. Its dataset/filter tests leave behind small cyclic
   buffers (64–768 B) that are reclaimed only by gc.collect(), so they accumulate
   in the xdist worker. Occasionally a remnant survives auto-GC into
   `test_table_uses_memory_pool`. There, gc.collect() frees it, so the final
   total drops below the baseline taken at the start of the test, causing the
   AssertionError in CI. Since pa.total_allocated_bytes() is process-global, the
   fix is to take the baseline *after* a gc.collect(), same as
   `test_array_uses_memory_pool` and #44793.
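
   The mechanism can be illustrated without pyarrow. The sketch below is a
   stdlib-only toy model, not the actual test code: `Buffer`, `Node`,
   `polluter`, `allocation_delta` and the module-level counter are all
   hypothetical stand-ins, with the counter playing the role of the
   process-global pa.total_allocated_bytes(). It shows how garbage that only
   the cyclic collector can reclaim skews a baseline taken before gc.collect().

```python
import gc

# Toy stand-in for a process-global allocation counter (assumption: this
# only mimics the role of pa.total_allocated_bytes(), it is not pyarrow).
_allocated = 0


def total_allocated_bytes():
    return _allocated


class Buffer:
    """Adds its size to the global counter while alive."""

    def __init__(self, size):
        global _allocated
        self.size = size
        _allocated += size

    def __del__(self):
        global _allocated
        _allocated -= self.size


class Node:
    """Owns a buffer and references itself, forming a reference cycle."""

    def __init__(self, size):
        self.buf = Buffer(size)
        self.me = self  # cycle: refcounting alone can never free this


def polluter():
    # Mimics an earlier test leaving cyclic garbage behind.
    Node(768)


def allocation_delta(collect_first):
    """Return final total minus baseline; a leak-free test expects 0."""
    if collect_first:
        gc.collect()  # the fix: flush leftover cycles before the baseline
    baseline = total_allocated_bytes()
    table = Buffer(4096)  # the allocation the test actually cares about
    del table
    gc.collect()  # frees any leftover cycles from earlier tests as well
    return total_allocated_bytes() - baseline


gc.disable()  # make the outcome deterministic, as in the repro
try:
    polluter()
    flaky = allocation_delta(collect_first=False)
    polluter()
    fixed = allocation_delta(collect_first=True)
finally:
    gc.enable()

print(flaky, fixed)
```

   In this toy model the baseline taken before gc.collect() includes the
   768 bytes left by the polluter, so the final total ends up below it;
   collecting first removes the leftover from both sides of the comparison.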
   
   Repro (disabling auto-GC only makes the failure deterministic; the mechanism is unchanged)
   File nogc.py:
   ```python
   import gc

   def pytest_configure(config):
       gc.disable()  # turn off automatic cyclic GC for the whole session
   ```
   ```bash
   python -m pytest -n0 -q -p nogc \
     pyarrow/tests/parquet/test_dataset.py \
     pyarrow/tests/test_pandas.py::test_table_uses_memory_pool
   ```
   --
   On main (no fix):
   ```
   FAILED ../pyarrow/tests/test_pandas.py::test_table_uses_memory_pool - assert 0 == 4736
   1 failed, 56 passed, 6 skipped, 1 xfailed in 0.75s
   ```
   With this PR's fix (an additional gc.collect() before taking the baseline),
   the baseline is 0 and all tests pass:
   `57 passed, 6 skipped, 1 xfailed in 0.66s`

