The test
TestByteBitPacking512VectorLE.unpackValuesUsingVectorBitWidth(TestByteBitPacking512VectorLE)
is flaky in the Parquet GitHub PR testing environment [1].

I gave the error to Codex (the OpenAI coding agent) and asked it to fix the
test.  However, since I don't have enough confidence in my own
understanding of the problem or the fix, I have not opened a PR.  The fix
can be found on my fork here:
<https://github.com/dossett/parquet-java/commit/7635c8599524aadee1164fc2168801c51390b118>

The Codex summary of the problem and the fix is this:

We addressed CI OOMs in TestByteBitPacking512VectorLE
(parquet-encoding-vector) by bounding the test input size while keeping the
same correctness coverage. The original getRangeData could allocate arrays
on the order of hundreds of millions of ints per bit width, which can
consume tens of GB of heap and fail in constrained CI environments.

The updated test generates a single bounded dataset (min 64, max 2^20
values) and spans the full legal value range for each bit width (including
the full signed int range for 32‑bit).  The vector and scalar pack/unpack
paths are still compared for equality across bit widths, but without the
unbounded memory stress that was causing flakiness.
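For illustration, here is a minimal sketch of what a bounded generator along
those lines could look like. The class and method names (BoundedRangeDataSketch,
boundedRangeData) and the fill strategy are my own assumptions for the example,
not the code in the linked commit.

import java.util.Random;

// Illustrative only: a bounded test-data generator in the spirit of the fix.
// Names and fill strategy are hypothetical, not the actual test code.
public class BoundedRangeDataSketch {

  // Keep the dataset between 64 and 2^20 values so the test never allocates
  // the hundreds of millions of ints the old range-based generator could.
  private static final int MIN_VALUES = 64;
  private static final int MAX_VALUES = 1 << 20;

  static int[] boundedRangeData(int bitWidth) {
    long legalRange = (bitWidth == 32) ? (1L << 32) : (1L << bitWidth);
    int count = (int) Math.max(MIN_VALUES, Math.min(MAX_VALUES, legalRange));
    int[] values = new int[count];

    // Always cover the extremes of the legal range for this bit width
    // (the full signed int range when bitWidth == 32).
    long min = (bitWidth == 32) ? Integer.MIN_VALUE : 0;
    long max = (bitWidth == 32) ? Integer.MAX_VALUE : legalRange - 1;
    values[0] = (int) min;
    values[count - 1] = (int) max;

    // Spread the remaining values across the legal range; a fixed seed keeps
    // the test deterministic from run to run.
    Random random = new Random(42);
    for (int i = 1; i < count - 1; i++) {
      values[i] = (int) (min + (long) (random.nextDouble() * (max - min)));
    }
    return values;
  }

  public static void main(String[] args) {
    for (int bitWidth = 1; bitWidth <= 32; bitWidth++) {
      System.out.println("bitWidth=" + bitWidth + " -> "
          + boundedRangeData(bitWidth).length + " values");
    }
  }
}

The key point is only that the dataset size is capped while the legal value
range per bit width is still fully exercised; the vector-vs-scalar equality
check itself stays the same.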

I would appreciate any feedback on that, or alternatively other ways to
address the flaky test; I found the flakiness very frustrating recently when
I was opening several PRs.

Cheers, Aaron

[1] Example failure:
https://github.com/apache/parquet-java/actions/runs/20671204311/job/59352228516?pr=3385

-- 
Aaron Niskode-Dossett, Data Engineering -- Etsy
