wangyum opened a new pull request, #3250: URL: https://github.com/apache/parquet-java/pull/3250
### Rationale for this change

When reading Bloom filter data from files written by older versions (< Parquet 1.13), the code uses `in.read(bitset)` to read the bitset. However, `InputStream.read(byte[])` does not guarantee that all requested bytes are read in a single call. It may return after reading fewer bytes than the buffer size, which leaves the rest of the buffer at its default value of zero. Parts of the bitset can then contain zeros instead of the actual data, and the Bloom filter behaves incorrectly. For example, a value that is present can be reported as absent, so a row group may be skipped when it should not be.

### What changes are included in this PR?

This PR changes the logic so that all bitset bytes are guaranteed to be read from the input stream:

- For older file versions (negative `bloomFilterLength`), we now use `f.readFully(bitset)`.
- For newer file versions (positive `bloomFilterLength`), we still use `in.read(bitset)`.

### Are these changes tested?

Manual testing.

### Are there any user-facing changes?

No.
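The short-read problem in the rationale can be reproduced with a small, self-contained sketch. The names `ShortReadDemo`, `ChunkedInputStream`, and the `readFully` helper are hypothetical. They are not part of parquet-java. `ChunkedInputStream` stands in for a stream (for example, one backed by network or object storage) that returns at most a few bytes per call. The helper loops until the buffer is full, which is the guarantee `f.readFully(bitset)` provides.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShortReadDemo {

  /** Hypothetical stream that hands out at most {@code chunk} bytes per read call. */
  static final class ChunkedInputStream extends FilterInputStream {
    private final int chunk;

    ChunkedInputStream(InputStream in, int chunk) {
      super(in);
      this.chunk = chunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      // Cap every request; FilterInputStream.read(byte[]) delegates here.
      return super.read(b, off, Math.min(len, chunk));
    }
  }

  /** Hypothetical helper: keeps reading until buf is full, or throws at end of stream. */
  static void readFully(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        throw new EOFException("Stream ended after " + off + " of " + buf.length + " bytes");
      }
      off += n;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[32];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) (i + 1); // non-zero payload so missing bytes are visible
    }

    // A single read(byte[]) call may stop early; the tail of the buffer stays zero.
    byte[] partial = new byte[32];
    int n = new ChunkedInputStream(new ByteArrayInputStream(data), 8).read(partial);
    System.out.println("read(byte[]) returned " + n); // 8
    System.out.println("last byte: " + partial[31]); // 0

    // Looping until the buffer is full recovers every byte.
    byte[] full = new byte[32];
    readFully(new ChunkedInputStream(new ByteArrayInputStream(data), 8), full);
    System.out.println("readFully last byte: " + full[31]); // 32
  }
}
```

The explicit loop is only there to make the behavior visible. The JDK already offers the same guarantee through `DataInputStream.readFully(byte[])`. Since Java 9, `InputStream.readNBytes(byte[], int, int)` also returns fewer bytes than requested only at end of stream.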
