annimesh2809 opened a new pull request, #3357: URL: https://github.com/apache/parquet-java/pull/3357
### Rationale for this change

A couple of test cases in the `TestParquetReader` suite started failing with Hadoop 3.4.2 and parquet-mr 1.16.0, with errors like:

```
- testRangeFiltering[0] *** FAILED ***
org.apache.parquet.bytes.TrackingByteBufferAllocator$LeakedByteBufferException: 24 ByteBuffer object(s) is/are remained unreleased after closing this allocator.
  at org.apache.parquet.bytes.TrackingByteBufferAllocator.close(TrackingByteBufferAllocator.java:161)
  at org.apache.parquet.hadoop.TestParquetReader.closeAllocator(TestParquetReader.java:175)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:569)
```

The leaks occur when reading with vectored IO, because the buffers allocated for those reads are never passed to the releaser.

### What changes are included in this PR?

Added a custom allocator that registers every buffer it allocates with the releaser. This way, buffers allocated by any class that uses the allocator (such as `ChecksumFileSystem`) are also cleaned up.

### Are these changes tested?

The `TestParquetReader` suite passes with these changes.

### Are there any user-facing changes?

No
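
For readers who want to see the idea in isolation, here is a minimal, self-contained sketch of an allocator that registers every buffer it hands out with a releaser. It is not the code from this PR. `BufferAllocator`, `BufferReleaser`, `CountingAllocator`, and `ReleasingAllocator` are hypothetical stand-ins defined here so that the example runs without parquet-java or Hadoop on the classpath. The `IntFunction<ByteBuffer>` callback models the allocation function that Hadoop's vectored read API accepts. Making `release` a no-op on the wrapper is an assumption of this sketch, chosen so that a buffer is not released twice.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical stand-in for an allocator interface (not the parquet-java type).
interface BufferAllocator {
    ByteBuffer allocate(int size);
    void release(ByteBuffer buffer);
}

// Hypothetical stand-in for a releaser: collects buffers and frees them on close().
final class BufferReleaser implements AutoCloseable {
    private final BufferAllocator allocator;
    private final List<ByteBuffer> pending = new ArrayList<>();

    BufferReleaser(BufferAllocator allocator) {
        this.allocator = allocator;
    }

    void releaseLater(ByteBuffer buffer) {
        pending.add(buffer);
    }

    @Override
    public void close() {
        for (ByteBuffer buffer : pending) {
            allocator.release(buffer);
        }
        pending.clear();
    }
}

// Leak-tracking allocator: counts buffers that were allocated but not yet released.
final class CountingAllocator implements BufferAllocator {
    private int outstanding;

    @Override
    public ByteBuffer allocate(int size) {
        outstanding++;
        return ByteBuffer.allocate(size);
    }

    @Override
    public void release(ByteBuffer buffer) {
        outstanding--;
    }

    int outstanding() {
        return outstanding;
    }
}

// Wrapper allocator: every buffer it allocates is registered with the releaser,
// so callers that never release explicitly still get cleaned up.
final class ReleasingAllocator implements BufferAllocator {
    private final BufferAllocator delegate;
    private final BufferReleaser releaser;

    ReleasingAllocator(BufferAllocator delegate, BufferReleaser releaser) {
        this.delegate = delegate;
        this.releaser = releaser;
    }

    @Override
    public ByteBuffer allocate(int size) {
        ByteBuffer buffer = delegate.allocate(size);
        releaser.releaseLater(buffer);
        return buffer;
    }

    @Override
    public void release(ByteBuffer buffer) {
        // Assumption of this sketch: the releaser owns the buffer's lifetime,
        // so an explicit release is ignored to avoid releasing it twice.
    }
}

final class ReleasingAllocatorDemo {
    public static void main(String[] args) {
        CountingAllocator base = new CountingAllocator();
        try (BufferReleaser releaser = new BufferReleaser(base)) {
            ReleasingAllocator allocator = new ReleasingAllocator(base, releaser);
            // Models the allocation callback handed to a vectored read.
            IntFunction<ByteBuffer> allocate = allocator::allocate;
            allocate.apply(64);
            allocate.apply(128);
            System.out.println("outstanding before close: " + base.outstanding()); // 2
        }
        System.out.println("outstanding after close: " + base.outstanding()); // 0
    }
}
```

In this sketch the code that performs the read only ever sees the `IntFunction<ByteBuffer>` callback, so it does not need to know about the releaser. Cleanup happens when the releaser is closed, regardless of which class requested the buffers.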
