Kontinuation opened a new pull request, #5735:
URL: https://github.com/apache/iceberg/pull/5735
This patch fixes native memory leaks in the vectorized Parquet reader,
which affects both iceberg-core and iceberg-spark:
1. The Arrow `BufferAllocator` is now closed when the vectorized reader is
closed, which frees the native memory it holds. The buffer allocator also
checks for leaked memory on closing.
2. We've also fixed memory leaks when reading Parquet files containing
interleaved plain and dictionary pages. The Arrow buffer allocator detected
this problem and threw `IllegalStateException` when running the newly added
test without this fix:
```
org.apache.iceberg.arrow.vectorized.ArrowReaderTest > testInterleavingPlainAndDictionaryPages FAILED
    java.lang.IllegalStateException: Allocator[ArrowBatchReader] closed with outstanding buffers allocated (6).
    Allocator(ArrowBatchReader) 0/6400/9472/9223372036854775807 (res/actual/peak/limit)
      child allocators: 0
      ledgers: 6
        ledger[1924] allocator: ArrowBatchReader), isOwning: , size: , references: 2, life: 22143564008682936..0, allocatorManager: [, life: ] holds 3 buffers.
            ArrowBuf[5225], address:140478431376008, length:8
            ArrowBuf[5224], address:140478431375888, length:120
            ArrowBuf[5223], address:140478431375888, length:128
        ledger[1923] allocator: ArrowBatchReader), isOwning: , size: , references: 2, life: 22143564003725139..0, allocatorManager: [, life: ] holds 3 buffers.
            ArrowBuf[5220], address:140478431392784, length:1024
            ArrowBuf[5222], address:140478431393776, length:32
            ArrowBuf[5221], address:140478431392784, length:992
        ledger[1920] allocator: ArrowBatchReader), isOwning: , size: , references: 2, life: 22143563974533216..0, allocatorManager: [, life: ] holds 3 buffers.
            ArrowBuf[5215], address:140478431392752, length:32
            ArrowBuf[5214], address:140478431391760, length:992
            ArrowBuf[5213], address:140478431391760, length:1024
        ledger[1922] allocator: ArrowBatchReader), isOwning: , size: , references: 1, life: 22143564003418541..0, allocatorManager: [, life: ] holds 1 buffers.
            ArrowBuf[5219], address:140478431385616, length:2048
        ledger[1921] allocator: ArrowBatchReader), isOwning: , size: , references: 2, life: 22143563988827959..0, allocatorManager: [, life: ] holds 3 buffers.
            ArrowBuf[5218], address:140478431375880, length:8
            ArrowBuf[5217], address:140478431375760, length:120
            ArrowBuf[5216], address:140478431375760, length:128
        ledger[1919] allocator: ArrowBatchReader), isOwning: , size: , references: 1, life: 22143563974111982..0, allocatorManager: [, life: ] holds 1 buffers.
            ArrowBuf[5212], address:140478431383568, length:2048
      reservations: 0
        at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:405)
        at org.apache.iceberg.arrow.vectorized.ArrowBatchReader.close(ArrowBatchReader.java:66)
        at org.apache.iceberg.parquet.VectorizedParquetReader$FileIterator.close(VectorizedParquetReader.java:176)
        at org.apache.iceberg.arrow.vectorized.ArrowReader$VectorizedCombinedScanIterator.hasNext(ArrowReader.java:299)
        at org.apache.iceberg.arrow.vectorized.ArrowReaderTest.testInterleavingPlainAndDictionaryPages(ArrowReaderTest.java:347)
```
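For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of the two behaviors described above: an allocator that refuses to close while buffers are outstanding, and a reader that leaks when the page encoding alternates. It does not use Arrow or Iceberg code. `TrackingAllocator`, `ColumnReader`, `LeakCheckDemo`, and the `releaseOnSwitch` flag are all hypothetical names, and the assumption that the leak comes from replacing a buffer without releasing the old one is an illustration only, not a description of the actual patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an allocator with a leak check on close.
// This is NOT Arrow's BufferAllocator API, only a simplified model of it.
final class TrackingAllocator implements AutoCloseable {
  private final String name;
  private final Set<Integer> outstanding = new HashSet<>();
  private int nextId = 0;

  TrackingAllocator(String name) {
    this.name = name;
  }

  // Hands out a buffer id and records it as outstanding.
  int allocate() {
    int id = nextId++;
    outstanding.add(id);
    return id;
  }

  // Marks a previously allocated buffer as released.
  void release(int id) {
    outstanding.remove(id);
  }

  // Fails loudly if any buffer was never released.
  @Override
  public void close() {
    if (!outstanding.isEmpty()) {
      throw new IllegalStateException(
          "Allocator[" + name + "] closed with outstanding buffers allocated ("
              + outstanding.size() + ").");
    }
  }
}

// Hypothetical reader that needs a fresh buffer whenever the page encoding changes.
final class ColumnReader implements AutoCloseable {
  private final TrackingAllocator allocator;
  private final boolean releaseOnSwitch; // false models the assumed leak
  private int current = -1;              // id of the buffer held, -1 if none
  private boolean currentIsDictionary;

  ColumnReader(TrackingAllocator allocator, boolean releaseOnSwitch) {
    this.allocator = allocator;
    this.releaseOnSwitch = releaseOnSwitch;
  }

  void readPage(boolean dictionaryEncoded) {
    if (current != -1 && currentIsDictionary == dictionaryEncoded) {
      return; // same encoding as before: reuse the existing buffer
    }
    if (current != -1 && releaseOnSwitch) {
      allocator.release(current); // the step a leaky reader skips
    }
    current = allocator.allocate();
    currentIsDictionary = dictionaryEncoded;
  }

  // Releases only the buffer the reader still holds.
  @Override
  public void close() {
    if (current != -1) {
      allocator.release(current);
      current = -1;
    }
  }
}

final class LeakCheckDemo {
  // Reads dictionary, plain, dictionary pages, then closes reader and allocator.
  static String run(boolean releaseOnSwitch) {
    TrackingAllocator allocator = new TrackingAllocator("Demo");
    ColumnReader reader = new ColumnReader(allocator, releaseOnSwitch);
    boolean[] pagesAreDictionary = {true, false, true};
    for (boolean dictionaryEncoded : pagesAreDictionary) {
      reader.readPage(dictionaryEncoded);
    }
    reader.close();
    try {
      allocator.close();
      return "closed cleanly";
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(run(false)); // leaky variant: the allocator reports outstanding buffers
    System.out.println(run(true));  // releasing variant: the allocator closes cleanly
  }
}
```

In this model the leak stays invisible until the allocator is closed, which is why closing the allocator with the reader is what exposes it.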
Note: this fix may break existing workflows, since closing the allocator now
raises `IllegalStateException` whenever a memory leak still lurking in the
vectorized readers is triggered.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]