Kontinuation opened a new pull request, #5735:
URL: https://github.com/apache/iceberg/pull/5735

   This patch fixes native memory leaks in the vectorized Parquet reader, which affect both iceberg-core and iceberg-spark:
   
   1. The Arrow `BufferAllocator` is now closed when the vectorized reader is closed, which frees the native memory it holds. The buffer allocator also checks for leaked memory on close.
   2. We've also fixed a memory leak that occurred when reading Parquet files containing interleaved plain and dictionary pages. The Arrow buffer allocator detected this problem and threw `IllegalStateException` when running the newly added test without this fix:
   
   ```
   org.apache.iceberg.arrow.vectorized.ArrowReaderTest > testInterleavingPlainAndDictionaryPages FAILED
       java.lang.IllegalStateException: Allocator[ArrowBatchReader] closed with outstanding buffers allocated (6).
       Allocator(ArrowBatchReader) 0/6400/9472/9223372036854775807 (res/actual/peak/limit)
         child allocators: 0
         ledgers: 6
           ledger[1924] allocator: ArrowBatchReader), isOwning: , size: , references: 2, life: 22143564008682936..0, allocatorManager: [, life: ] holds 3 buffers. 
               ArrowBuf[5225], address:140478431376008, length:8
               ArrowBuf[5224], address:140478431375888, length:120
               ArrowBuf[5223], address:140478431375888, length:128
           ledger[1923] allocator: ArrowBatchReader), isOwning: , size: , references: 2, life: 22143564003725139..0, allocatorManager: [, life: ] holds 3 buffers. 
               ArrowBuf[5220], address:140478431392784, length:1024
               ArrowBuf[5222], address:140478431393776, length:32
               ArrowBuf[5221], address:140478431392784, length:992
           ledger[1920] allocator: ArrowBatchReader), isOwning: , size: , references: 2, life: 22143563974533216..0, allocatorManager: [, life: ] holds 3 buffers. 
               ArrowBuf[5215], address:140478431392752, length:32
               ArrowBuf[5214], address:140478431391760, length:992
               ArrowBuf[5213], address:140478431391760, length:1024
           ledger[1922] allocator: ArrowBatchReader), isOwning: , size: , references: 1, life: 22143564003418541..0, allocatorManager: [, life: ] holds 1 buffers. 
               ArrowBuf[5219], address:140478431385616, length:2048
           ledger[1921] allocator: ArrowBatchReader), isOwning: , size: , references: 2, life: 22143563988827959..0, allocatorManager: [, life: ] holds 3 buffers. 
               ArrowBuf[5218], address:140478431375880, length:8
               ArrowBuf[5217], address:140478431375760, length:120
               ArrowBuf[5216], address:140478431375760, length:128
           ledger[1919] allocator: ArrowBatchReader), isOwning: , size: , references: 1, life: 22143563974111982..0, allocatorManager: [, life: ] holds 1 buffers. 
               ArrowBuf[5212], address:140478431383568, length:2048
         reservations: 0
           at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:405)
           at org.apache.iceberg.arrow.vectorized.ArrowBatchReader.close(ArrowBatchReader.java:66)
           at org.apache.iceberg.parquet.VectorizedParquetReader$FileIterator.close(VectorizedParquetReader.java:176)
           at org.apache.iceberg.arrow.vectorized.ArrowReader$VectorizedCombinedScanIterator.hasNext(ArrowReader.java:299)
           at org.apache.iceberg.arrow.vectorized.ArrowReaderTest.testInterleavingPlainAndDictionaryPages(ArrowReaderTest.java:347)
   ```
   
   Note: this fix may break existing workflows, because `IllegalStateException` will now be thrown on close whenever a memory leak still lurking in the vectorized readers is triggered.
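
   To illustrate why a leak that used to go unnoticed now surfaces as an exception, here is a minimal sketch of the "check for outstanding buffers on close" pattern. It is a toy model written against the Java standard library only, not Arrow's implementation and not the code in this patch: `LeakCheckDemo`, `TrackingAllocator`, `Buffer`, and `outstandingCount` are hypothetical names chosen for the example, and the exception message merely imitates the one in the log above.

   ```java
   import java.util.HashSet;
   import java.util.Set;

   // Toy stand-in for a leak-checking allocator; NOT Arrow's BufferAllocator.
   public class LeakCheckDemo {

     /** Hypothetical buffer handle that must be released back to its allocator. */
     static final class Buffer implements AutoCloseable {
       private final TrackingAllocator owner;
       final int length;
       private boolean released = false;

       Buffer(TrackingAllocator owner, int length) {
         this.owner = owner;
         this.length = length;
       }

       @Override
       public void close() {
         // Releasing twice is a no-op, so close() stays idempotent.
         if (!released) {
           released = true;
           owner.release(this);
         }
       }
     }

     /** Hypothetical allocator that records every buffer it has handed out. */
     static final class TrackingAllocator implements AutoCloseable {
       private final String name;
       private final Set<Buffer> outstanding = new HashSet<>();

       TrackingAllocator(String name) {
         this.name = name;
       }

       Buffer buffer(int length) {
         Buffer buf = new Buffer(this, length);
         outstanding.add(buf);
         return buf;
       }

       void release(Buffer buf) {
         outstanding.remove(buf);
       }

       int outstandingCount() {
         return outstanding.size();
       }

       @Override
       public void close() {
         // The leak check: refuse to close quietly while buffers are still live.
         if (!outstanding.isEmpty()) {
           throw new IllegalStateException(
               "Allocator[" + name + "] closed with outstanding buffers allocated ("
                   + outstanding.size() + ").");
         }
       }
     }

     public static void main(String[] args) {
       // Well-behaved reader: the buffer is released before the allocator closes.
       try (TrackingAllocator clean = new TrackingAllocator("clean")) {
         try (Buffer buf = clean.buffer(128)) {
           // ... read a batch into buf ...
         }
       }
       System.out.println("clean allocator closed without error");

       // Leaky reader: one buffer is never released, so close() throws.
       TrackingAllocator leaky = new TrackingAllocator("leaky");
       leaky.buffer(2048);
       try {
         leaky.close();
       } catch (IllegalStateException e) {
         System.out.println("leak detected: " + e.getMessage());
       }
     }
   }
   ```

   In this model, code that never closed the allocator would never see the exception, which mirrors why closing the allocator together with the reader can expose leaks in workflows that previously ran without errors.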
   

