[PR] Parquet: Fix readers crashing on 2-level (Thrift) list encoding [iceberg]

via GitHub Mon, 23 Mar 2026 14:34:28 -0700


josephperez3 opened a new pull request, #15747:
URL: https://github.com/apache/iceberg/pull/15747


   Closes #9497
   
   ## Problem
   
   PR #3774 added `visitTwoLevelList()` to correctly detect 2-level lists 
during schema traversal, but the `list()` methods in the reader builders were 
not updated. `visitTwoLevelList()` does not push the repeated field name onto 
the `fieldNames` stack, so `currentPath()` is one segment short. This produces 
`defLevel=0` and `repLevel=-1` instead of the correct values, causing 
`RepeatedReader` to loop indefinitely and crash with `ParquetDecodingException: 
Reading past RLE/BitPacking stream`.
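
To make the off-by-one concrete, here is a minimal, self-contained sketch of the level arithmetic. It is not Iceberg or Parquet code: `LevelSketch`, `Field`, and `readerLevels` are hypothetical names. It assumes the column in the crash output under Testing is an optional `names` group wrapping a repeated `names_tuple` field, and that the reader uses the path's max level minus one, which is what the reported `defLevel=0` / `repLevel=-1` suggests.

```java
import java.util.Arrays;
import java.util.List;

public class LevelSketch {
  enum Repetition { REQUIRED, OPTIONAL, REPEATED }

  // Hypothetical stand-in for one field on a column path (not a Parquet class).
  static final class Field {
    final String name;
    final Repetition repetition;

    Field(String name, Repetition repetition) {
      this.name = name;
      this.repetition = repetition;
    }
  }

  // Max definition level of a path: one per field that is not REQUIRED.
  static int maxDefinitionLevel(List<Field> path) {
    int level = 0;
    for (Field field : path) {
      if (field.repetition != Repetition.REQUIRED) {
        level++;
      }
    }
    return level;
  }

  // Max repetition level of a path: one per REPEATED field.
  static int maxRepetitionLevel(List<Field> path) {
    int level = 0;
    for (Field field : path) {
      if (field.repetition == Repetition.REPEATED) {
        level++;
      }
    }
    return level;
  }

  // Assumption: the list reader is built with (max level - 1) for both levels.
  static int[] readerLevels(List<Field> path) {
    return new int[] {maxDefinitionLevel(path) - 1, maxRepetitionLevel(path) - 1};
  }

  // Path without the repeated field, as currentPath() would return in the bug.
  static List<Field> shortPath() {
    return Arrays.asList(new Field("names", Repetition.OPTIONAL));
  }

  // Full element path, including the repeated field.
  static List<Field> elementPath() {
    return Arrays.asList(
        new Field("names", Repetition.OPTIONAL),
        new Field("names_tuple", Repetition.REPEATED));
  }

  public static void main(String[] args) {
    int[] broken = readerLevels(shortPath());
    int[] fixed = readerLevels(elementPath());
    System.out.println("short path:   def=" + broken[0] + " rep=" + broken[1]); // def=0 rep=-1
    System.out.println("element path: def=" + fixed[0] + " rep=" + fixed[1]);   // def=1 rep=0
  }
}
```

Under these assumptions, dropping the last path segment is enough to turn the levels into the `0` / `-1` pair described above.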
   
   ## Fix
   
   In each `list()` method, detect 2-level lists via `isOldListElementType()` 
and compute the def/rep levels from the element path instead of `currentPath()`. 
The `OptionReader` wrapping is skipped for these lists: in the two-level 
encoding the repeated field is itself the element, so elements are never null.
   
   Fixed for both Flink and Spark read paths.
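
For readers unfamiliar with the legacy encoding, the detection step can be sketched as follows. The exact checks inside `isOldListElementType()` are not shown in this description, so this is an assumption: it follows the backward-compatibility rules for `LIST` in the Parquet format specification, and `TwoLevelListSketch` and `repeatedFieldIsElement` are hypothetical names.

```java
public class TwoLevelListSketch {
  // Returns true when the repeated field inside a LIST group is itself the
  // element (legacy 2-level encoding) rather than a wrapper around the element.
  // Assumption: the parameters are a simplified description of the schema.
  static boolean repeatedFieldIsElement(
      String listName, String repeatedName, boolean repeatedIsGroup, int repeatedFieldCount) {
    if (!repeatedIsGroup) {
      return true; // a repeated primitive is the element
    }
    if (repeatedFieldCount > 1) {
      return true; // a repeated group with several fields is a struct element
    }
    // A single-field group is the element only under the legacy names.
    return repeatedName.equals("array") || repeatedName.equals(listName + "_tuple");
  }

  public static void main(String[] args) {
    // Column from the crash output: repeated binary names_tuple inside names.
    System.out.println(repeatedFieldIsElement("names", "names_tuple", false, 0)); // true
    // Standard 3-level layout: repeated group "list" with a single "element" field.
    System.out.println(repeatedFieldIsElement("names", "list", true, 1)); // false
  }
}
```

When the check returns true, the element path ends at the repeated field, which is why the levels have to be computed from that path rather than from the enclosing group.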
   
   ## Testing
   
   New tests in `TestSparkParquetReader` and `TestFlinkParquetReader` write a 
2-level Parquet file that includes some empty lists and verify each row.
   
   All existing tests pass. Without the fix, the new tests crash:
   ```
   ParquetDecodingException: Can't read value in column [names, names_tuple]
   repeated binary names_tuple (STRING) at value 7 out of 7 in current page.
   repetition level: 0, definition level: 2
   
   Caused by: could not read bytes at offset 40
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

