This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git
The following commit(s) were added to refs/heads/master by this push:
new 042d725888 Avoid infinite loop in bad parquet by checking the number of rep levels (#6232)
042d725888 is described below
commit 042d725888358c73cd2a0d58868ea5c4bad778f7
Author: Jinpeng <[email protected]>
AuthorDate: Thu Aug 15 15:13:00 2024 -0700
Avoid infinite loop in bad parquet by checking the number of rep levels (#6232)
* check the number of rep levels read from page
* minor fix on typo
Co-authored-by: Andrew Lamb <[email protected]>
* add check on record_read as well
---------
Co-authored-by: jp0317 <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
---
parquet/src/column/reader.rs | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/parquet/src/column/reader.rs b/parquet/src/column/reader.rs
index b40ca2b782..0c7cbb412a 100644
--- a/parquet/src/column/reader.rs
+++ b/parquet/src/column/reader.rs
@@ -240,6 +240,12 @@ where
let (mut records_read, levels_read) =
    reader.read_rep_levels(out, remaining_records, remaining_levels)?;
+ if records_read == 0 && levels_read == 0 {
+     // The fact that we're still looping implies there must be some levels to read.
+     return Err(general_err!(
+         "Insufficient repetition levels read from column"
+     ));
+ }
if levels_read == remaining_levels && self.has_record_delimiter {
    // Reached end of page, which implies records_read < remaining_records
    // as otherwise would have stopped reading before reaching the end
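The effect of the new guard can be illustrated with a minimal, self-contained sketch. `RepLevelSource`, `FlatSource`, `StuckSource` and `read_records` are hypothetical stand-ins, not the arrow-rs API, and the sketch omits the record-delimiter handling the real reader performs across pages. It only shows the idea: if an iteration reads neither records nor levels while the loop still expects levels, a corrupt page would otherwise spin forever, so the loop returns an error instead.

```rust
/// Hypothetical stand-in for a page's repetition-level decoder (not the
/// arrow-rs API). Returns (records_read, levels_read).
trait RepLevelSource {
    fn read_rep_levels(&mut self, max_records: usize, max_levels: usize) -> (usize, usize);
}

/// Mimics a corrupt page: claims to have levels but never yields any.
struct StuckSource;

impl RepLevelSource for StuckSource {
    fn read_rep_levels(&mut self, _max_records: usize, _max_levels: usize) -> (usize, usize) {
        (0, 0)
    }
}

/// Well-behaved page where every level starts a new record (rep level 0).
struct FlatSource {
    levels_left: usize,
}

impl RepLevelSource for FlatSource {
    fn read_rep_levels(&mut self, max_records: usize, max_levels: usize) -> (usize, usize) {
        let n = max_records.min(max_levels).min(self.levels_left);
        self.levels_left -= n;
        (n, n)
    }
}

/// Reads up to `num_records` records from a page that is expected to hold
/// `levels_in_page` levels, erroring out when an iteration makes no progress.
fn read_records<S: RepLevelSource>(
    src: &mut S,
    num_records: usize,
    levels_in_page: usize,
) -> Result<usize, String> {
    let mut total_records = 0;
    let mut remaining_levels = levels_in_page;
    while total_records < num_records && remaining_levels > 0 {
        let remaining_records = num_records - total_records;
        let (records_read, levels_read) = src.read_rep_levels(remaining_records, remaining_levels);
        if records_read == 0 && levels_read == 0 {
            // Still looping means levels should remain; zero progress means a bad page.
            return Err("Insufficient repetition levels read from column".to_string());
        }
        total_records += records_read;
        remaining_levels -= levels_read;
    }
    Ok(total_records)
}

fn main() {
    let mut ok = FlatSource { levels_left: 8 };
    println!("{:?}", read_records(&mut ok, 5, 8)); // prints Ok(5)

    let mut bad = StuckSource;
    // Without the guard this call would never return.
    println!("{:?}", read_records(&mut bad, 5, 8));
}
```

Note that the guard requires both counts to be zero: an iteration that reads levels but completes no record (for example, a record that continues onto the next page) still counts as progress.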