Csaba Ringhofer created PARQUET-1250:
----------------------------------------
Summary: RLE decoding should treat 0 length runs as error
Key: PARQUET-1250
URL: https://issues.apache.org/jira/browse/PARQUET-1250
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Reporter: Csaba Ringhofer
RunLengthBitPackingHybridDecoder accepts run headers that encode repeated runs
of length 0 and treats them as if they were runs of length 2^32, so effectively
every value returned for that data page is the same. See
https://github.com/apache/parquet-mr/blob/0a86429939075984edce5e3b8195dfb7f9e3ab6b/parquet-column/src/main/java/org/apache/parquet/column/values/rle/RunLengthBitPackingHybridDecoder.java#L66
Throwing an exception when the run count is 0 would give a proper error message
for some corrupt files, and would make it clear that zero-length runs are not
legal values.
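
A minimal sketch of the kind of check proposed above. It is not the actual
parquet-mr code or patch. The class RleHeaderCheck, the method
repeatedRunLength and the exception CorruptRunException are hypothetical names
used only for illustration. The sketch assumes the header has already been
read as a varint, and that a low bit of 0 marks a repeated run whose length is
stored in the remaining bits.

```java
final class RleHeaderCheck {

    /** Hypothetical unchecked exception; parquet-mr has its own exception types. */
    static final class CorruptRunException extends RuntimeException {
        CorruptRunException(String message) {
            super(message);
        }
    }

    /**
     * Returns the length of a repeated run described by an already-decoded
     * header. Assumption: low bit 0 means "repeated run" and the remaining
     * bits hold the run length.
     */
    static int repeatedRunLength(int header) {
        if ((header & 1) != 0) {
            throw new IllegalArgumentException(
                "header " + header + " describes a bit-packed run, not a repeated run");
        }
        int count = header >>> 1;
        if (count == 0) {
            // Reject the run instead of letting a later decrement wrap around.
            throw new CorruptRunException(
                "invalid repeated run of length 0 (header=" + header + ")");
        }
        return count;
    }

    public static void main(String[] args) {
        // Header 0b1010: low bit 0, remaining bits 0b101, so the run length is 5.
        System.out.println(repeatedRunLength(10));
        try {
            repeatedRunLength(0);
        } catch (CorruptRunException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, a corrupt page fails at the point where the bad header
is read, rather than silently producing a page of identical values.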
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)