I'm looking at an Impala bug in decoding Parquet RLE data with run lengths >= 2^31. The bug was found by fuzz testing rather than in a realistic file. I'm trying to determine whether the Parquet spec actually allows runs of that length, but Encodings.md does not seem to specify any upper bound. It says the run header is ULEB128-encoded, but ULEB128 can encode arbitrarily large numbers. See https://github.com/apache/parquet-format/blob/master/Encodings.md#run-length-encoding--bit-packing-hybrid-rle--3
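
To make the concern concrete, here is a minimal Python sketch (not Impala's decoder) of how I read the header layout in Encodings.md: the header is a ULEB128 varint, the low bit selects bit-packed vs. RLE, and the remaining bits carry the count. The function names and the five-byte input are made up for illustration.

```python
INT32_MAX = 2**31 - 1


def decode_uleb128(buf, pos=0):
    """Decode one unsigned LEB128 integer; returns (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        # The low 7 bits of each byte are payload, least significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the final byte of the varint.
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def parse_run_header(buf, pos=0):
    """Hypothetical helper: split a hybrid RLE/bit-packed header into (kind, count, next_pos)."""
    header, pos = decode_uleb128(buf, pos)
    if header & 1:
        # Bit-packed run: the upper bits count groups of 8 values.
        return "bit-packed", (header >> 1) * 8, pos
    # RLE run: the upper bits are the repeat count.
    return "rle", header >> 1, pos


# Hand-built header: ULEB128 encoding of 2^32, i.e. an RLE run of length 2^31.
header_bytes = bytes([0x80, 0x80, 0x80, 0x80, 0x10])
kind, count, _ = parse_run_header(header_bytes)
print(kind, count, count > INT32_MAX)
```

Five header bytes are enough to describe a run that no longer fits in a signed 32-bit integer, and nothing in the encoding itself rules that out.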
Is there a practical limit I can assume? Should we amend the Parquet spec to clarify this? The Impala bug is https://issues.apache.org/jira/browse/IMPALA-6946 if anyone is curious.

Thanks,
Tim
