The current API for writing repetition and definition levels takes arrays of int16_t values. This seems inefficient when there are "runs" of the same level: writers that don't already have the data in array form need to explode the runs out into an array, which ultimately gets run-length encoded again via RLEEncoder [1]. A similar inefficiency exists when reading.
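
To make the round trip concrete, here is a minimal, self-contained sketch of the two steps described above. It does not use the Arrow/Parquet API: LevelRun, ExplodeRuns and CoalesceRuns are hypothetical names invented for illustration, and CoalesceRuns is only a simplified stand-in for the encoder (the real level encoding is an RLE/bit-packing hybrid).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration only; not part of Arrow or Parquet.
struct LevelRun {
  int16_t level;  // repetition or definition level
  int64_t count;  // number of consecutive values sharing this level
};

// Step 1: what a writer holding runs has to do to satisfy an array-based
// API, namely materialize one int16_t per value.
std::vector<int16_t> ExplodeRuns(const std::vector<LevelRun>& runs) {
  int64_t total = 0;
  for (const LevelRun& run : runs) {
    assert(run.count >= 0);
    total += run.count;
  }
  std::vector<int16_t> levels;
  levels.reserve(static_cast<std::size_t>(total));
  for (const LevelRun& run : runs) {
    levels.insert(levels.end(), static_cast<std::size_t>(run.count), run.level);
  }
  return levels;
}

// Step 2: a simplified stand-in for run-length encoding, which scans the
// array and rediscovers the runs the writer already knew about.
std::vector<LevelRun> CoalesceRuns(const std::vector<int16_t>& levels) {
  std::vector<LevelRun> runs;
  for (int16_t level : levels) {
    if (!runs.empty() && runs.back().level == level) {
      ++runs.back().count;
    } else {
      runs.push_back(LevelRun{level, 1});
    }
  }
  return runs;
}
```

For example, three runs covering 1005 values turn into a 1005-element int16_t array (2010 bytes) in step 1, only for step 2 to collapse it back into the same three runs.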
I was wondering if the community has considered adding APIs to the column_writer/reader [2][3] that expose the encoder/decoder directly or a facade around them? At least for encoding it seems like an extra wrapper for RLEEncoder would be needed since the required memory size wouldn't be known up front.

Thanks,
Micah

[1] https://github.com/apache/arrow/blob/8f61c7997e55612940bff4be7b043f0ee61bf238/cpp/src/arrow/util/rle_encoding.h
[2] https://github.com/apache/arrow/blob/8f61c7997e55612940bff4be7b043f0ee61bf238/cpp/src/parquet/column_writer.h#L158
[3] https://github.com/apache/arrow/blob/8f61c7997e55612940bff4be7b043f0ee61bf238/cpp/src/parquet/column_reader.h#L151
