Hi,

The Parquet implementation is divided into an Arrow part and a Parquet part.
1. At the lowest level of the Parquet part is the Parquet decoder, in [1]. Floating-point columns may use PLAIN, RLE_DICTIONARY, or BYTE_STREAM_SPLIT encoding.
2. parquet::ColumnReader sits on top of the decoder. Within each row group, a column may have one or two encodings (if dictionary encoding is chosen and then falls back to PLAIN, the column has two encodings in that RowGroup). This is in [2].

The other modules were mentioned by Bryce.

Best,
Xuwei Fu

[1] https://github.com/apache/arrow/blob/main/cpp/src/parquet/encoding.cc
[2] https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc

Li Jin <ice.xell...@gmail.com> wrote on Sat, Nov 18, 2023 at 05:27:
> Hi,
>
> I am recently investigating a null/nan issue with Parquet and Arrow and
> wonder if someone can give me a pointer to the code that decodes Parquet
> row group into Arrow float/double arrays?
>
> Thanks,
> Li