Hi,
The Parquet C++ code is divided into an Arrow part and a Parquet part.
1. The lowest layer of the Parquet part is the decoders, in [1].
Floating-point columns may use the PLAIN, RLE_DICTIONARY or
BYTE_STREAM_SPLIT encoding.
2. parquet::ColumnReader sits on top of the decoders. For a given column,
each row group may use one or two encodings (with dictionary encoding the
writer can fall back to PLAIN, in which case the column has two encodings
within one RowGroup). This is in [2].
Bryce has already mentioned the other modules.
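
In case it helps with the null/NaN investigation, here is a minimal
standalone sketch of how BYTE_STREAM_SPLIT lays out float values. It is not
the Arrow implementation in [1]; the names EncodeByteStreamSplit,
DecodeByteStreamSplit and BitsOf are made up for illustration. It assumes
4-byte floats and uses host byte order (Parquet itself stores values
little-endian). The point is that a NaN is an ordinary bit pattern in the
value stream and passes through the decoder unchanged, while nulls are
carried by the definition levels rather than by the value decoder.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper, not an Arrow/Parquet API.
// Byte k of every value goes into stream k; the streams are concatenated.
std::vector<uint8_t> EncodeByteStreamSplit(const std::vector<float>& values) {
  const size_t n = values.size();
  std::vector<uint8_t> out(n * sizeof(float));
  for (size_t i = 0; i < n; ++i) {
    uint8_t bytes[sizeof(float)];
    std::memcpy(bytes, &values[i], sizeof(float));
    for (size_t k = 0; k < sizeof(float); ++k) {
      out[k * n + i] = bytes[k];
    }
  }
  return out;
}

// Hypothetical helper, not an Arrow/Parquet API.
// Reassembles value i from byte i of each of the sizeof(float) streams.
std::vector<float> DecodeByteStreamSplit(const std::vector<uint8_t>& data) {
  const size_t n = data.size() / sizeof(float);
  std::vector<float> values(n);
  for (size_t i = 0; i < n; ++i) {
    uint8_t bytes[sizeof(float)];
    for (size_t k = 0; k < sizeof(float); ++k) {
      bytes[k] = data[k * n + i];
    }
    std::memcpy(&values[i], bytes, sizeof(float));
  }
  return values;
}

// Returns the raw bit pattern of a float, so NaN payloads can be compared.
uint32_t BitsOf(float v) {
  uint32_t bits;
  std::memcpy(&bits, &v, sizeof(bits));
  return bits;
}
```

Comparing BitsOf() before and after a round trip is one way to check
whether a NaN was altered by the decoding layer or somewhere above it.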
Best,
Xuwei Fu
[1] https://github.com/apache/arrow/blob/main/cpp/src/parquet/encoding.cc
[2] https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc
Li Jin <[email protected]> wrote on Sat, Nov 18, 2023 at 05:27:
> Hi,
>
> I am recently investigating a null/nan issue with Parquet and Arrow and
> wonder if someone can give me a pointer to the code that decodes Parquet
> row group into Arrow float/double arrays?
>
> Thanks,
> Li
>