Hi,

The Parquet code is divided into an Arrow part and a Parquet part.

1. The lowest layer of the Parquet part is the decoder, in [1].
    Floating-point columns may use the PLAIN, RLE_DICTIONARY or
    BYTE_STREAM_SPLIT encoding.
2. parquet::ColumnReader is built on top of the decoder. Each row group
    may use one or two encodings for a column (two if the writer chose
    dictionary encoding and then fell back to plain). This is in [2].
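
To make the decoder layer concrete for the null/NaN question, here is a rough
sketch in plain Python (standard library only). It is not the Arrow C++ code
in [1] or [2]; the function names are made up for illustration. It assumes a
flat optional float32 column (max definition level 1) and only follows the
Parquet format rules: PLAIN stores little-endian IEEE 754 values, and
BYTE_STREAM_SPLIT stores byte k of every value in stream k. Nulls are not in
the value buffer at all; they come from the definition levels, while NaN is
an ordinary stored value.

```python
import math
import struct


def decode_plain_float32(buf: bytes) -> list:
    """PLAIN: consecutive little-endian IEEE 754 float32 values."""
    count = len(buf) // 4
    return list(struct.unpack(f"<{count}f", buf[: count * 4]))


def decode_byte_stream_split_float32(buf: bytes) -> list:
    """BYTE_STREAM_SPLIT: 4 concatenated streams, stream k holds byte k of each value."""
    count = len(buf) // 4
    streams = [buf[k * count:(k + 1) * count] for k in range(4)]
    # Re-interleave the streams so each value's 4 bytes are adjacent again.
    interleaved = bytes(streams[k][i] for i in range(count) for k in range(4))
    return decode_plain_float32(interleaved)


def apply_def_levels(values, def_levels, max_def_level=1) -> list:
    """Place decoded (non-null) values into slots; None marks a null slot."""
    it = iter(values)
    return [next(it) if lvl == max_def_level else None for lvl in def_levels]


# Two stored values (1.5 and NaN) for three rows; the middle row is null.
plain = struct.pack("<2f", 1.5, float("nan"))
split = bytes(plain[i * 4 + k] for k in range(4) for i in range(2))
column = apply_def_levels(decode_byte_stream_split_float32(split), [1, 0, 1])
# column[1] is None (a null), while column[2] is a NaN value, not a null.
nan_is_value = column[2] is not None and math.isnan(column[2])
```

In the real reader the null slots end up in an Arrow validity bitmap rather
than as None, but the separation between "null" and "NaN" is the same idea.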

Other modules are mentioned by Bryce.

Best,
Xuwei Fu

[1] https://github.com/apache/arrow/blob/main/cpp/src/parquet/encoding.cc
[2] https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc

Li Jin <ice.xell...@gmail.com> wrote on Sat, Nov 18, 2023 at 05:27:

> Hi,
>
> I am recently investigating a null/nan issue with Parquet and Arrow and
> wonder if someone can give me a pointer to the code that decodes Parquet
> row group into Arrow float/double arrays?
>
> Thanks,
> Li
>
