Hi,

The Java API's ParquetFileReader [1] (line 684) reads a whole row group
into memory, while the C++ API reads one column at a time even when the
columns are consecutive. This causes multiple seek and read calls and
can be inefficient when reading over a network. Are there any plans to
extend the C++ API to read a whole row group at once (only the relevant
columns, as the Java API does)?

Regards,
Keith.

[1]
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
