Hi, I'm using the parquet-cpp library to read in some parquet files. I saw that parquet-cpp has support for arrow, so I thought I'd give it a shot. When running experiments I did not see any significant increase in performance, so I took a look at the code. It looks to me like the arrow reader uses an intermediate buffer to store the data and hence does an extra copy. Is this because of the mismatch in data types between parquet and arrow? I'm specifically referring to the FlatColumnReader::Impl::ReadNullableFlatBatch method in [1] (line 276). Also, I would imagine that setting one bit at a time is inefficient. I'm not sure whether the compiler would be smart enough to set a word at a time (I doubt it, though). Just wondering if there is a reason for having the code as it is.
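To make the bit-setting question concrete, here is a minimal sketch of what I have in mind. This is not the actual parquet-cpp code: the function names, the parameters, and the assumption that a value is non-null when its definition level equals the maximum definition level are mine, for illustration only. Both versions assume the bitmap is zero-initialized and uses LSB-first bit order. The first does a read-modify-write on memory for every value; the second builds each byte in a local variable and stores it once per 8 values.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: sets validity bits one at a time, touching the
// bitmap in memory for every non-null value.
void SetBitsOneAtATime(const int16_t* def_levels, size_t n,
                       int16_t max_def_level, uint8_t* bitmap) {
  for (size_t i = 0; i < n; ++i) {
    if (def_levels[i] == max_def_level) {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
}

// Hypothetical helper: accumulates 8 validity bits in a local byte and
// writes the byte to the bitmap once.
void SetBitsByteAtATime(const int16_t* def_levels, size_t n,
                        int16_t max_def_level, uint8_t* bitmap) {
  const size_t full_bytes = n / 8;
  for (size_t b = 0; b < full_bytes; ++b) {
    uint8_t byte = 0;
    for (int bit = 0; bit < 8; ++bit) {
      const unsigned valid = (def_levels[b * 8 + bit] == max_def_level) ? 1u : 0u;
      byte |= static_cast<uint8_t>(valid << bit);
    }
    bitmap[b] = byte;
  }
  // Remaining values (fewer than 8) fall back to the per-bit path.
  for (size_t i = full_bytes * 8; i < n; ++i) {
    if (def_levels[i] == max_def_level) {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
}
```

Both functions should produce the same bitmap. Whether the second is actually faster would need to be measured, and the same idea could be extended to 64-bit words.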
[1] https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/reader.cc

Regards,
Keith.
http://keith-chapman.com
