Hi Keith,

Uwe is working on this right now (avoiding the extra copy):
https://github.com/apache/parquet-cpp/pull/218

We would appreciate any efforts to further optimize these code paths.

Thanks
Wes

On Thu, Jan 12, 2017 at 7:21 PM, Keith Chapman <[email protected]> wrote:
> Hi,
>
> I'm using the parquet-cpp library to read in some Parquet files. I saw
> that the parquet-cpp library has support for Arrow and hence I thought of
> giving it a shot. When running experiments I did not see any significant
> increase in performance, hence I was taking a look at the code. It looks to
> me like the Arrow reader uses an intermediate buffer to store the data and
> hence does an extra copy. Is this because of the mismatch in data types
> between Parquet and Arrow? I'm specifically referring to the
> FlatColumnReader::Impl::ReadNullableFlatBatch method in [1] (line 276).
> Also, I would imagine that setting one bit at a time would be inefficient;
> not too sure if the compiler would be smart enough to set a word at a time
> (I doubt it though). Just wondering if there was a reason behind having the
> code as it is.
>
> [1]
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/reader.cc
>
>
> Regards,
> Keith.
>
> http://keith-chapman.com
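
For reference, the "set more than one bit at a time" idea raised above could look roughly like the sketch below. This is only an illustration under assumptions, not the code in reader.cc or in the PR: `PackValidity` is a hypothetical helper, and it assumes a value is non-null exactly when its definition level equals the column's maximum definition level, with the bitmap laid out least-significant bit first. It accumulates eight validity bits in a local byte and stores that byte once, instead of doing a read-modify-write on the bitmap for every value. Whether this is measurably faster would need benchmarking.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of parquet-cpp or Arrow).
// Packs definition levels into a validity bitmap, least-significant bit
// first, writing one whole byte per 8 values.
// Assumption: a value is non-null when its level equals max_def_level.
std::vector<uint8_t> PackValidity(const std::vector<int16_t>& def_levels,
                                  int16_t max_def_level) {
  const std::size_t n = def_levels.size();
  std::vector<uint8_t> bitmap((n + 7) / 8, 0);
  std::size_t i = 0;

  // Full groups of 8 values: build the byte locally, store it once.
  for (std::size_t byte = 0; byte < n / 8; ++byte) {
    uint8_t bits = 0;
    for (int b = 0; b < 8; ++b, ++i) {
      bits |= static_cast<uint8_t>((def_levels[i] == max_def_level) << b);
    }
    bitmap[byte] = bits;
  }

  // Trailing values (fewer than 8) go into the last, partially filled byte.
  for (int b = 0; i < n; ++i, ++b) {
    bitmap[n / 8] |=
        static_cast<uint8_t>((def_levels[i] == max_def_level) << b);
  }
  return bitmap;
}
```

With levels `{1,0,1,1,0,0,0,1,1,1}` and a maximum level of 1, the first byte has bits 0, 2, 3 and 7 set and the second byte has bits 0 and 1 set.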
