Cool, thanks for the update, Wes. I was wondering if there was some design issue I was not aware of :). I will keep my eyes on the PR and look to make more optimizations and upstream them.

Regards,
Keith.

http://keith-chapman.com

On Thu, Jan 12, 2017 at 5:15 PM, Wes McKinney <[email protected]> wrote:

> hi Keith
>
> Uwe is working on this right now (avoiding the extra copy):
>
> https://github.com/apache/parquet-cpp/pull/218
>
> We would appreciate any efforts to further optimize these code paths.
>
> Thanks
> Wes
>
> On Thu, Jan 12, 2017 at 7:21 PM, Keith Chapman <[email protected]> wrote:
>
> > Hi,
> >
> > I'm using the parquet-cpp library to read in some Parquet files. I saw
> > that the parquet-cpp library has support for Arrow and hence I thought of
> > giving it a shot. When running experiments I did not see any significant
> > increase in performance, hence I was taking a look at the code. It looks to
> > me like the Arrow reader uses an intermediate buffer to store the data and
> > hence does an extra copy. Is this because of the mismatch in data types
> > between Parquet and Arrow? I'm specifically referring to the
> > FlatColumnReader::Impl::ReadNullableFlatBatch method in [1] (line 276).
> > Also, I would imagine that setting one bit at a time would be inefficient;
> > not too sure if the compiler would be smart enough to set a word at a time
> > (I doubt it though). Just wondering if there was a reason behind having the
> > code as it is.
> >
> > [1] https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/reader.cc
> >
> > Regards,
> > Keith.
> >
> > http://keith-chapman.com
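
To make the bit-setting question in the quoted message concrete, here is a minimal sketch of the two approaches. It is not the parquet-cpp code: the function names and the byte-per-value `valid` input are assumptions made for illustration. Both functions write bits in least-significant-bit-first order, which is the order Arrow validity bitmaps use. The second one builds each output byte in a local variable and stores it once; the same idea extends to wider words.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: one read-modify-write of the bitmap per value.
void PackBitsOneAtATime(const uint8_t* valid, size_t n, uint8_t* bitmap) {
  std::memset(bitmap, 0, (n + 7) / 8);  // start from an all-null bitmap
  for (size_t i = 0; i < n; ++i) {
    if (valid[i]) {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
}

// Hypothetical helper: accumulate 8 flags in a local byte, store once.
void PackBitsByteAtATime(const uint8_t* valid, size_t n, uint8_t* bitmap) {
  const size_t full = n / 8;
  for (size_t b = 0; b < full; ++b) {
    uint8_t byte = 0;
    for (size_t k = 0; k < 8; ++k) {
      byte |= static_cast<uint8_t>((valid[b * 8 + k] != 0) << k);
    }
    bitmap[b] = byte;  // single store for 8 values
  }
  // Handle the trailing values that do not fill a whole byte.
  const size_t rem = n % 8;
  if (rem != 0) {
    uint8_t byte = 0;
    for (size_t k = 0; k < rem; ++k) {
      byte |= static_cast<uint8_t>((valid[full * 8 + k] != 0) << k);
    }
    bitmap[full] = byte;
  }
}
```

Whether the second form is actually faster depends on the compiler and the data, so it would need to be benchmarked against the real reader before drawing conclusions.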
