Cool, thanks Uwe. Will try using it within the coming days.

Regards,
Keith.

http://keith-chapman.com

On Tue, Jan 17, 2017 at 11:44 PM, Uwe L. Korn <[email protected]> wrote:

> Hi Keith,
>
> just a small heads up: the pull request for the read path is merged, I'm
> currently looking into removing all those copies in the write path as well.
>
> Cheers
> Uwe
>
> On Fri, Jan 13, 2017, at 02:20 AM, Keith Chapman wrote:
> > Cool, thanks for the update Wes. I was wondering if there was some design
> > issue I was not aware of :). I will keep my eyes on the PR and look to
> > make more optimizations and upstream them.
> >
> > Regards,
> > Keith.
> >
> > http://keith-chapman.com
> >
> > On Thu, Jan 12, 2017 at 5:15 PM, Wes McKinney <[email protected]> wrote:
> >
> > > hi Keith
> > >
> > > Uwe is working on this right now (avoiding the extra copy):
> > >
> > > https://github.com/apache/parquet-cpp/pull/218
> > >
> > > We would appreciate any efforts to further optimize these code paths.
> > >
> > > Thanks
> > > Wes
> > >
> > > On Thu, Jan 12, 2017 at 7:21 PM, Keith Chapman <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > I'm using the parquet-cpp library to read in some parquet files. I
> > > > saw that the parquet-cpp library has support for arrow and hence I
> > > > thought of giving it a shot. When running experiments I did not see
> > > > any significant increase in performance, hence I was taking a look at
> > > > the code. It looks to me like the arrow reader uses an intermediate
> > > > buffer to store the data and hence does an extra copy. Is this
> > > > because of the mismatch in data types between parquet and arrow? I'm
> > > > specifically referring to the
> > > > FlatColumnReader::Impl::ReadNullableFlatBatch method in [1] (line
> > > > 276). Also I would imagine that setting one bit at a time would be
> > > > inefficient, not too sure if the compiler would be smart enough to
> > > > set a word at a time (I doubt it though). Just wondering if there was
> > > > a reason behind having the code as it is.
> > > >
> > > > [1]
> > > > https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/reader.cc
> > > >
> > > > Regards,
> > > > Keith.
> > > >
> > > > http://keith-chapman.com
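
For reference, the question in the original message about setting validity bits one at a time versus a word at a time can be sketched roughly as below. This is only an illustration and not the code in reader.cc: the function names SetBitsOneAtATime and SetBitsWordAtATime are hypothetical, and the sketch assumes int16_t definition levels, a value counting as non-null when its level equals max_def_level, and a least-significant-bit-first bitmap where 1 means valid. Whether the second form is actually faster would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only; not parquet-cpp or Arrow API.

// Baseline: one read-modify-write on the bitmap per value.
// Assumes the caller zero-initialized the bitmap.
void SetBitsOneAtATime(const int16_t* def_levels, std::size_t n,
                       int16_t max_def_level, uint8_t* bitmap) {
  assert(def_levels != nullptr && bitmap != nullptr);
  for (std::size_t i = 0; i < n; ++i) {
    if (def_levels[i] == max_def_level) {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
}

// Alternative: accumulate up to 64 bits in a local word, then store the
// covered bytes once. Bytes are written least-significant first, so the
// result does not depend on host endianness.
void SetBitsWordAtATime(const int16_t* def_levels, std::size_t n,
                        int16_t max_def_level, uint8_t* bitmap) {
  assert(def_levels != nullptr && bitmap != nullptr);
  std::size_t i = 0;  // always a multiple of 64 at the top of the loop
  while (i < n) {
    const std::size_t chunk = (n - i < 64) ? (n - i) : 64;
    uint64_t word = 0;
    for (std::size_t b = 0; b < chunk; ++b) {
      word |= static_cast<uint64_t>(def_levels[i + b] == max_def_level) << b;
    }
    const std::size_t nbytes = (chunk + 7) / 8;
    for (std::size_t k = 0; k < nbytes; ++k) {
      bitmap[i / 8 + k] = static_cast<uint8_t>(word >> (8 * k));
    }
    i += chunk;
  }
}
```

Both functions expect a bitmap of at least (n + 7) / 8 bytes. The word-at-a-time version overwrites every byte it covers, so it does not rely on the buffer being zeroed first.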
