Hi Kai,

Arrow C++ does not strictly depend on Parquet C++. I have been working on parquet-cpp (http://github.com/apache/parquet-cpp) and intend to create an optional Parquet-Arrow adapter library that links to libparquet.so and provides a read and write path for Arrow data. It's not 100% clear whether the adapter library belongs in apache/parquet-cpp or apache/arrow; if others have opinions, I'd like to hear them.
The downside of putting the Arrow-Parquet C++ adapter code in apache/arrow is continuous integration -- building Thrift and the other parquet-cpp dependencies to run the unit tests might become onerous. That being said, I *do* need to be able to run unit tests for the PyArrow Parquet read/write path. I'm starting work on this in the next few days, in fact.

- Wes

On Mon, Mar 21, 2016 at 8:57 AM, Zheng, Kai <[email protected]> wrote:
> Hi,
>
> By quick looking at the codes, it looks like Arrow is depending on Parquet,
> however Parquet looks kinds of heavy for Arrow. Not sure what's the exact
> part in Parquet Arrow is using. Not sure if the vice versa is better or not,
> say in Parquet project, have a new reader that reads parquet data into an
> Arrow representation, let Parquet depend on Arrow instead. I noticed there
> was some effort (PARQUET-131) that reads parquet data into column vectors,
> wonder if it's the very similar thing needed for Arrow.
>
> Will we support other formats like ORC file as well? If so, how to handle the
> relationship similarly? Thanks.
>
> Regards,
> Kai
