Hi Kai,

Arrow C++ does not strictly depend on Parquet C++. I have been working
on parquet-cpp (http://github.com/apache/parquet-cpp) and intend to
create an optional Parquet-Arrow adapter library that links to
libparquet.so and provides a read and write path for Arrow data. It's
not 100% clear whether the adapter library belongs in
apache/parquet-cpp or apache/arrow, so I'd welcome others' opinions.

The downside of putting the Arrow-Parquet C++ adapter code in
apache/arrow is continuous integration -- building Thrift and the
other parquet-cpp dependencies to run the unit tests might become
onerous. That being said, I *do* need to be able to run unit tests for
the PyArrow Parquet read/write path. I'm starting work on this in the
next few days, in fact.

- Wes

On Mon, Mar 21, 2016 at 8:57 AM, Zheng, Kai <[email protected]> wrote:
> Hi,
>
> From a quick look at the code, it looks like Arrow depends on Parquet,
> but Parquet looks kind of heavy for Arrow. I'm not sure exactly which
> part of Parquet Arrow is using. I'm also not sure whether the reverse
> would be better: in the Parquet project, add a new reader that reads
> Parquet data into an Arrow representation, so that Parquet depends on
> Arrow instead. I noticed there was some effort (PARQUET-131) to read
> Parquet data into column vectors; I wonder if that is very similar to
> what Arrow needs.
>
> Will we support other formats, such as ORC files, as well? If so, would we
> handle the relationship in a similar way? Thanks.
>
> Regards,
> Kai
>
