Thanks, Wes, for sharing your thoughts! Sounds good to me ...

Regards,
Kai

-----Original Message-----
From: Wes McKinney [mailto:[email protected]] 
Sent: Tuesday, March 22, 2016 1:20 AM
To: [email protected]
Subject: Re: How Arrow is related to Parquet

hi Kai,

Arrow C++ does not strictly depend on Parquet C++. I have been working on 
parquet-cpp (http://github.com/apache/parquet-cpp) and intend to create an 
optional Parquet-Arrow adapter library that links to libparquet.so and provides 
a read and write path for Arrow data. It's not 100% clear whether the adapter 
library belongs in apache/parquet-cpp or apache/arrow; I'd welcome opinions 
from others.

The downside of putting the Arrow-Parquet C++ adapter code in apache/arrow is 
continuous integration -- building Thrift and the other parquet-cpp 
dependencies to run the unit tests might become onerous. That being said, I 
*do* need to be able to run unit tests for the PyArrow Parquet read/write path. 
I'm starting work on this in the next few days, in fact.

- Wes

On Mon, Mar 21, 2016 at 8:57 AM, Zheng, Kai <[email protected]> wrote:
> Hi,
>
> From a quick look at the code, it looks like Arrow depends on Parquet, 
> but Parquet seems rather heavy for Arrow. I'm not sure exactly which part 
> of Parquet Arrow is using. I'm also not sure whether the reverse would be 
> better: in the Parquet project, add a new reader that reads Parquet data 
> into an Arrow representation, so that Parquet depends on Arrow instead. I 
> noticed there was some effort (PARQUET-131) to read Parquet data into column 
> vectors; I wonder if that is very similar to what Arrow needs.
>
> Will we support other formats, such as ORC files, as well? If so, would that 
> relationship be handled in a similar way? Thanks.
>
> Regards,
> Kai
>
