I don't agree with this approach right now. Here are my reasons:
1. The Parquet Python integration will need to depend on both PyArrow
and the Arrow C++ libraries, so these libraries would generally need
to be developed together.
2. PyArrow would need to define and maintain a C++ or Cython API so
that the equivalent of the current pyarrow.parquet library can access
C-level data. For example:
Cython does permit cross-project C API access (we are already doing
cross-module Cython API access within pyarrow). This adds additional
complexity that I think we should avoid for now.
3. Maintaining a separate C++ build toolchain for a Python package
adds additional maintenance and packaging burden on us.
My inclination is to keep the code where it is and make the Parquet
On Wed, Sep 21, 2016 at 10:16 AM, Uwe Korn <uw...@xhochy.com> wrote:
> as we have moved the Arrow<->Parquet C++ integration into parquet-cpp, we
> still have to decide on how we are going to proceed with the Arrow<->Parquet
> Python integration. For the moment, it seems that the best way to go ahead
> is to pull the pyarrow.parquet module out into a separate Python package.
> From an organisational point, I'm unclear how I should proceed here. Should
> we put this in a separate repo? If so, as part of the Apache organisation?