On Fri, May 17, 2024, at 10:36 AM, Antoine Pitrou wrote:
> Hi Julien,
>
> On Thu, 16 May 2024 18:23:33 -0700
> Julien Le Dem <jul...@apache.org> wrote:
>> 
>> As discussed, that code was moved in the arrow repo for convenience:
>> https://lists.apache.org/thread/gkvbm6yyly1r4cg3f6xtnqkjz6ogn6o2
>> 
>> To take an excerpt of that original decision:
>> 
>> 4) The Parquet and Arrow C++ communities will collaborate to provide
>> development workflows to enable contributors working exclusively on the
>> Parquet core functionality to be able to work unencumbered with unnecessary
>> build or test dependencies from the rest of the Arrow codebase. Note that
>> parquet-cpp already builds a significant portion of Apache Arrow en route
>> to creating its libraries
>>
>> 5) The Parquet community can create scripts to
>> "cut" Parquet C++ releases by packaging up the appropriate components and
>> ensuring that they can be built and installed independently as now
>
> Unfortunately, these two points haven't happened at all. On the
> contrary, the Arrow C++ dependency has infused much deeper in Parquet
> C++ (I was not there at the beginning of Parquet C++, but I get the
> impression there was originally an effort to have an Arrow-independent
> Parquet C++ core; that "core" doesn't exist anymore).

As an example, in the beginning we had separate I/O primitives in Arrow and
Parquet. But during further development, we realised that we were
implementing exactly the same code paths, only in different namespaces.

There are some core "utilities" hidden in Arrow that are required to build any
modern C++-based data processing library. Separating them into their own
repository would make it easier to separate parquet-cpp. But given
that the development around them is still very active in Arrow, it would bring
a massive slowdown to the overall project.

Best
Uwe
