[Bug 159078] Support for Apache Parquet input for Calc and Base

bugzilla-daemon Wed, 10 Jan 2024 17:41:12 -0800

https://bugs.documentfoundation.org/show_bug.cgi?id=159078


--- Comment #13 from Kohei Yoshida <ko...@libreoffice.org> ---
Allow me to give you guys some clarification...

On the master branch as it stands, the internal orcus is built without Parquet
filter support.  The change referenced by the commit only introduces the hooks
needed to enable Parquet support when orcus is built with the Parquet filter
enabled; that commit alone is not enough to load Parquet files.

Now, to enable parquet filter in orcus, you first need to build the Apache
Arrow library since that becomes orcus's new dependency.  And to build the
Apache Arrow library, you need to build the libraries that the Arrow library
itself depends on.  Depending on how many features of Parquet you want to
enable (Parquet can support multiple compression algorithms), you may need to
build several more libraries on top of those.  So, even in a minimal
configuration, we are talking about 3-4 extra libraries that need to be built
before we can turn on the Parquet filter support in orcus.

Here is the main obstacle.  Most of these libraries use CMake as their only
build system.  So if we want to build all of them as part of the regular TDF
build, we first need to find a way to either integrate CMake support into our
GNU Make-based build system, or have them built outside of our core build
system and only reference them from the core build.

Unfortunately I was not able to come up with a good solution for integrating
these libraries, which is the reason why the internal orcus is built without
parquet support at the moment...

Having said that, if someone wants to experiment with this, the easiest way to
enable Parquet support is to build orcus outside of the libreoffice build along
with all of its Parquet-related dependencies, and use --with-system-orcus to
treat it as a system-provided orcus library when building libreoffice.
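
For anyone trying the out-of-tree route, the build order implied above can be
sketched as a small dependency graph.  This is only an illustration under
stated assumptions: the "arrow-dep-*" names are placeholders (the actual
libraries Arrow needs are not listed here and depend on which Parquet features
you enable), and the script only prints an order; it does not build anything.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the components in its value set.
# "arrow-dep-1" and "arrow-dep-2" are hypothetical placeholders; the real
# dependency list varies with the Parquet features (e.g. compression) enabled.
DEPENDS_ON = {
    "arrow-dep-1": set(),
    "arrow-dep-2": set(),
    "apache-arrow": {"arrow-dep-1", "arrow-dep-2"},
    "orcus": {"apache-arrow"},
    "libreoffice": {"orcus"},
}


def build_order(graph):
    """Return the components in an order where dependencies come first."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(DEPENDS_ON)
for name in order:
    # Only the last step touches the libreoffice build itself.
    note = " (configure with --with-system-orcus)" if name == "libreoffice" else ""
    print(f"build {name}{note}")
```

Everything before the last step happens outside of the libreoffice tree, which
is the point: the core build only sees an already-built orcus.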

