yangxk1 opened a new issue, #836:
URL: https://github.com/apache/incubator-graphar/issues/836
### Describe the enhancement requested
### Description
As discussed in issue #1, the system-level `libarrow.so` provided in
standard manylinux environments (or installed via system package managers) is
often incomplete or lacks necessary components for our use case.
A more robust solution is to link `graphar.so` directly against the
`libarrow.so` bundled within the `pyarrow` Python package. This ensures we are
using a full-featured Arrow library that matches the one PyArrow itself loads
at runtime.
However, adopting this approach introduces several significant build and
runtime challenges described below.
The dependency relationship is illustrated as follows:
```mermaid
graph TD
A[pyarrow bundled libarrow.so] --> B[pyarrow.whl]
A --> C[graphar.so <br> C++ Core]
C --> D[graphar.whl <br> Python Binding]
B -.-> D
style A fill:#f9f,stroke:#333,stroke-width:2px
style D fill:#bbf,stroke:#333,stroke-width:2px
```
### Key Challenges
#### 1. ABI Compatibility (The "Segfault" Risk)
The Arrow C++ ABI (Application Binary Interface) is not guaranteed to be
stable across different major versions of Apache Arrow.
* **Risk:** If `graphar.so` is built against the `libarrow.so` from
`pyarrow` v14.0.0, but the user updates to `pyarrow` v15.0.0 at runtime,
changes in class memory layouts or function signatures could cause immediate
Segmentation Faults.
* **Difficulty:** We need to determine a strategy to manage version
constraints effectively, ensuring the build-time Arrow version is
ABI-compatible with the runtime Arrow version.
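One possible guard for the risk above is an import-time check that compares the Arrow version GraphAr was built against with the version found at runtime. The sketch below is only an illustration: the function names are hypothetical, and it assumes that two versions sharing a major number are ABI-compatible, which is a policy this issue would still need to confirm.

```python
def arrow_major(version: str) -> int:
    """Return the major component of a version string such as '14.0.2'."""
    return int(version.split(".", 1)[0])


def is_abi_compatible(build_version: str, runtime_version: str) -> bool:
    # ASSUMPTION: equal major versions are treated as ABI-compatible.
    # This is a hypothetical policy, not a guarantee made by Arrow.
    return arrow_major(build_version) == arrow_major(runtime_version)


def check_arrow_abi(build_version: str, runtime_version: str) -> None:
    """Fail with a readable error instead of risking a crash later."""
    if not is_abi_compatible(build_version, runtime_version):
        raise ImportError(
            f"graphar was built against Arrow {build_version}, "
            f"but pyarrow {runtime_version} is installed"
        )


# Hypothetical usage: the build version would be recorded at build time,
# and the runtime version would come from pyarrow.__version__.
check_arrow_abi("14.0.0", "14.0.2")  # passes silently
```

Failing at import time turns a potential segmentation fault into an actionable error message, at the cost of requiring a GraphAr release per supported Arrow major version.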
#### 2. Runtime Linkage (RPATH Resolution)
Unlike system libraries located in `/usr/lib`, the target `libarrow.so`
resides deep within the Python `site-packages/pyarrow` directory.
* **Challenge:** Neither the build-time linker nor the runtime dynamic loader
searches this directory by default. `graphar.so` must be configured (likely
via RPATH) to locate `libarrow.so` relative to its own location at runtime,
without forcing users to manually manipulate `LD_LIBRARY_PATH`.
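As a rough illustration of the relative lookup, the sketch below computes an `$ORIGIN`-based RPATH entry from two install directories. The helper name and the directory layout (a `graphar` package installed next to `pyarrow` in the same `site-packages`) are assumptions made for the example. In a real build the Arrow directory could come from `pyarrow.get_library_dirs()`.

```python
import posixpath


def origin_relative_rpath(extension_dir: str, arrow_lib_dir: str) -> str:
    """Build an RPATH entry that points from extension_dir to arrow_lib_dir.

    $ORIGIN is expanded by the ELF dynamic loader to the directory that
    contains the loading shared object, so the entry stays valid wherever
    the environment is installed.
    """
    rel = posixpath.relpath(arrow_lib_dir, start=extension_dir)
    return "$ORIGIN" if rel == "." else "$ORIGIN/" + rel


# ASSUMPTION: hypothetical layout with both packages in one site-packages.
site = "/venv/lib/python3.11/site-packages"
rpath = origin_relative_rpath(site + "/graphar", site + "/pyarrow")
print(rpath)  # $ORIGIN/../pyarrow
```

The resulting value would typically be passed to the linker (for example through `-Wl,-rpath,...`). `$ORIGIN` applies to ELF platforms such as Linux; macOS uses `@loader_path` for the same purpose.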
#### 3. The "Two Arrows" Problem (ODR Violation)
If this linking is not handled correctly (e.g., if GraphAr accidentally
links to a static Arrow or a different system Arrow), we risk having two
different copies of the Arrow code loaded into the same process.
* **Consequence:** This would violate the One Definition Rule (ODR).
Passing objects (like `pyarrow.Table`) between GraphAr and PyArrow would lead
to undefined behavior, data corruption, or crashes.
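A simple diagnostic for this situation on Linux is to inspect which `libarrow.so` files are mapped into the process after both modules are imported. The sketch below parses text in the `/proc/self/maps` format; the function name is hypothetical. It only detects shared libraries, so a statically linked copy of Arrow would not show up.

```python
from __future__ import annotations

import posixpath
import re

# Matches the core library only (libarrow.so, libarrow.so.1400, ...),
# not siblings such as libarrow_python.so or libarrow_dataset.so.
_ARROW_CORE = re.compile(r"^libarrow\.so(\.\d+)*$")


def arrow_core_libs(maps_text: str) -> set[str]:
    """Return the distinct libarrow.so paths listed in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].strip()
        if _ARROW_CORE.match(posixpath.basename(path)):
            paths.add(path)
    return paths


def has_two_arrows(maps_text: str) -> bool:
    """True when more than one distinct core Arrow library is mapped."""
    return len(arrow_core_libs(maps_text)) > 1
```

A test suite could call `has_two_arrows(open("/proc/self/maps").read())` after importing both `pyarrow` and the GraphAr binding and fail if it returns `True`.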
### Objective
We need to design a build strategy that successfully links against the
`pyarrow`-bundled libraries while solving the RPATH and ABI compatibility
issues.
### Component(s)
Python, Developer Tools