This is an automated email from the ASF dual-hosted git repository.
tqchen pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-ffi.git
The following commit(s) were added to refs/heads/main by this push:
new 997f61c [Docs][Examples] Add CUDA C++ loading example and improve development documentation (#209)
997f61c is described below
commit 997f61c9a88b1afcd2fff719f28d3418485409fd
Author: yifeifang <[email protected]>
AuthorDate: Fri Oct 31 15:49:51 2025 -0700
[Docs][Examples] Add CUDA C++ loading example and improve development documentation (#209)
This PR enhances the documentation and examples for C++ developers,
particularly focusing on CUDA usage and development environment
setup.
### Changes
#### New Examples
- **Add `examples/quickstart/load/load_cuda.cc`**: New C++ example
demonstrating how to load and execute CUDA FFI functions using custom
CUDA memory allocators
- **Update `examples/quickstart/raw_compile.sh`**: Add Example 4 showing
compilation and execution of the CUDA loading example with
proper CUDA include/library paths
#### Documentation Improvements
- **`docs/guides/cpp_packaging.md`**: Add comprehensive "Distribution and
ABI Compatibility" section
- Explains glibc versioning challenges for kernel distributors
- Documents the manylinux approach for cross-platform binary
distribution
- Provides practical guidance for building compatible C++ and CUDA
libraries
- Includes verification commands and recommended Docker images
- Addresses the producer/consumer perspective on ABI compatibility
- **`CONTRIBUTING.md`**: Add "Setting Up Pre-commit Hooks" section
- Installation instructions with minimum version requirements
(pre-commit 2.18.0+)
- Complete list of all pre-commit hooks with exact versions
- Troubleshooting guide for common issues
- Documents all hook dependencies and configuration details
#### Infrastructure
- **`tests/docker/Dockerfile`**: Upgrade pip to latest version in
development container
### Motivation
1. **CUDA Examples**: The existing examples showed Python CUDA usage
(`load_cupy.py`, `load_pytorch.py`) and CPU C++ usage
(`load_cpp.cc`), but lacked a pure C++ CUDA example. This addition
provides a complete picture of FFI usage patterns.
2. **ABI Documentation**: Based on discussions about glibc compatibility
issues when distributing kernels, this documentation
helps kernel authors understand and avoid common pitfalls when building
cross-platform binaries. **The original issue arose when building FFI from
source against a newer glibc, which made the FFI symbols depend on newer
glibc symbol versions.**
3. **Development Setup**: Contributors were encountering pre-commit
version issues (2.17.0 vs 2.18.0+). This documentation prevents such
issues and provides a comprehensive reference for all tooling
requirements.
### Testing
- All pre-commit hooks pass
- CUDA example compiles successfully in the Docker development
environment
- Documentation follows markdown linting rules
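On the pre-commit point above, the 2.17.0-vs-2.18.0 comparison can be sketched portably with `sort -V` (a sketch, not project tooling; the version strings are illustrative stand-ins for `pre-commit --version` output):

```shell
# Compare an installed version string against the documented 2.18.0 minimum.
required="2.18.0"
current="2.17.0" # stand-in for: pre-commit --version | awk '{print $2}'
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
  echo "ok: $current >= $required"
else
  echo "too old: $current < $required (pip install --upgrade pre-commit)"
fi
```

With the illustrative values above, the check takes the `too old` branch.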
---------
Co-authored-by: Yifei Fang <[email protected]>
---
CONTRIBUTING.md | 84 +++++++++++++++++++++++--
docs/guides/build_from_source.md | 2 +-
docs/guides/cpp_packaging.md | 90 +++++++++++++++++++++++++++
docs/index.rst | 1 +
examples/quickstart/load/load_cuda.cc | 112 ++++++++++++++++++++++++++++++++++
examples/quickstart/raw_compile.sh | 22 +++++++
tests/docker/Dockerfile | 3 +
7 files changed, 309 insertions(+), 5 deletions(-)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index e62a1d6..12d5edc 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -31,6 +31,81 @@ pip install --no-build-isolation -e . -v
```
We recommend using the `--no-build-isolation` flag to ensure compatibility with your existing environment.
+
+## Setting Up Pre-commit Hooks
+
+This project uses [pre-commit](https://pre-commit.com/) to maintain code quality and consistency.
+Pre-commit hooks automatically check your code for common issues before you commit changes.
+
+### Installing Pre-commit
+
+First, install pre-commit (requires version 2.18.0 or later):
+
+```bash
+pip install pre-commit
+```
+
+### Installing the Git Hooks
+
+After cloning the repository, install the pre-commit hooks:
+
+```bash
+cd tvm-ffi
+pre-commit install
+```
+
+This configures git to automatically run the hooks before each commit.
+
+### Running Hooks Manually
+
+You can run the hooks manually on all files:
+
+```bash
+pre-commit run --all-files
+```
+
+Or run them only on staged files:
+
+```bash
+pre-commit run
+```
+
+### What the Hooks Check
+
+The pre-commit configuration includes checks for:
+
+- **License headers**: Ensures all files have proper Apache Software Foundation headers
+- **Code formatting**: Runs clang-format (C++), ruff (Python), shfmt (Shell scripts)
+- **Linting**: Runs clang-tidy, ruff, shellcheck, markdownlint, yamllint, and more
+- **Type checking**: Runs mypy for Python type annotations
+- **File quality**: Checks for trailing whitespace, file sizes, merge conflicts, etc.
+
+### Troubleshooting
+
+If you encounter errors:
+
+1. **Version issues**: Ensure you have pre-commit 2.18.0 or later:
+
+ ```bash
+ pre-commit --version
+ pip install --upgrade pre-commit
+ ```
+
+2. **Cache issues**: Clean the pre-commit cache:
+
+ ```bash
+ pre-commit clean
+ ```
+
+3. **Hook failures**: Most formatting hooks will automatically fix issues. Review the changes and stage them:
+
+ ```bash
+ git add -u
+ git commit
+ ```
+
+## Contributing Workflow
+
You can contribute to the repo through the following steps.
- Fork the repository and create a new branch for your work.
@@ -72,14 +147,15 @@ Inside the container you can install the project in editable mode and run the qu
start example exactly as described in `examples/quick_start/README.md`:
```bash
-# In /workspace/tvm-ffi/
-pip install -ve .
+# In /workspace/tvm-ffi/ see https://tvm.apache.org/ffi/guides/build_from_source.html for reference
+pip install --force-reinstall --verbose -e . \
+ --config-settings cmake.define.TVM_FFI_ATTACH_DEBUG_SYMBOLS=ON
# Change working directory to sample
-cd examples/quick_start
+cd examples/quickstart
# Install dependency, Build and run all examples
-bash run_example.sh
+bash raw_compile.sh
```
All build artifacts are written to the mounted workspace on the host machine, so you
diff --git a/docs/guides/build_from_source.md b/docs/guides/build_from_source.md
index ab46e50..f5ba145 100644
--- a/docs/guides/build_from_source.md
+++ b/docs/guides/build_from_source.md
@@ -51,7 +51,7 @@ Always clone with ``--recursive`` to pull submodules. If you already cloned with
Follow the instruction below to build the Python package with scikit-build-core, which drives CMake to compile the C++ core and Cython extension.
```bash
-pip install --reinstall --verbose -e . \
+pip install --force-reinstall --verbose -e . \
--config-settings cmake.define.TVM_FFI_ATTACH_DEBUG_SYMBOLS=ON
```
diff --git a/docs/guides/cpp_packaging.md b/docs/guides/cpp_packaging.md
new file mode 100644
index 0000000..585536f
--- /dev/null
+++ b/docs/guides/cpp_packaging.md
@@ -0,0 +1,90 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements. See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership. The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License. You may obtain a copy of the License at -->
+
+<!--- http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied. See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+# C++ Packaging and Distribution
+
+This guide explains how to package and distribute C++ libraries that use tvm-ffi, with a focus on ABI compatibility and cross-platform distribution.
+
+## Distribution and ABI Compatibility
+
+When distributing kernels or libraries that use tvm-ffi, it's important to understand the ABI compatibility challenges that arise from glibc versioning. This section provides guidance for kernel authors and library distributors.
+
+### Understanding the ABI Challenge
+
+While tvm-ffi uses a C ABI at the interface level (through DLTensor, TVMFFISafeCallType, etc.), the tvm-ffi library itself is written in C++ and depends on specific versions of glibc and the C++ standard library. This creates potential compatibility issues from two perspectives:
+
+**Consumer Perspective:**
+Applications that link against `libtvm_ffi.so` must use a compatible glibc version. If the glibc version mismatches, STL and glibc function symbols may be incompatible, leading to runtime errors or undefined behavior.
+
+**Producer/Kernel Distributor Perspective:**
+Even when kernel authors expose their functionality through the tvm-ffi interface (which solves cross-framework ABI issues like tensor representation), if their compiled `kernel.so` shared library contains ANY glibc or tvm_ffi symbols, consumers with different glibc versions may encounter undefined symbol errors at load time.
+
+### The manylinux Solution
+
+The recommended solution is to use the [manylinux](https://github.com/pypa/manylinux) approach, which is the standard way Python packages handle cross-platform binary distribution. The key principle is to build on an old glibc version and run on newer versions.
+
+Since glibc maintains forward compatibility (mostly), libraries built against an older glibc version will work on systems with newer glibc versions. The `apache-tvm-ffi` Python wheel is already built using manylinux-compatible environments.
+
+### Practical Guidance for Kernel Distributors
+
+#### For Pure C++ Library Distribution
+
+If you're distributing C++ libraries or CUDA kernels:
+
+1. **Use a Docker image with an old glibc version** for building:
+
+ ```bash
+ # See CONTRIBUTING.md for pre-built Docker images
+ # Or use manylinux Docker images as a base
+ docker pull quay.io/pypa/manylinux2014_x86_64
+ ```
+
+2. **For CUDA kernels**, ensure both your host launching code and the kernel are built in this environment:
+
+ ```bash
+ # Inside the container
+ nvcc -shared -Xcompiler -fPIC your_cuda_kernel.cu -o kernel.so \
+ $(tvm-ffi-config --cxxflags) \
+ $(tvm-ffi-config --ldflags) \
+ $(tvm-ffi-config --libs)
+ ```
+
+3. **Link against manylinux-compatible tvm_ffi.so**: Use the tvm-ffi library from the `apache-tvm-ffi` wheel, which is already manylinux-compatible.
+
+#### Build System Considerations
+
+- **Containerized builds**: Many C++ projects use containerized build systems. Adapt your existing Docker setup to use manylinux base images or images with older glibc versions.
+- **CI/CD pipelines**: Configure your continuous integration to build in manylinux environments. GitHub Actions and other CI services support Docker-based builds.
+- **Testing**: Always test your distributed binaries on multiple Linux distributions to verify compatibility.
+
+### Verification
+
+To check the glibc version your binary depends on:
+
+```bash
+objdump -T your_kernel.so | grep GLIBC_
+```
+
+This shows the minimum glibc version required. Ensure it's compatible with your target deployment environments.
+
+### Summary
+
+- **Build on old glibc** (via manylinux or old Linux distributions)
+- **Run on new glibc** (forward compatibility guaranteed)
+- **Use containerized builds** for reproducible environments
+- **Test across distributions** to verify compatibility
+
+For more details on setting up development environments, see `CONTRIBUTING.md`.
diff --git a/docs/index.rst b/docs/index.rst
index 14ec2ee..fd859a7 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -47,6 +47,7 @@ Table of Contents
:caption: Guides
guides/python_packaging.md
+ guides/cpp_packaging.md
guides/cpp_guide.md
guides/python_guide.md
guides/rust_guide.md
diff --git a/examples/quickstart/load/load_cuda.cc b/examples/quickstart/load/load_cuda.cc
new file mode 100644
index 0000000..8e2e55f
--- /dev/null
+++ b/examples/quickstart/load/load_cuda.cc
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+// [main.begin]
+// File: load/load_cuda.cc
+#include <tvm/ffi/container/tensor.h>
+#include <tvm/ffi/extra/module.h>
+
+namespace {
+namespace ffi = tvm::ffi;
+/*!
+ * \brief Main logic of library loading and function calling with CUDA tensors.
+ * \param x The input tensor on CUDA device.
+ * \param y The output tensor on CUDA device.
+ */
+void Run(tvm::ffi::TensorView x, tvm::ffi::TensorView y) {
+ // Load shared library `build/add_one_cuda.so`
+ ffi::Module mod = ffi::Module::LoadFromFile("build/add_one_cuda.so");
+ // Look up `add_one_cuda` function
+ ffi::Function add_one_cuda = mod->GetFunction("add_one_cuda").value();
+ // Call the function with CUDA tensors
+ add_one_cuda(x, y);
+}
+} // namespace
+// [main.end]
+/************* Auxiliary Logics *************/
+// [aux.begin]
+#include <cuda_runtime.h>
+#include <tvm/ffi/error.h>
+
+#include <iostream>
+#include <vector>
+
+struct CUDANDAlloc {
+ void AllocData(DLTensor* tensor) {
+ size_t data_size = ffi::GetDataSize(*tensor);
+ void* ptr = nullptr;
+ cudaError_t err = cudaMalloc(&ptr, data_size);
+ TVM_FFI_ICHECK_EQ(err, cudaSuccess) << "cudaMalloc failed: " << cudaGetErrorString(err);
+ tensor->data = ptr;
+ }
+
+ void FreeData(DLTensor* tensor) {
+ if (tensor->data != nullptr) {
+ cudaError_t err = cudaFree(tensor->data);
+ TVM_FFI_ICHECK_EQ(err, cudaSuccess) << "cudaFree failed: " <<
cudaGetErrorString(err);
+ tensor->data = nullptr;
+ }
+ }
+};
+
+/*!
+ * \brief Allocate a CUDA tensor with the given shape and data type.
+ * \param shape The shape of the tensor.
+ * \param dtype The data type of the tensor.
+ * \param device The CUDA device.
+ * \return The allocated CUDA tensor.
+ */
+inline ffi::Tensor Empty(ffi::Shape shape, DLDataType dtype, DLDevice device) {
+ return ffi::Tensor::FromNDAlloc(CUDANDAlloc(), shape, dtype, device);
+}
+
+int main() {
+ DLDataType f32_dtype{kDLFloat, 32, 1};
+ DLDevice cuda_device{kDLCUDA, 0};
+
+ constexpr int ARRAY_SIZE = 5;
+
+ ffi::Tensor x = Empty({ARRAY_SIZE}, f32_dtype, cuda_device);
+ ffi::Tensor y = Empty({ARRAY_SIZE}, f32_dtype, cuda_device);
+
+ std::vector<float> host_x(ARRAY_SIZE);
+ for (int i = 0; i < ARRAY_SIZE; ++i) {
+ host_x[i] = static_cast<float>(i + 1);
+ }
+
+ size_t nbytes = host_x.size() * sizeof(float);
+ cudaError_t err = cudaMemcpy(x.data_ptr(), host_x.data(), nbytes, cudaMemcpyHostToDevice);
+ TVM_FFI_ICHECK_EQ(err, cudaSuccess)
+ << "cudaMemcpy host to device failed: " << cudaGetErrorString(err);
+
+ Run(x, y);
+
+ std::vector<float> host_y(host_x.size());
+ err = cudaMemcpy(host_y.data(), y.data_ptr(), nbytes, cudaMemcpyDeviceToHost);
+ TVM_FFI_ICHECK_EQ(err, cudaSuccess)
+ << "cudaMemcpy device to host failed: " << cudaGetErrorString(err);
+
+ std::cout << "[ ";
+ for (float value : host_y) {
+ std::cout << value << " ";
+ }
+ std::cout << "]" << std::endl;
+
+ return 0;
+}
+// [aux.end]
diff --git a/examples/quickstart/raw_compile.sh b/examples/quickstart/raw_compile.sh
index d0fcbcc..5a44dc5 100755
--- a/examples/quickstart/raw_compile.sh
+++ b/examples/quickstart/raw_compile.sh
@@ -59,3 +59,25 @@ g++ -fvisibility=hidden -O3 \
build/load_cpp
# [load_cpp.end]
fi
+
+# Example 4. Load and run `add_one_cuda.so` in C++
+# Before running this example, make sure you have a CUDA-capable GPU and the CUDA toolkit installed.
+# See CONTRIBUTING.md to use a pre-built Docker image with CUDA support.
+
+# if [ -f "$BUILD_DIR/add_one_cuda.so" ] && command -v nvcc >/dev/null 2>&1; then
+# # [load_cuda.begin]
+# g++ -fvisibility=hidden -O3 \
+# load/load_cuda.cc \
+# $(tvm-ffi-config --cxxflags) \
+# $(tvm-ffi-config --ldflags) \
+# $(tvm-ffi-config --libs) \
+# -I/usr/local/cuda/include \
+# -L/usr/local/cuda/lib64 \
+# -lcudart \
+# -Wl,-rpath,$(tvm-ffi-config --libdir) \
+# -Wl,-rpath,/usr/local/cuda/lib64 \
+# -o build/load_cuda
+
+# build/load_cuda
+# # [load_cuda.end]
+# fi
diff --git a/tests/docker/Dockerfile b/tests/docker/Dockerfile
index 1fd6e66..10b3f06 100644
--- a/tests/docker/Dockerfile
+++ b/tests/docker/Dockerfile
@@ -62,6 +62,9 @@ RUN apt-get update \
zip \
&& rm -rf /var/lib/apt/lists/*
+# Upgrade pip to the latest version
+RUN python3 -m pip install --upgrade pip
+
# Provide a working directory for the project
WORKDIR /workspace