This is an automated email from the ASF dual-hosted git repository.
tqchen pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-ffi.git
The following commit(s) were added to refs/heads/main by this push:
new 997f61c [Docs][Examples] Add CUDA C++ loading example and improve development documentation (#209)
997f61c is described below
commit 997f61c9a88b1afcd2fff719f28d3418485409fd
Author: yifeifang <[email protected]>
AuthorDate: Fri Oct 31 15:49:51 2025 -0700
[Docs][Examples] Add CUDA C++ loading example and improve development documentation (#209)
This PR enhances the documentation and examples for C++ developers,
particularly focusing on CUDA usage and development environment
setup.
### Changes
#### New Examples
- **Add `examples/quickstart/load/load_cuda.cc`**: New C++ example
demonstrating how to load and execute CUDA FFI functions using custom
CUDA memory allocators
- **Update `examples/quickstart/raw_compile.sh`**: Add Example 4 showing
compilation and execution of the CUDA loading example with
proper CUDA include/library paths
#### Documentation Improvements
- **`docs/guides/cpp_packaging.md`**: Add comprehensive "Distribution and
ABI Compatibility" section
- Explains glibc versioning challenges for kernel distributors
- Documents the manylinux approach for cross-platform binary
distribution
- Provides practical guidance for building compatible C++ and CUDA
libraries
- Includes verification commands and recommended Docker images
- Addresses the producer/consumer perspective on ABI compatibility
- **`CONTRIBUTING.md`**: Add "Setting Up Pre-commit Hooks" section
- Installation instructions with minimum version requirements
(pre-commit 2.18.0+)
- Complete list of all pre-commit hooks with exact versions
- Troubleshooting guide for common issues
- Documents all hook dependencies and configuration details
#### Infrastructure
- **`tests/docker/Dockerfile`**: Upgrade pip to latest version in
development container
### Motivation
1. **CUDA Examples**: The existing examples showed Python CUDA usage
(`load_cupy.py`, `load_pytorch.py`) and CPU C++ usage
(`load_cpp.cc`), but lacked a pure C++ CUDA example. This addition
provides a complete picture of FFI usage patterns.
2. **ABI Documentation**: Based on discussions about glibc compatibility
issues when distributing kernels, this documentation
helps kernel authors understand and avoid common pitfalls when building
cross-platform binaries. **The original issue arose when building FFI from
source against a newer glibc, which made the FFI symbols depend on newer
glibc symbol versions.**
3. **Development Setup**: Contributors were encountering pre-commit
version issues (2.17.0 vs 2.18.0+). This documentation prevents such
issues and provides a comprehensive reference for all tooling
requirements.
### Testing
- All pre-commit hooks pass
- CUDA example compiles successfully in the Docker development
environment
- Documentation follows markdown linting rules
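On the pre-commit point above, the 2.17.0-vs-2.18.0 comparison can be sketched portably with `sort -V` (a sketch, not project tooling; the version strings are illustrative stand-ins for `pre-commit --version` output):

```shell
# Compare an installed version string against the documented 2.18.0 minimum.
required="2.18.0"
current="2.17.0" # stand-in for: pre-commit --version | awk '{print $2}'
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
  echo "ok: $current >= $required"
else
  echo "too old: $current < $required (pip install --upgrade pre-commit)"
fi
```

With the illustrative values above, the check takes the `too old` branch.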
---------
Co-authored-by: Yifei Fang <[email protected]>
---
CONTRIBUTING.md | 84 +++++++++++++++++++++++--
docs/guides/build_from_source.md | 2 +-
docs/guides/cpp_packaging.md | 90 +++++++++++++++++++++++++++
docs/index.rst | 1 +
examples/quickstart/load/load_cuda.cc | 112 ++++++++++++++++++++++++++++++++++
examples/quickstart/raw_compile.sh | 22 +++++++
tests/docker/Dockerfile | 3 +
7 files changed, 309 insertions(+), 5 deletions(-)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index e62a1d6..12d5edc 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -31,6 +31,81 @@ pip install --no-build-isolation -e . -v
```
We recommend using the `--no-build-isolation` flag to ensure compatibility with your existing environment.
+
+## Setting Up Pre-commit Hooks
+
+This project uses [pre-commit](https://pre-commit.com/) to maintain code quality and consistency.
+Pre-commit hooks automatically check your code for common issues before you commit changes.
+
+### Installing Pre-commit
+
+First, install pre-commit (requires version 2.18.0 or later):
+
+```bash
+pip install pre-commit
+```
+
+### Installing the Git Hooks
+
+After cloning the repository, install the pre-commit hooks:
+
+```bash
+cd tvm-ffi
+pre-commit install
+```
+
+This configures git to automatically run the hooks before each commit.
+
+### Running Hooks Manually
+
+You can run the hooks manually on all files:
+
+```bash
+pre-commit run --all-files
+```
+
+Or run them only on staged files:
+
+```bash
+pre-commit run
+```
+
+### What the Hooks Check
+
+The pre-commit configuration includes checks for:
+
+- **License headers**: Ensures all files have proper Apache Software Foundation headers
+- **Code formatting**: Runs clang-format (C++), ruff (Python), shfmt (Shell scripts)
+- **Linting**: Runs clang-tidy, ruff, shellcheck, markdownlint, yamllint, and more
+- **Type checking**: Runs mypy for Python type annotations
+- **File quality**: Checks for trailing whitespace, file sizes, merge conflicts, etc.
+
+### Troubleshooting
+
+If you encounter errors:
+
+1. **Version issues**: Ensure you have pre-commit 2.18.0 or later:
+
+ ```bash
+ pre-commit --version
+ pip install --upgrade pre-commit
+ ```
+
+2. **Cache issues**: Clean the pre-commit cache:
+
+ ```bash
+ pre-commit clean
+ ```
+
+3. **Hook failures**: Most formatting hooks will automatically fix issues. Review the changes and stage them:
+
+ ```bash
+ git add -u
+ git commit
+ ```
+
+## Contributing Workflow
+
You can contribute to the repo through the following steps.
- Fork the repository and create a new branch for your work.
@@ -72,14 +147,15 @@ Inside the container you can install the project in editable mode and run the qu
start example exactly as described in `examples/quick_start/README.md`:
```bash
-# In /workspace/tvm-ffi/
-pip install -ve .
+# In /workspace/tvm-ffi/ see https://tvm.apache.org/ffi/guides/build_from_source.html for reference
+pip install --force-reinstall --verbose -e . \
+ --config-settings cmake.define.TVM_FFI_ATTACH_DEBUG_SYMBOLS=ON
# Change working directory to sample
-cd examples/quick_start
+cd examples/quickstart
# Install dependency, Build and run all examples
-bash run_example.sh
+bash raw_compile.sh
```
All build artifacts are written to the mounted workspace on the host machine, so you
diff --git a/docs/guides/build_from_source.md b/docs/guides/build_from_source.md
index ab46e50..f5ba145 100644
--- a/docs/guides/build_from_source.md
+++ b/docs/guides/build_from_source.md
@@ -51,7 +51,7 @@ Always clone with ``--recursive`` to pull submodules. If you already cloned with
Follow the instruction below to build the Python package with scikit-build-core, which drives CMake to compile the C++ core and Cython extension.
```bash
-pip install --reinstall --verbose -e . \
+pip install --force-reinstall --verbose -e . \
--config-settings cmake.define.TVM_FFI_ATTACH_DEBUG_SYMBOLS=ON
```
diff --git a/docs/guides/cpp_packaging.md b/docs/guides/cpp_packaging.md
new file mode 100644
index 0000000..585536f
--- /dev/null
+++ b/docs/guides/cpp_packaging.md
@@ -0,0 +1,90 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements. See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership. The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License. You may obtain a copy of the License at -->
+
+<!--- http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied. See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+# C++ Packaging and Distribution
+
+This guide explains how to package and distribute C++ libraries that use tvm-ffi, with a focus on ABI compatibility and cross-platform distribution.
+
+## Distribution and ABI Compatibility
+
+When distributing kernels or libraries that use tvm-ffi, it's important to understand the ABI compatibility challenges that arise from glibc versioning. This section provides guidance for kernel authors and library distributors.
+
+### Understanding the ABI Challenge
+
+While tvm-ffi uses a C ABI at the interface level (through DLTensor, TVMFFISafeCallType, etc.), the tvm-ffi library itself is written in C++ and depends on specific versions of glibc and the C++ standard library. This creates potential compatibility issues from two perspectives:
+
+**Consumer Perspective:**
+Applications that link against `libtvm_ffi.so` must use a compatible glibc version. If the glibc version mismatches, STL and glibc function symbols may be incompatible, leading to runtime errors or undefined behavior.
+
+**Producer/Kernel Distributor Perspective:**
+Even when kernel authors expose their functionality through the tvm-ffi interface (which solves cross-framework ABI issues like tensor representation), if their compiled `kernel.so` shared library contains ANY glibc or tvm_ffi symbols, consumers with different glibc versions may encounter undefined symbol errors at load time.
+
+### The manylinux Solution
+
+The recommended solution is to use the [manylinux](https://github.com/pypa/manylinux) approach, which is the standard way Python packages handle cross-platform binary distribution. The key principle is to build on an old glibc version and run on newer versions.
+
+Since glibc maintains forward compatibility (mostly), libraries built against an older glibc version will work on systems with newer glibc versions. The `apache-tvm-ffi` Python wheel is already built using manylinux-compatible environments.
+
+### Practical Guidance for Kernel Distributors
+
+#### For Pure C++ Library Distribution
+
+If you're distributing C++ libraries or CUDA kernels:
+
+1. **Use a Docker image with an old glibc version** for building:
+
+ ```bash
+ # See CONTRIBUTING.md for pre-built Docker images
+ # Or use manylinux Docker images as a base
+ docker pull quay.io/pypa/manylinux2014_x86_64
+ ```
+
+2. **For CUDA kernels**, ensure both your host launching code and the kernel are built in this environment:
+
+ ```bash
+ # Inside the container
+ nvcc -shared -Xcompiler -fPIC your_cuda_kernel.cu -o kernel.so \
+ $(tvm-ffi-config --cxxflags) \
+ $(tvm-ffi-config --ldflags) \
+ $(tvm-ffi-config --libs)
+ ```
+
+3. **Link against manylinux-compatible tvm_ffi.so**: Use the tvm-ffi library from the `apache-tvm-ffi` wheel, which is already manylinux-compatible.
+
+#### Build System Considerations
+
+- **Containerized builds**: Many C++ projects use containerized build systems. Adapt your existing Docker setup to use manylinux base images or images with older glibc versions.
+- **CI/CD pipelines**: Configure your continuous integration to build in manylinux environments. GitHub Actions and other CI services support Docker-based builds.
+- **Testing**: Always test your distributed binaries on multiple Linux distributions to verify compatibility.
+
+### Verification
+
+To check the glibc version your binary depends on:
+
+```bash
+objdump -T your_kernel.so | grep GLIBC_
+```
+
+This shows the minimum glibc version required. Ensure it's compatible with your target deployment environments.
+
+### Summary
+
+- **Build on old glibc** (via manylinux or old Linux distributions)
+- **Run on new glibc** (forward compatibility guaranteed)
+- **Use containerized builds** for reproducible environments
+- **Test across distributions** to verify compatibility
+
+For more details on setting up development environments, see `CONTRIBUTING.md`.
diff --git a/docs/index.rst b/docs/index.rst
index 14ec2ee..fd859a7 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -47,6 +47,7 @@ Table of Contents
:caption: Guides
guides/python_packaging.md
+ guides/cpp_packaging.md
guides/cpp_guide.md
guides/python_guide.md
guides/rust_guide.md
diff --git a/examples/quickstart/load/load_cuda.cc b/examples/quickstart/load/load_cuda.cc
new file mode 100644
index 0000000..8e2e55f
--- /dev/null
+++ b/examples/quickstart/load/load_cuda.cc
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+// [main.begin]
+// File: load/load_cuda.cc
+#include <tvm/ffi/container/tensor.h>
+#include <tvm/ffi/extra/module.h>
+
+namespace {
+namespace ffi = tvm::ffi;
+/*!
+ * \brief Main logic of library loading and function calling with CUDA tensors.
+ * \param x The input tensor on CUDA device.
+ * \param y The output tensor on CUDA device.
+ */
+void Run(tvm::ffi::TensorView x, tvm::ffi::TensorView y) {
+ // Load shared library `build/add_one_cuda.so`
+ ffi::Module mod = ffi::Module::LoadFromFile("build/add_one_cuda.so");
+ // Look up `add_one_cuda` function
+ ffi::Function add_one_cuda = mod->GetFunction("add_one_cuda").value();
+ // Call the function with CUDA tensors
+ add_one_cuda(x, y);
+}
+} // namespace
+// [main.end]
+/************* Auxiliary Logics *************/
+// [aux.begin]
+#include <cuda_runtime.h>
+#include <tvm/ffi/error.h>
+
+#include <iostream>
+#include <vector>
+
+struct CUDANDAlloc {
+ void AllocData(DLTensor* tensor) {
+ size_t data_size = ffi::GetDataSize(*tensor);
+ void* ptr = nullptr;
+ cudaError_t err = cudaMalloc(&ptr, data_size);
+ TVM_FFI_ICHECK_EQ(err, cudaSuccess) << "cudaMalloc failed: " << cudaGetErrorString(err);
+ tensor->data = ptr;
+ }
+
+ void FreeData(DLTensor* tensor) {
+ if (tensor->data != nullptr) {
+ cudaError_t err = cudaFree(tensor->data);
+ TVM_FFI_ICHECK_EQ(err, cudaSuccess) << "cudaFree failed: " <<
cudaGetErrorString(err);
+ tensor->data = nullptr;
+ }
+ }
+};
+
+/*!
+ * \brief Allocate a CUDA tensor with the given shape and data type.
+ * \param shape The shape of the tensor.
+ * \param dtype The data type of the tensor.
+ * \param device The CUDA device.
+ * \return The allocated CUDA tensor.
+ */
+inline ffi::Tensor Empty(ffi::Shape shape, DLDataType dtype, DLDevice device) {
+ return ffi::Tensor::FromNDAlloc(CUDANDAlloc(), shape, dtype, device);
+}
+
+int main() {
+ DLDataType f32_dtype{kDLFloat, 32, 1};
+ DLDevice cuda_device{kDLCUDA, 0};
+
+ constexpr int ARRAY_SIZE = 5;
+
+ ffi::Tensor x = Empty({ARRAY_SIZE}, f32_dtype, cuda_device);
+ ffi::Tensor y = Empty({ARRAY_SIZE}, f32_dtype, cuda_device);
+
+ std::vector<float> host_x(ARRAY_SIZE);
+ for (int i = 0; i < ARRAY_SIZE; ++i) {
+ host_x[i] = static_cast<float>(i + 1);
+ }
+
+ size_t nbytes = host_x.size() * sizeof(float);
+ cudaError_t err = cudaMemcpy(x.data_ptr(), host_x.data(), nbytes, cudaMemcpyHostToDevice);
+ TVM_FFI_ICHECK_EQ(err, cudaSuccess)
+ << "cudaMemcpy host to device failed: " << cudaGetErrorString(err);
+
+ Run(x, y);
+
+ std::vector<float> host_y(host_x.size());
+ err = cudaMemcpy(host_y.data(), y.data_ptr(), nbytes, cudaMemcpyDeviceToHost);
+ TVM_FFI_ICHECK_EQ(err, cudaSuccess)
+ << "cudaMemcpy device to host failed: " << cudaGetErrorString(err);
+
+ std::cout << "[ ";
+ for (float value : host_y) {
+ std::cout << value << " ";
+ }
+ std::cout << "]" << std::endl;
+
+ return 0;
+}
+// [aux.end]
diff --git a/examples/quickstart/raw_compile.sh b/examples/quickstart/raw_compile.sh
index d0fcbcc..5a44dc5 100755
--- a/examples/quickstart/raw_compile.sh
+++ b/examples/quickstart/raw_compile.sh
@@ -59,3 +59,25 @@ g++ -fvisibility=hidden -O3 \
build/load_cpp
# [load_cpp.end]
fi
+
+# Example 4. Load and run `add_one_cuda.so` in C++
+# Before running this example, make sure you have a CUDA-capable GPU and the CUDA toolkit installed.
+# See CONTRIBUTING.md to use a pre-built Docker image with CUDA support.
+
+# if [ -f "$BUILD_DIR/add_one_cuda.so" ] && command -v nvcc >/dev/null 2>&1; then
+# # [load_cuda.begin]
+# g++ -fvisibility=hidden -O3 \
+# load/load_cuda.cc \
+# $(tvm-ffi-config --cxxflags) \
+# $(tvm-ffi-config --ldflags) \
+# $(tvm-ffi-config --libs) \
+# -I/usr/local/cuda/include \
+# -L/usr/local/cuda/lib64 \
+# -lcudart \
+# -Wl,-rpath,$(tvm-ffi-config --libdir) \
+# -Wl,-rpath,/usr/local/cuda/lib64 \
+# -o build/load_cuda
+
+# build/load_cuda
+# # [load_cuda.end]
+# fi
diff --git a/tests/docker/Dockerfile b/tests/docker/Dockerfile
index 1fd6e66..10b3f06 100644
--- a/tests/docker/Dockerfile
+++ b/tests/docker/Dockerfile
@@ -62,6 +62,9 @@ RUN apt-get update \
zip \
&& rm -rf /var/lib/apt/lists/*
+# Upgrade pip to the latest version
+RUN python3 -m pip install --upgrade pip
+
# Provide a working directory for the project
WORKDIR /workspace