This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-ffi.git


The following commit(s) were added to refs/heads/main by this push:
     new fc0f5f5  Add Docker support and C++ CUDA example (#29)
fc0f5f5 is described below

commit fc0f5f5e3295a974a4c52619c6535669f7a4d9ad
Author: yifeifang <[email protected]>
AuthorDate: Sat Sep 20 12:55:23 2025 -0700

    Add Docker support and C++ CUDA example (#29)
    
    Introduce a Dockerfile for setting up a development environment with
    improved GPU support. Include a C++ example demonstrating CUDA
    functionality and enhance documentation for quick start and build
    instructions. Update build scripts to support Ninja for faster builds.
---
 .pre-commit-config.yaml                      |  6 +-
 CONTRIBUTING.md                              | 41 ++++++++++++
 docs/get_started/quick_start.md              | 18 ++++--
 examples/quick_start/CMakeLists.txt          | 28 ++++++---
 examples/quick_start/README.md               | 43 +++++++++++--
 examples/quick_start/requirements.txt        |  3 +
 examples/quick_start/run_example.sh          | 18 +++++-
 examples/quick_start/src/run_example_cuda.cc | 94 ++++++++++++++++++++++++++++
 tests/docker/Dockerfile                      | 72 +++++++++++++++++++++
 tests/lint/check_file_type.py                |  1 +
 10 files changed, 301 insertions(+), 23 deletions(-)

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 8ea7c4a..8f762dc 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -27,7 +27,8 @@ repos:
       - id: check-asf-header
         name: check ASF Header
         entry: python tests/lint/check_asf_header.py --check
-        language: system
+        language: python
+        language_version: python3
         pass_filenames: false
         verbose: false
   - repo: local
@@ -35,7 +36,8 @@ repos:
       - id: check-file-type
         name: check file types
         entry: python tests/lint/check_file_type.py
-        language: system
+        language: python
+        language_version: python3
         pass_filenames: false
         verbose: false
   - repo: https://github.com/pre-commit/pre-commit-hooks
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index d89796b..e62a1d6 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -43,6 +43,47 @@ For significant changes, it's often a good idea to open a GitHub issue first (wi
 It is optional, but can be very helpful as it allows the maintainers and the community to provide feedback and helps ensure your
 work aligns with the project's goals.
 
+## Development with Docker
+
+The repository ships a development container with the full toolchain for
+building the core library and running the examples.
+
+```bash
+# Build the image (from the repository root)
+docker build -t tvm-ffi-dev -f tests/docker/Dockerfile tests/docker
+
+# Start an interactive shell
+docker run --rm -it \
+    -v "$(pwd)":/workspace/tvm-ffi \
+    -w /workspace/tvm-ffi \
+    tvm-ffi-dev bash
+
+# Start an interactive shell with GPU access
+docker run --rm -it --gpus all \
+    -v "$(pwd)":/workspace/tvm-ffi \
+    -w /workspace/tvm-ffi \
+    tvm-ffi-dev bash
+```
+
+> **Note** Ensure the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
+> is installed on the host to make GPUs available inside the container.
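+
+A quick way to confirm the GPU is visible inside the container (a minimal check,
+assuming the toolkit above is already configured on the host):
+
+```bash
+docker run --rm --gpus all tvm-ffi-dev nvidia-smi
+```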
+
+Inside the container you can install the project in editable mode and run the quick
+start example exactly as described in `examples/quick_start/README.md`:
+
+```bash
+# In /workspace/tvm-ffi/
+pip install -ve .
+
+# Change working directory to the quick start example
+cd examples/quick_start
+
+# Install dependencies, then build and run all examples
+bash run_example.sh
+```
+
+All build artifacts are written to the mounted workspace on the host machine, so you
+can continue editing files with your local tooling.
 
 ## Stability and Minimalism
 
diff --git a/docs/get_started/quick_start.md b/docs/get_started/quick_start.md
index 043ea28..b422d10 100644
--- a/docs/get_started/quick_start.md
+++ b/docs/get_started/quick_start.md
@@ -33,6 +33,7 @@ Before starting, ensure you have:
 - TVM FFI installed following [installation](./install.md)
 - C++ compiler with C++17 support
 - CMake 3.18 or later
+- (Optional) Ninja build system (the quick-start uses Ninja for fast incremental builds)
 - (Optional) CUDA toolkit for GPU examples
 - (Optional) PyTorch for checking torch integrations
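+
+A quick way to verify the optional tooling is on your `PATH` (these are the standard
+version flags of each tool, shown here only as a sanity check):
+
+```bash
+cmake --version
+ninja --version   # optional, only if you plan to use the Ninja generator
+nvcc --version    # optional, only needed for the CUDA examples
+```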
 
@@ -47,8 +48,10 @@ The examples are now in the example folder, you can quickly build
 the example using the following command.
 ```bash
 cd examples/quick_start
-cmake -B build -S .
-cmake --build build
+
+# with ninja or omit -G Ninja to use default generator
+cmake --fresh -G Ninja -B build -S .
+cmake --build build --parallel
 ```
 
 After the build finishes, you can run the python examples by
@@ -59,7 +62,13 @@ python run_example.py
 You can also run the c++ example
 
 ```
-./build/example
+./build/run_example
+```
+
+If the CUDA toolkit is available, the GPU demo binary is built alongside the CPU sample:
+
+```
+./build/run_example_cuda
 ```
 
 ## Walk through the Example
@@ -74,7 +83,8 @@ examples/quick_start/
 │   ├── add_one_cpu.cc      # CPU implementation
 │   ├── add_one_c.c         # A low-level C based implementation
 │   ├── add_one_cuda.cu     # CUDA implementation
-│   └── run_example.cc      # C++ usage example
+│   ├── run_example.cc      # C++ usage example
+│   └── run_example_cuda.cc # C++ with CUDA kernel usage example
 ├── run_example.py          # Python usage example
 ├── run_example.sh          # Build and run script
 └── CMakeLists.txt          # Build configuration
diff --git a/examples/quick_start/CMakeLists.txt b/examples/quick_start/CMakeLists.txt
index 0f6ea11..d3cd4b2 100644
--- a/examples/quick_start/CMakeLists.txt
+++ b/examples/quick_start/CMakeLists.txt
@@ -19,17 +19,17 @@ cmake_minimum_required(VERSION 3.18)
 project(tvm_ffi_example)
 
 
-# first find python related components
+# Discover the Python interpreter so we can query tvm-ffi for its CMake package path.
 find_package(Python COMPONENTS Interpreter REQUIRED)
 
-# call tvm_ffi.config to get the cmake directory and set it to tvm_ffi_ROOT
+# Ask tvm-ffi where it stores its exported CMake files.
 execute_process(
   COMMAND "${Python_EXECUTABLE}" -m tvm_ffi.config --cmakedir
   OUTPUT_STRIP_TRAILING_WHITESPACE OUTPUT_VARIABLE tvm_ffi_ROOT)
-# find package will automatically include the related projects
+# Pull in the tvm-ffi CMake targets so we can link against them below.
 find_package(tvm_ffi CONFIG REQUIRED)
 
-# use the projects as usual
+# Build the CPU and C versions of the simple "add one" function that the examples call.
 add_library(add_one_cpu SHARED src/add_one_cpu.cc)
 add_library(add_one_c SHARED src/add_one_c.c)
 target_link_libraries(add_one_cpu tvm_ffi_header)
@@ -47,23 +47,31 @@ set_target_properties(
   SUFFIX ".so"
 )
 
-# Check if CUDA is available
+# Optionally build the CUDA variant if the CUDA toolkit is present.
 if(NOT WIN32)
-  find_package(CUDA QUIET)
-  if(CUDA_FOUND)
+  find_package(CUDAToolkit QUIET)
+  if(CUDAToolkit_FOUND)
     enable_language(CUDA)
+
     add_library(add_one_cuda SHARED src/add_one_cuda.cu)
-    target_link_libraries(add_one_cuda tvm_ffi_shared)
+    target_link_libraries(add_one_cuda PRIVATE tvm_ffi_shared)
 
-    # show as add_one_cuda.so
     set_target_properties(
       add_one_cuda PROPERTIES
       PREFIX ""
       SUFFIX ".so"
-      )
+    )
+
+    add_executable(run_example_cuda src/run_example_cuda.cc)
+    set_target_properties(
+      run_example_cuda PROPERTIES
+      CXX_STANDARD 17
+    )
+    target_link_libraries(run_example_cuda PRIVATE tvm_ffi_shared CUDA::cudart)
   endif()
 endif()
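+
+# If CMake cannot locate an installed CUDA toolkit automatically, configuring with
+# -DCUDAToolkit_ROOT=/usr/local/cuda (the standard find_package(CUDAToolkit) hint
+# variable; the path is only an example) usually resolves it.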
 
+# CPU-only C++ driver used in the quick start guide.
 add_executable(run_example src/run_example.cc)
 set_target_properties(
   run_example PROPERTIES
diff --git a/examples/quick_start/README.md b/examples/quick_start/README.md
index d4d130e..0e03133 100644
--- a/examples/quick_start/README.md
+++ b/examples/quick_start/README.md
@@ -23,16 +23,46 @@ that can be loaded in different environments.
 The example implements a simple "add one" operation that adds 1 to each element
 of an input tensor, showing how to create C++ functions callable from Python.
 
-You can run this quick start example by:
+## Prerequisites
+
+Before running the quick start, ensure you have:
+
+- tvm-ffi installed locally (editable installs are convenient while iterating; see the
+  [installation guide](https://tvm.apache.org/ffi/get_started/install.html)):
 
 ```bash
-# ensure you installed tvm-ffi first
-pip install -e ../..
+# From the quick_start directory
+pip install -ve ../..
+```
+
+## Run the Quick Start
 
-# Build and run the complete example
+From `examples/quick_start` you can build and run everything with the helper script:
+
+```bash
 ./run_example.sh
 ```
 
+The script picks an available CMake generator (preferring Ninja), configures a build in
+`build/`, compiles the C++ libraries and drivers, installs the Python dependencies from
+`requirements.txt`, and finally runs the Python and C++ demos. If the CUDA toolkit is
+detected it will also build and execute `run_example_cuda`.
+
+If you prefer to drive the build manually, run the following instead:
+
+```bash
+# configure (omit -G Ninja if Ninja is not installed)
+cmake --fresh -G Ninja -B build -S .
+
+# compile the example targets
+cmake --build build --parallel
+
+# install python dependencies for the scripts
+python -m pip install -r requirements.txt
+
+# run the demos
+python run_example.py
+./build/run_example
+./build/run_example_cuda  # optional, requires CUDA toolkit
+```
+
 At a high level, the `TVM_FFI_DLL_EXPORT_TYPED_FUNC` macro helps to expose
 a C++ function into the TVM FFI C ABI convention for functions.
 Then the function can be accessed by different environments and languages
@@ -42,9 +72,12 @@ in Python and C++.
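+
+To make that concrete, here is a minimal sketch of such an export (the real
+implementation lives in `src/add_one_cpu.cc`; the function body, signature, and
+header set below are illustrative rather than a copy of it):
+
+```cpp
+#include <tvm/ffi/container/tensor.h>
+#include <tvm/ffi/function.h>
+
+// Hypothetical element-wise kernel: y[i] = x[i] + 1 for a 1-D float32 tensor.
+void AddOne(tvm::ffi::Tensor x, tvm::ffi::Tensor y) {
+  const float* in = static_cast<const float*>(x->data);
+  float* out = static_cast<float*>(y->data);
+  for (int64_t i = 0; i < x->shape[0]; ++i) {
+    out[i] = in[i] + 1.0f;
+  }
+}
+
+// Expose AddOne under the exported name "add_one_cpu" in the TVM FFI C ABI,
+// so it can be loaded from Python or C++ via the module loader.
+TVM_FFI_DLL_EXPORT_TYPED_FUNC(add_one_cpu, AddOne);
+```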
 ## Key Files
 
 - `src/add_one_cpu.cc` - CPU implementation of the add_one function
+- `src/add_one_c.c` - C implementation showing the C ABI workflow
 - `src/add_one_cuda.cu` - CUDA implementation for GPU operations
+- `src/run_example.cc` - C++ example showing how to call the functions
+- `src/run_example_cuda.cc` - C++ example showing how to call the CUDA functions
 - `run_example.py` - Python example showing how to call the functions
-- `run_example.cc` - C++ example demonstrating the same functionality
+- `run_example.sh` - Convenience script that builds and runs all examples
 
 ## Compile without CMake
 
diff --git a/examples/quick_start/requirements.txt b/examples/quick_start/requirements.txt
new file mode 100644
index 0000000..597d120
--- /dev/null
+++ b/examples/quick_start/requirements.txt
@@ -0,0 +1,3 @@
+# Editable Git install with no remote (apache-tvm-ffi==0.1.0b3)
+numpy==2.2.6
+PyYAML==5.4.1
diff --git a/examples/quick_start/run_example.sh b/examples/quick_start/run_example.sh
index 09d8daa..80aa37e 100755
--- a/examples/quick_start/run_example.sh
+++ b/examples/quick_start/run_example.sh
@@ -17,11 +17,25 @@
 # under the License.
 set -ex
 
-cmake -B build -S .
-cmake --build build
+if command -v ninja >/dev/null 2>&1; then
+       generator="Ninja"
+else
+       echo "Ninja not found, falling back to Unix Makefiles" >&2
+       generator="Unix Makefiles"
+fi
+
+cmake --fresh -G "$generator" -B build -S .
+cmake --build build --parallel
+
+# install python dependencies
+python -m pip install -r requirements.txt
 
 # running python example
 python run_example.py
 
 # running c++ example
 ./build/run_example
+
+if [ -x ./build/run_example_cuda ]; then
+       ./build/run_example_cuda
+fi
diff --git a/examples/quick_start/src/run_example_cuda.cc b/examples/quick_start/src/run_example_cuda.cc
new file mode 100644
index 0000000..1fdd27c
--- /dev/null
+++ b/examples/quick_start/src/run_example_cuda.cc
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include <cuda_runtime.h>
+#include <tvm/ffi/container/tensor.h>
+#include <tvm/ffi/error.h>
+#include <tvm/ffi/extra/module.h>
+
+#include <iostream>
+#include <vector>
+
+namespace ffi = tvm::ffi;
+
+// This example mirrors run_example.cc but keeps all data on the GPU by allocating
+// CUDA tensors, invoking the add_one_cuda FFI function, and copying the result back
+// to host memory so users can inspect the output.
+struct CUDANDAlloc {
+  void AllocData(DLTensor* tensor) {
+    size_t data_size = ffi::GetDataSize(*tensor);
+    void* ptr = nullptr;
+    cudaError_t err = cudaMalloc(&ptr, data_size);
+    TVM_FFI_ICHECK_EQ(err, cudaSuccess) << "cudaMalloc failed: " << cudaGetErrorString(err);
+    tensor->data = ptr;
+  }
+
+  void FreeData(DLTensor* tensor) {
+    if (tensor->data != nullptr) {
+      cudaError_t err = cudaFree(tensor->data);
+      TVM_FFI_ICHECK_EQ(err, cudaSuccess) << "cudaFree failed: " << cudaGetErrorString(err);
+      tensor->data = nullptr;
+    }
+  }
+};
+
+inline ffi::Tensor Empty(ffi::Shape shape, DLDataType dtype, DLDevice device) {
+  return ffi::Tensor::FromNDAlloc(CUDANDAlloc(), shape, dtype, device);
+}
+
+int main() {
+  // Load the CUDA implementation that add_one_cuda.cu exports during the CMake build.
+  ffi::Module mod = ffi::Module::LoadFromFile("build/add_one_cuda.so");
+
+  DLDataType f32_dtype{kDLFloat, 32, 1};
+  DLDevice cuda_device{kDLCUDA, 0};
+
+  constexpr int ARRAY_SIZE = 5;
+
+  ffi::Tensor x = Empty({ARRAY_SIZE}, f32_dtype, cuda_device);
+  ffi::Tensor y = Empty({ARRAY_SIZE}, f32_dtype, cuda_device);
+
+  std::vector<float> host_x(ARRAY_SIZE);
+  for (int i = 0; i < ARRAY_SIZE; ++i) {
+    host_x[i] = static_cast<float>(i);
+  }
+
+  size_t nbytes = host_x.size() * sizeof(float);
+  cudaError_t err = cudaMemcpy(x->data, host_x.data(), nbytes, cudaMemcpyHostToDevice);
+  TVM_FFI_ICHECK_EQ(err, cudaSuccess)
+      << "cudaMemcpy host to device failed: " << cudaGetErrorString(err);
+
+  // Call into the FFI function; tensors remain on device because they carry a
+  // kDLCUDA device tag.
+  ffi::Function add_one_cuda = mod->GetFunction("add_one_cuda").value();
+  add_one_cuda(x, y);
+
+  std::vector<float> host_y(host_x.size());
+  err = cudaMemcpy(host_y.data(), y->data, nbytes, cudaMemcpyDeviceToHost);
+  TVM_FFI_ICHECK_EQ(err, cudaSuccess)
+      << "cudaMemcpy device to host failed: " << cudaGetErrorString(err);
+
+  std::cout << "y after add_one_cuda(x, y)" << std::endl;
+  for (float value : host_y) {
+    std::cout << value << " ";
+  }
+  std::cout << std::endl;
+
+  return 0;
+}
diff --git a/tests/docker/Dockerfile b/tests/docker/Dockerfile
new file mode 100644
index 0000000..1fd6e66
--- /dev/null
+++ b/tests/docker/Dockerfile
@@ -0,0 +1,72 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Using NVIDIA CUDA base image https://hub.docker.com/r/nvidia/cuda
+# Based on your host CUDA driver version, you may need to select an older CUDA base image.
+# See https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/supported-tags.md
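+#
+# For example, if `nvidia-smi` on the host reports a CUDA version below 12.6, swap the
+# tag below for a matching one such as nvidia/cuda:12.2.2-devel-ubuntu22.04 (an
+# illustrative tag; confirm it against the supported-tags list above).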
+FROM nvidia/cuda:12.6.3-devel-ubuntu22.04
+
+ARG DEBIAN_FRONTEND=noninteractive
+
+# Install prerequisites for external repositories
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+        ca-certificates \
+        curl \
+        gnupg \
+        wget \
+        lsb-release \
+    && rm -rf /var/lib/apt/lists/*
+
+# Add Kitware APT repository for CMake >= 3.24
+RUN wget -O /tmp/kitware-archive.sh https://apt.kitware.com/kitware-archive.sh \
+    && bash /tmp/kitware-archive.sh \
+    && rm -f /tmp/kitware-archive.sh
+
+# Install build essentials, Git, and Python >= 3.9
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+        build-essential \
+        clang \
+        clang-format \
+        clang-tidy \
+        cmake \
+        doxygen \
+        git \
+        graphviz \
+        ninja-build \
+        pandoc \
+        pkg-config \
+        python-is-python3 \
+        python3 \
+        python3-dev \
+        python3-pip \
+        python3-setuptools \
+        python3-venv \
+        python3-wheel \
+        unzip \
+        zip \
+    && rm -rf /var/lib/apt/lists/*
+
+# Provide a working directory for the project
+WORKDIR /workspace
+
+# Optionally clone a repository during build with --build-arg REPO_URL=...
+ARG REPO_URL
+RUN if [ -n "${REPO_URL}" ]; then git clone "${REPO_URL}" repo; fi
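+# For example (hypothetical invocation; the GitHub URL is the Apache project's mirror):
+#   docker build -t tvm-ffi-dev \
+#     --build-arg REPO_URL=https://github.com/apache/tvm-ffi.git \
+#     -f tests/docker/Dockerfile tests/docker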
+
+CMD ["/bin/bash"]
diff --git a/tests/lint/check_file_type.py b/tests/lint/check_file_type.py
index e9fb40f..9d08f9a 100644
--- a/tests/lint/check_file_type.py
+++ b/tests/lint/check_file_type.py
@@ -110,6 +110,7 @@ ALLOW_FILE_NAME = {
     ".clang-format",
     ".gitmodules",
     "CODEOWNERSHIP",
+    "Dockerfile",
 }
 
 # List of specific files allowed in relpath to <proj_root>
