This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-site.git


The following commit(s) were added to refs/heads/main by this push:
     new b89bf8f701 Initial post checkin
b89bf8f701 is described below

commit b89bf8f701616b12ff343c571cdaef56a00743f0
Author: tqchen <[email protected]>
AuthorDate: Tue Oct 21 11:39:48 2025 -0700

    Initial post checkin
---
 _data/menus.yml                      |   6 +-
 _layouts/post.html                   |   4 +-
 _posts/2025-10-21-tvm-ffi.md         | 127 +++++++++++++++++++++++++++++++++++
 css/custom.scss                      |  16 +++++
 images/tvm-ffi/c_abi.png             | Bin 0 -> 346556 bytes
 images/tvm-ffi/cuda_export.png       | Bin 0 -> 301164 bytes
 images/tvm-ffi/interop-challenge.png | Bin 0 -> 228386 bytes
 images/tvm-ffi/load_cpp.png          | Bin 0 -> 152841 bytes
 images/tvm-ffi/load_pytorch.png      | Bin 0 -> 73488 bytes
 images/tvm-ffi/mydsl.png             | Bin 0 -> 103941 bytes
 images/tvm-ffi/safecall.png          | Bin 0 -> 44343 bytes
 images/tvm-ffi/shiponewheel.png      | Bin 0 -> 210773 bytes
 images/tvm-ffi/throw.png             | Bin 0 -> 27243 bytes
 images/tvm-ffi/tvm-ffi.png           | Bin 0 -> 138495 bytes
 images/tvm-ffi/tvmffiany.png         | Bin 0 -> 166291 bytes
 images/tvm-ffi/tvmffiobject.png      | Bin 0 -> 60762 bytes
 blog.html => posts.html              |   9 ++-
 17 files changed, 156 insertions(+), 6 deletions(-)

diff --git a/_data/menus.yml b/_data/menus.yml
index c535446e33..9f02025c49 100644
--- a/_data/menus.yml
+++ b/_data/menus.yml
@@ -2,7 +2,9 @@
   link: /community
 - name: Download
   link: /download
-- name: Docs
-  link: https://tvm.apache.org/docs/
+- name: Posts
+  link: /posts
+- name: TVM FFI
+  link: https://github.com/apache/tvm-ffi/
 - name: Github
   link: https://github.com/apache/tvm/
diff --git a/_layouts/post.html b/_layouts/post.html
index b9c5edee29..7011aef6c3 100644
--- a/_layouts/post.html
+++ b/_layouts/post.html
@@ -27,7 +27,9 @@ layout: default
           </span>
         </span>{% endif %}</p>
     </br>
-    {{ content }}
+    <div class="post-content">
+      {{ content }}
+    </div>
     </div>
   </div>
 </div>
diff --git a/_posts/2025-10-21-tvm-ffi.md b/_posts/2025-10-21-tvm-ffi.md
new file mode 100644
index 0000000000..9e49f27d10
--- /dev/null
+++ b/_posts/2025-10-21-tvm-ffi.md
@@ -0,0 +1,127 @@
+---
+ layout: post
+ title: "Building an Open ABI and FFI for ML Systems"
+ date: 2025-10-21
+ author: "Apache TVM FFI Community"
+---
+
+
+
+We are living in an exciting era for AI, where machine learning systems and infrastructure are crucial for training and deploying efficient AI models. The modern machine learning systems landscape is rich with diverse components, including popular ML frameworks and array libraries like JAX, PyTorch, and CuPy. It also includes specialized libraries such as FlashAttention, FlashInfer, and cuDNN. Furthermore, there is a growing trend of ML compilers and domain-specific languages [...]
+
+The exciting growth of this ecosystem is the reason for the fast pace of innovation in AI today. However, it also presents a significant challenge: **interoperability**. Many of these components need to integrate with each other. For example, libraries such as FlashInfer and cuDNN need to be integrated into the runtime systems of PyTorch, JAX, and TensorRT, each of which may come with different interface requirements. ML compilers and DSLs also usually expose Python JIT binding support, while also needing to bri [...]
+
+![image](/images/tvm-ffi/interop-challenge.png){: style="width: 70%; margin: 
auto; display: block;" }
+
+At the core of these interoperability challenges are the **Application Binary Interface (ABI)** and the **Foreign Function Interface (FFI)**. The **ABI** defines how data structures are stored in memory and precisely what occurs when a function is called. For instance, the way torch stores Tensors may differ from, say, cupy/numpy, so we cannot directly pass a torch.Tensor pointer and treat it as a cupy.NDArray. The very nature of machine learning applications usually mandates cross [...]
+
+All of the above observations point to the **need for an ABI and FFI for ML systems** use cases. Looking at the state today, we luckily do have something to start with: the C ABI, which every programming language speaks and which remains stable over time. Unfortunately, C only covers low-level data types such as int, float, and raw pointers. On the other end of the spectrum, we know that Python must gain first-class support, but there is still a need for different-la [...]
+
+This post introduces TVM FFI, an **open ABI and FFI for machine learning systems**. The project evolved from multiple years of iteration on ABI calling-convention design in the Apache TVM project. We found that the design can be made generic and independent of the choice of compiler or language, and that it should benefit the ML systems community. As a result, we built a minimal library from the ground up with the clear intention that it become an open, standalone library that can be shared and e [...]
+
+- **Stable, minimal C ABI** designed for kernels, DSLs, and runtime 
extensibility.
+- **Zero-copy interop** across PyTorch, JAX, and CuPy using the [DLPack protocol](https://data-apis.org/array-api/2024.12/design_topics/data_interchange.html).
+- **Compact value and call convention** covering common data types for ultra-low-overhead ML applications.
+- **Multi-language support out of the box:** Python, C++, and Rust (with a 
path towards more languages).
+
+![image](/images/tvm-ffi/tvm-ffi.png){: style="width: 70%; margin: auto; 
display: block;" }
+
+Importantly, the goal of the project is not to create another framework or language. Instead, it aims to let ML system components do their magic while enabling them to amplify each other more organically.
+
+
+## Technical Design
+
+To start with, we need a mechanism to store the values that pass across machine learning frameworks. TVM FFI achieves this with a core data structure called TVMFFIAny: a 16-byte C structure that follows the tagged-union design principle.
+
+![image](/images/tvm-ffi/tvmffiany.png){: style="width: 50%; margin: auto; 
display: block;" }
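+
+To make the layout concrete, here is a minimal sketch of the tagged-union
+idea; the field names are illustrative and may differ from the actual
+tvm-ffi headers:
+
+```c++
+#include <cstdint>
+
+// Hedged sketch of the 16-byte TVMFFIAny layout (illustrative fields).
+struct TVMFFIObject;  // forward declaration; see the next section
+
+typedef struct {
+  int32_t type_index;     // tag: which kind of value the payload holds
+  int32_t small_len;      // auxiliary field (e.g., small-string length)
+  union {                 // 8-byte payload selected by type_index
+    int64_t v_int64;      // integers and booleans
+    double v_float64;     // floating-point values
+    void* v_ptr;          // raw opaque pointers
+    TVMFFIObject* v_obj;  // heap-managed objects (Tensor, string, ...)
+  };
+} TVMFFIAny;
+```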
+
+
+
+Heap-allocated values are managed as intrusive pointers, where TVMFFIObject itself serves as the header that carries type information and the deletion logic. This design reuses the same type_index mechanism, so new kinds of objects can be recognized and added to the FFI over time, ensuring extensibility. The standalone deleter ensures objects can be safely allocated by one source or language and deleted in another.
+
+![image](/images/tvm-ffi/tvmffiobject.png){: style="width: 50%; margin: auto; 
display: block;" }
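+
+A corresponding hedged sketch of the object header (again, field names are
+illustrative; the real header uses an atomic reference counter and may carry
+additional fields):
+
+```c++
+// Hedged sketch of the intrusive object header managed by the FFI.
+struct TVMFFIObject {
+  int64_t ref_count;   // intrusive reference count (atomic in practice)
+  int32_t type_index;  // runtime type id; new types can be registered
+  int32_t padding;     // reserved
+  // Standalone deleter: lets the allocating language/runtime free the
+  // object even when the release happens somewhere else.
+  void (*deleter)(TVMFFIObject* self);
+};
+```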
+
+
+We provide first-class support for owned and unowned Tensors that adopt the DLPack DLTensor layout. Thanks to the collective efforts of the ML systems ecosystem, we can leverage DLPack to bring in tensors/arrays from PyTorch, NumPy, and JAX. We also provide support for common data types such as string, array, and map. Together, these values cover most common machine learning system use cases we know of. The type_index mechanism still leaves room for registerin [...]
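+
+For reference, these tensors adopt the DLTensor layout from dlpack.h, which
+is stable across the ecosystem:
+
+```c++
+// The DLPack DLTensor layout, as defined in dlpack.h
+// (DLDevice and DLDataType are also defined there):
+typedef struct {
+  void* data;            // opaque data pointer
+  DLDevice device;       // device type (CPU/CUDA/...) and device id
+  int32_t ndim;          // number of dimensions
+  DLDataType dtype;      // element type code, bits, lanes
+  int64_t* shape;        // shape array of length ndim
+  int64_t* strides;      // strides in elements; NULL means compact
+  uint64_t byte_offset;  // byte offset into the data pointer
+} DLTensor;
+```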
+
+As discussed in the overview, we need to treat foreign function calls as first-class citizens. We adopt a single standard C function signature as follows:
+
+![image](/images/tvm-ffi/safecall.png){: style="width: 50%; margin: auto; 
display: block;" }
+
+
+The handle contains the pointer to the function object itself, allowing us to support closures. args and num_args describe the input arguments, and results stores the return value. When args and results contain heap-managed objects, we expect the caller to own both args and results.
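+
+In text form, the convention in the figure looks roughly like the following
+typedef; the exact declaration in the tvm-ffi headers may differ in naming:
+
+```c++
+// Hedged sketch of the single standard call signature.
+// Returns 0 on success; a non-zero value signals an error that has been
+// recorded through the TLS error API (see the error handling section).
+typedef int (*TVMFFISafeCallType)(
+    void* handle,           // the function/closure object being invoked
+    const TVMFFIAny* args,  // input arguments, owned by the caller
+    int32_t num_args,       // number of input arguments
+    TVMFFIAny* result);     // output slot, also owned by the caller
+```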
+
+We call this approach a packed function, as it provides a single signature that represents all functions in a “type-erased” way. It avoids the need to declare and JIT a shim for each FFI function call while maintaining reasonable efficiency. This mechanism enables the following scenarios:
+
+- **Calling from Dynamic Languages (e.g., Python):** we provide a tvm_ffi binding that prepares the args by dynamically examining the Python arguments passed in.
+- **Calling from Static Languages (e.g., C++):** we can leverage C++ templates to instantiate the arguments directly on the stack, avoiding dynamic examination (see the sketch after this list).
+- **Dynamic Language Callbacks:** the signature makes it easy to bring dynamic-language (Python) callbacks in as ffi::Function, since we can take each argument and convert it to a dynamic value.
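+
+As an illustration of the static-language path, here is a hedged C++ sketch
+in the style of the tvm-ffi C++ API (names such as Function::FromTyped and
+cast should be verified against the documentation):
+
+```c++
+#include <tvm/ffi/function.h>
+
+namespace ffi = tvm::ffi;
+
+// Wrap a typed C++ lambda as a type-erased packed function.
+ffi::Function fadd = ffi::Function::FromTyped(
+    [](int64_t a, int64_t b) -> int64_t { return a + b; });
+
+// Templates instantiate the packed arguments on the stack at the call
+// site; no per-signature JIT shim is required.
+int64_t c = fadd(1, 2).cast<int64_t>();
+```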
+
+**Efficiency** In practice, we find this approach sufficient for machine-learning-focused workloads. For example, we can get to **0.4 us**-level overhead for Python/C++ calls, which is already very close to the limit (for reference, each Python C-extension call costs at least **0.1 us**), and much faster than most ML system Python eager use cases, which usually sit above the 1-2 us level. When both sides of the call are static languages, the overhead drops to tens of nanoseconds. As a sid [...]
+
+We support first-class Function objects that let us pass functions/closures around between components, enabling useful patterns such as quick Python callbacks for prototyping and dynamic functor creation for driver-based kernel launching.
+
+**Error handling** Because the function ABI is based on C, we need a method to propagate errors. A non-zero return value from a TVMFFISafeCallType call indicates an error. We provide a thread-local storage (TLS) based C API to set and fetch errors, and we also build library bindings that automatically translate exceptions. For example, the macro
+
+![image](/images/tvm-ffi/throw.png){: style="width: 50%; margin: auto; display: block;" }
+
+will raise an exception that translates into a TypeError in Python. We also preserve and propagate tracebacks across FFI boundaries whenever possible. The TLS-based API is a simple yet effective convention that DSL compilers and libraries can leverage for efficient error propagation.
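+
+On the caller side, the convention is simply to check the return code and
+fetch the TLS error; a hedged sketch, where TVMFFIErrorMoveFromRaised is an
+assumed name for the fetch API:
+
+```c++
+TVMFFIAny result;
+if (fn(handle, args, num_args, &result) != 0) {
+  // A non-zero return means an error object was recorded in thread-local
+  // storage; move it out and translate it into a host-language exception.
+  TVMFFIObject* err = nullptr;
+  TVMFFIErrorMoveFromRaised(&err);  // assumed API name, for illustration
+  // ... convert err (kind, message, traceback) into e.g. a Python TypeError
+}
+```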
+
+**First-class GPU Support for PyTorch** We provide first-class support for torch.Tensors, which are automatically zero-copy converted to FFI Tensors. We also provide a minimal stream context so that the current stream is carried over from the PyTorch stream context. In short, calling such a function behaves like a normal PyTorch function when passing in torch Tensor arguments.
+
+## Ship One Wheel
+
+TVM FFI provides a minimal pip package that includes libtvm_ffi, which handles essential registration and context management. The package consists of a C++ library that automatically manages function types built upon the C ABI, and a Python library for interacting with this convention.
+Because we define a stable ABI for ML systems and kernel libraries, a compiled library is agnostic to the **Python ABI and PyTorch versions** and can work across multiple Python versions (including free-threaded Python). This allows us to **ship one wheel (library)** for multiple frameworks and Python environments, greatly simplifying deployment.
+
+![image](/images/tvm-ffi/shiponewheel.png){: style="width: 70%; margin: auto; 
display: block;" }
+
+
+The figure above shows how it works in practice: most libraries only need to ship `mylib.so`, which links against the ABI; the Python-version-specific apache-tvm-ffi package then handles the bridge to that specific Python version. The same mechanism also works for non-Python inference engines. There are many ways to build a library that targets the tvm-ffi ABI. The following example shows how we can do that in CUDA:
+
+![image](/images/tvm-ffi/cuda_export.png){: style="width: 50%; margin: auto; 
display: block;" }
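+
+In text form, a hedged CPU analogue of the CUDA example in the figure,
+following the style of the tvm-ffi quickstart (headers, accessor names, and
+the export macro should be verified against the documentation):
+
+```c++
+// mylib.cc -- compile into mylib.so, linking against libtvm_ffi.
+#include <tvm/ffi/container/tensor.h>
+#include <tvm/ffi/function.h>
+
+namespace ffi = tvm::ffi;
+
+// Tensors arrive zero-copy from PyTorch/NumPy/JAX via DLPack.
+void AddOne(ffi::TensorView x, ffi::TensorView y) {
+  for (int64_t i = 0; i < x.shape(0); ++i) {
+    static_cast<float*>(y.data_ptr())[i] =
+        static_cast<float*>(x.data_ptr())[i] + 1.0f;
+  }
+}
+
+// Expose AddOne through the stable C ABI under the symbol "add_one".
+TVM_FFI_DLL_EXPORT_TYPED_FUNC(add_one, AddOne);
+```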
+
+
+Once we compile this library into mylib, it can be loaded back into Python or any other runtime that works with TVM FFI.
+
+![image](/images/tvm-ffi/load_pytorch.png){: style="width: 50%; margin: auto; 
display: block;" }
+
+Notably, this same function can be loaded from other runtimes and languages that interface with tvm-ffi. For example, the same library can be loaded from C++:
+
+![image](/images/tvm-ffi/load_cpp.png){: style="width: 50%; margin: auto; 
display: block;" }
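+
+A hedged text sketch of the C++ loading path (Module::LoadFromFile follows
+the style of the tvm-ffi C++ API; the exact names are assumptions):
+
+```c++
+#include <tvm/ffi/extra/module.h>
+
+namespace ffi = tvm::ffi;
+
+void RunAddOne(ffi::TensorView x, ffi::TensorView y) {
+  // Load the compiled library and look up the exported symbol.
+  ffi::Module mod = ffi::Module::LoadFromFile("mylib.so");
+  ffi::Function add_one = mod->GetFunction("add_one").value();
+  add_one(x, y);  // same packed calling convention as from Python
+}
+```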
+
+
+The ABI is designed **with the needs of DSL compilers in mind.** Because the ABI is minimal, we can readily target it in C (or any low-level compiler IR such as LLVM IR or the MLIR LLVM dialect).
+Once a DSL integrates with the ABI, we can leverage the same flow to load the library back and run it as normal torch functions. Additionally, we can also attach JIT mechanisms to the same ABI.
+
+![image](/images/tvm-ffi/mydsl.png){: style="width: 40%; margin: auto; 
display: block;" }
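+
+Concretely, a DSL backend only needs to emit one symbol per kernel with the
+safecall signature; a hedged C-level sketch (the type-index constant is
+illustrative):
+
+```c++
+// What a DSL compiler can emit directly in C, LLVM IR, or the MLIR LLVM
+// dialect: one function per kernel following the safecall convention.
+extern "C" int my_dsl_kernel(void* handle, const TVMFFIAny* args,
+                             int32_t num_args, TVMFFIAny* result) {
+  const int32_t kIllustrativeIntIndex = 1;  // assumed type index for int64
+  if (num_args != 1 || args[0].type_index != kIllustrativeIntIndex) {
+    return -1;  // record a TypeError via the TLS error API, then fail
+  }
+  result->type_index = kIllustrativeIntIndex;
+  result->v_int64 = args[0].v_int64 + 1;  // the "kernel": add one
+  return 0;  // success
+}
+```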
+
+
+
+As we can see, a common open ABI foundation offers numerous opportunities for ML systems to interoperate. We anticipate that this solution can significantly benefit various aspects of ML systems and AI infrastructure:
+
+* **Kernel libraries**: Ship a single package to support multiple frameworks, 
Python versions, and different languages.
+* **Kernel DSLs**: Provide a reusable ABI for JIT and AOT kernel exposure across frameworks and runtimes.
+* **Frameworks and runtimes**: Offer uniform interop with ABI-compliant libraries and DSLs.
+* **ML infrastructure**: Enable out-of-the-box interoperability for Python, 
C++, and Rust.
+* **Coding agents**: Establish a unified mechanism for shipping generated code 
in production.
+
+Currently, the tvm-ffi package offers out-of-the-box support for frameworks like PyTorch, JAX, and CuPy. We are also collaborating with machine learning system builders to develop solutions based on it. For instance, FlashInfer now ships with tvm-ffi, and active work is underway to enable more DSL libraries, agent solutions, and inference runtimes.
+This project is also an important step for Apache TVM itself, as we will start to
+provide neutral and modular infrastructure pieces that can be broadly useful to
+the machine learning systems ecosystem.
+
+## Links
+
+TVM FFI is an open convention that is independent of any specific compiler or
+framework.
+We welcome contributions and encourage the ML systems community to collaborate 
on improving the open ABI.
+Please check out the following resources:
+
+- Github: 
[https://github.com/apache/tvm-ffi/](https://github.com/apache/tvm-ffi/)
+- [Quick start 
examples](https://tvm.apache.org/ffi/get_started/quickstart.html)
+
+## Acknowledgement
+
+The project draws on the collective wisdom of the machine learning systems community and the Python open-source ecosystem, including past development insights from many developers of NumPy, PyTorch, JAX, Caffe, MXNet, XGBoost, CuPy, pybind11, nanobind, and more.
+
+We would specifically like to thank the PyTorch team, JAX team, CUDA Python team, cuteDSL team, cuTile team, Apache TVM community, XGBoost team, TiLang team, Triton distributed team, FlashInfer team, SGLang community, TensorRT-LLM community, and the vLLM community for their insightful feedback.
diff --git a/css/custom.scss b/css/custom.scss
index 01da555b30..3412368487 100644
--- a/css/custom.scss
+++ b/css/custom.scss
@@ -58,6 +58,17 @@ ul{
     padding:0;
     margin:0;
 }
+
+/* Re-enable bullets inside blog post content only */
+.post-content ul,
+.post-content ol {
+    list-style: revert;
+    margin-left: 1.25rem;
+    padding-left: 1.25rem;
+}
+.post-content li {
+    list-style-position: outside;
+}
 h1 {
     font-weight: 400;
     font-size: 55px;
@@ -1421,6 +1432,11 @@ table th, table td {
 .highlight .w {
   color: #bbbbbb;
 }
+.bloglist {
+  list-style-type: disc;
+  padding-left: 20px;
+}
+
 .highlight {
   background-color: #f8f8f8;
 }
diff --git a/images/tvm-ffi/c_abi.png b/images/tvm-ffi/c_abi.png
new file mode 100644
index 0000000000..87f64e3fa2
Binary files /dev/null and b/images/tvm-ffi/c_abi.png differ
diff --git a/images/tvm-ffi/cuda_export.png b/images/tvm-ffi/cuda_export.png
new file mode 100644
index 0000000000..9babbf4ced
Binary files /dev/null and b/images/tvm-ffi/cuda_export.png differ
diff --git a/images/tvm-ffi/interop-challenge.png 
b/images/tvm-ffi/interop-challenge.png
new file mode 100644
index 0000000000..9449b5fa2f
Binary files /dev/null and b/images/tvm-ffi/interop-challenge.png differ
diff --git a/images/tvm-ffi/load_cpp.png b/images/tvm-ffi/load_cpp.png
new file mode 100644
index 0000000000..21240988ef
Binary files /dev/null and b/images/tvm-ffi/load_cpp.png differ
diff --git a/images/tvm-ffi/load_pytorch.png b/images/tvm-ffi/load_pytorch.png
new file mode 100644
index 0000000000..87612bbbd6
Binary files /dev/null and b/images/tvm-ffi/load_pytorch.png differ
diff --git a/images/tvm-ffi/mydsl.png b/images/tvm-ffi/mydsl.png
new file mode 100644
index 0000000000..e5a08d7284
Binary files /dev/null and b/images/tvm-ffi/mydsl.png differ
diff --git a/images/tvm-ffi/safecall.png b/images/tvm-ffi/safecall.png
new file mode 100644
index 0000000000..d64773c94a
Binary files /dev/null and b/images/tvm-ffi/safecall.png differ
diff --git a/images/tvm-ffi/shiponewheel.png b/images/tvm-ffi/shiponewheel.png
new file mode 100644
index 0000000000..fbf8d4dd54
Binary files /dev/null and b/images/tvm-ffi/shiponewheel.png differ
diff --git a/images/tvm-ffi/throw.png b/images/tvm-ffi/throw.png
new file mode 100644
index 0000000000..cc4f5f86ac
Binary files /dev/null and b/images/tvm-ffi/throw.png differ
diff --git a/images/tvm-ffi/tvm-ffi.png b/images/tvm-ffi/tvm-ffi.png
new file mode 100644
index 0000000000..2bcd65ec65
Binary files /dev/null and b/images/tvm-ffi/tvm-ffi.png differ
diff --git a/images/tvm-ffi/tvmffiany.png b/images/tvm-ffi/tvmffiany.png
new file mode 100644
index 0000000000..ef622e29b1
Binary files /dev/null and b/images/tvm-ffi/tvmffiany.png differ
diff --git a/images/tvm-ffi/tvmffiobject.png b/images/tvm-ffi/tvmffiobject.png
new file mode 100644
index 0000000000..bcf8c72e98
Binary files /dev/null and b/images/tvm-ffi/tvmffiobject.png differ
diff --git a/blog.html b/posts.html
similarity index 71%
rename from blog.html
rename to posts.html
index 3cefd770d9..bcfa3c2b7c 100644
--- a/blog.html
+++ b/posts.html
@@ -1,16 +1,18 @@
 ---
 layout: page
-title : Blog
-header : Blogposts
+title : Posts
+header : Posts
 group : blog
 order : 100
 ---
 {% include JB/setup %}
 
-<h1>TVM Community Blog</h1>
+<h1>Posts</h1>
 
 <ul class="bloglist">
 {% for post in site.posts %}
+{% assign post_year = post.date | date: "%Y" %}
+{% if post_year >= "2025" %}
 <li>
   <span>
     <a class="post-link" href="{{ post.url | prepend: site.baseurl }}">{{ 
post.title }}</a>
@@ -20,5 +22,6 @@ order : 100
     {{ post.date | date: "%b %-d, %Y" }}
   </span>
 </li>
+{% endif %}
 {% endfor %}
 </ul>
