kimm240 opened a new pull request, #18171:
URL: https://github.com/apache/tvm/pull/18171

   This PR introduces operator fusion for the `conv2d` -> `reshape` -> `add` -> `relu` sequence, which appears frequently in deep learning models (e.g., the convolution + bias + activation pattern produced by the PyTorch frontend). The fusion improves performance and efficiency by reducing kernel launch overhead and memory traffic.
   
   1.  **Performance Improvement:**
       * **Reduced Kernel Launch Overhead:** Previously, `conv2d`, `reshape`, 
`add`, and `relu` each required separate kernel calls. By fusing these four 
operations into a single, unified DNNL kernel (e.g., 
`dnnl_fused_conv2d_bias_relu`), the overhead from multiple kernel launches is 
significantly reduced. This is evident from 
`src/runtime/contrib/dnnl/dnnl.cc:154-158`, where all operations are handled by 
a single `execute` call.
       * **Decreased Memory Bandwidth Consumption:** Without fusion, the intermediate results of each operation (e.g., `conv_out`, `bias_add`) must be written back to memory and read again by the next kernel. Fusion keeps these intermediate values in registers or cache, cutting unnecessary memory accesses and thereby reducing memory bandwidth usage and overall execution time.
   
   2.  **Increased Efficiency:**
       * **Leveraging Compiler Optimizations:** By utilizing TVM's 
`FuseOpsByPattern` and `MergeCompositeFunctions` passes, this change generates 
a composite operation optimized for specific backends (like DNNL). This ensures 
that common patterns from frontends like PyTorch are automatically recognized 
within the TVM graph and mapped to high-performance fused kernels provided by 
libraries like DNNL.
       * **Simplified IR Module:** The compiler's intermediate representation (IR) becomes simpler because multiple operation nodes are condensed into a single composite node, which makes subsequent optimization and code generation stages more efficient (see the example module after this list).
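
   For concreteness, the targeted sequence looks roughly like the following Relax module before fusion. This is a minimal sketch with illustrative shapes and names (`UnfusedModule`, a 1x3x224x224 input, a 16-channel convolution with default conv2d parameters); it is not taken from this PR's test case.

```python
import tvm
from tvm.script import relax as R


@tvm.script.ir_module
class UnfusedModule:
    @R.function
    def main(
        data: R.Tensor((1, 3, 224, 224), "float32"),
        weight: R.Tensor((16, 3, 3, 3), "float32"),
        bias: R.Tensor((16,), "float32"),
    ) -> R.Tensor((1, 16, 222, 222), "float32"):
        with R.dataflow():
            conv = R.nn.conv2d(data, weight)                   # conv2d
            bias_4d = R.reshape(bias, R.shape([1, 16, 1, 1]))  # reshape applied to the bias
            biased = R.add(conv, bias_4d)                      # add (broadcast)
            out = R.nn.relu(biased)                            # relu
            R.output(out)
        return out
```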
   
   This fusion is achieved through a two-stage transformation within the TVM 
Relax framework:
   
   1.  **Pattern Recognition and Composite Function Creation 
(`FuseConv2dReshapeAddRelu` Pass):**
       * The `FuseConv2dReshapeAddRelu` class, registered as a 
`tvm.transform.module_pass`, transforms the `IRModule`.
       * The `_conv2d_reshape_add_relu_pattern()` helper function defines the specific sequence `conv2d` -> `reshape` (applied to the bias) -> `add` -> `relu` using TVM's dataflow pattern language (DPL): the input tensors (`data`, `weight`, `bias`, `shape`) are matched with `wildcard()` and the operation sequence is matched with `is_op()` (see the sketch after this list).
       * The `relax.transform.FuseOpsByPattern` pass identifies this pattern in 
the input `IRModule`. Upon detection, the operation sequence is encapsulated 
into a new Relax function with `{"Composite": "dnnl.conv2d_reshape_add_relu", 
"Primitive": True}` attributes, marking it as a logical "composite" unit.
   
   2.  **Composite Function Merging and Codegen Attribute Assignment 
(`MergeCompositeFunctions` Pass):**
       * Following the `FuseConv2dReshapeAddRelu` pass, the 
`MergeCompositeFunctions` pass is applied via `tvm.ir.transform.Sequential`.
       * This pass identifies functions marked with the `Composite` attribute 
and transforms them into external functions bearing the `{"Codegen": "dnnl"}` 
attribute. This `Codegen` attribute indicates that the composite operation 
should be offloaded to a specific TVM backend, such as DNNL.
       * Consequently, at runtime the fused function carrying the `Codegen` attribute is mapped to and executed as a single optimized DNNL kernel, for instance `dnnl_fused_conv2d_bias_relu` (defined in `src/runtime/contrib/dnnl/dnnl.cc:199-207`).
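
   As a sketch of how these two stages can be wired together (hypothetical code that only assumes the pattern and pass names described above; details such as `opt_level` and pass options are illustrative and may differ from the actual implementation in this PR):

```python
import tvm
from tvm import relax
from tvm.relax.dpl import is_op, wildcard


def _conv2d_reshape_add_relu_pattern():
    # Inputs are matched by wildcards: data, weight, bias, and the bias' target shape.
    data, weight, bias, shape = wildcard(), wildcard(), wildcard(), wildcard()
    conv = is_op("relax.nn.conv2d")(data, weight)
    reshaped_bias = is_op("relax.reshape")(bias, shape)
    add = is_op("relax.add")(conv, reshaped_bias)
    return is_op("relax.nn.relu")(add)


@tvm.transform.module_pass(opt_level=0, name="FuseConv2dReshapeAddRelu")
class FuseConv2dReshapeAddRelu:
    def transform_module(self, mod, ctx):
        # Stage 1: wrap each matched subgraph into a function annotated with
        # {"Composite": "dnnl.conv2d_reshape_add_relu", "Primitive": True}.
        return relax.transform.FuseOpsByPattern(
            [("dnnl.conv2d_reshape_add_relu", _conv2d_reshape_add_relu_pattern())]
        )(mod)


# Stage 2: MergeCompositeFunctions lifts the composite functions into outer
# functions carrying {"Codegen": "dnnl"}, so they are offloaded to DNNL.
pipeline = tvm.ir.transform.Sequential(
    [FuseConv2dReshapeAddRelu(), relax.transform.MergeCompositeFunctions()]
)
# fused_mod = pipeline(unfused_mod)
```

   Running `pipeline` over a module containing the pattern should leave a single call into an external function, which the DNNL runtime then executes through `dnnl_fused_conv2d_bias_relu`.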
   
   With this change, the `conv2d + reshape + add + relu` pattern is fused end to end, so convolution + bias + activation sequences originating from frontends such as PyTorch are executed as a single DNNL kernel within TVM.
   
   ---
   
   To verify this fusion, you can directly run the specific test case:
   
```bash
python tests/python/relax/test_conv2d_reshape_add_relu.py
```

