kimm240 opened a new pull request, #18173:
URL: https://github.com/apache/tvm/pull/18173

   This PR introduces operator fusion for the conv2d -> reshape -> add -> relu 
sequence that commonly appears in deep learning models (e.g., the convolution + 
bias + activation pattern in PyTorch). Fusing these operations improves 
performance and efficiency by reducing kernel launch overhead and memory 
traffic.
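   For reference, a minimal Relax module exhibiting this pattern might look as 
follows (a sketch with illustrative shapes and names; the PR's own test case 
may differ):

```python
import tvm
from tvm.script import relax as R


@tvm.script.ir_module
class ConvBiasRelu:
    @R.function
    def main(
        data: R.Tensor((1, 64, 56, 56), "float32"),
        weight: R.Tensor((64, 64, 3, 3), "float32"),
        bias: R.Tensor((64,), "float32"),
    ) -> R.Tensor((1, 64, 54, 54), "float32"):
        with R.dataflow():
            conv = R.nn.conv2d(data, weight)
            # The 1-D bias is reshaped so it broadcasts over the NCHW conv output.
            reshaped_bias = R.reshape(bias, R.shape([1, 64, 1, 1]))
            add = R.add(conv, reshaped_bias)
            out = R.nn.relu(add)
            R.output(out)
        return out
```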
   
   Performance Improvement:
   
   Reduced Kernel Launch Overhead: Previously, conv2d, reshape, add, and relu 
each required separate kernel calls. By fusing these four operations into a 
single, unified DNNL kernel (e.g., dnnl_fused_conv2d_bias_relu), the overhead 
from multiple kernel launches is significantly reduced. This is evident from 
src/runtime/contrib/dnnl/dnnl.cc:154-158, where all operations are handled by a 
single execute call.
   Decreased Memory Bandwidth Consumption: Without fusion, the intermediate 
results of each operation (e.g., conv_out, bias_add) must be written back to 
and re-read from memory. Fusion keeps these intermediate values in registers or 
cache, reducing memory traffic and overall execution time.
   Increased Efficiency:
   
   Leveraging Compiler Optimizations: By utilizing TVM's FuseOpsByPattern and 
MergeCompositeFunctions passes, this change generates a composite operation 
optimized for specific backends (like DNNL). This ensures that common patterns 
from frontends like PyTorch are automatically recognized within the TVM graph 
and mapped to high-performance fused kernels provided by libraries like DNNL.
   Simplified IR Module: The compiler's intermediate representation (IR) 
becomes simpler because multiple operation nodes are condensed into a single 
composite node, which benefits subsequent optimization and code generation 
stages.
   This fusion is achieved through a two-stage transformation within the TVM 
Relax framework:
   
   Pattern Recognition and Composite Function Creation 
(FuseConv2dReshapeAddRelu Pass):
   
   The FuseConv2dReshapeAddRelu class, registered as a 
tvm.transform.module_pass, transforms the IRModule.
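   A minimal sketch of how such a pass can be registered (the actual pass body 
in the PR may differ; _conv2d_reshape_add_relu_pattern() is sketched after the 
next paragraph):

```python
import tvm
from tvm import relax


@tvm.transform.module_pass(opt_level=0, name="FuseConv2dReshapeAddRelu")
class FuseConv2dReshapeAddRelu:
    """Rewrites matched conv2d/reshape/add/relu chains into composite functions."""

    def transform_module(self, mod, ctx):
        return relax.transform.FuseOpsByPattern(
            [("dnnl.conv2d_reshape_add_relu", _conv2d_reshape_add_relu_pattern())]
        )(mod)
```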
   The _conv2d_reshape_add_relu_pattern() helper function defines the specific 
sequence conv2d -> reshape (applied to the bias) -> add -> relu using TVM's 
dataflow pattern language (DPL): the input tensors (data, weight, bias, shape) 
are matched with wildcard(), and the operation sequence is identified with 
is_op(), as sketched below.
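   A sketch of what this helper can look like with the Relax DPL API (the exact 
code in the PR may differ):

```python
from tvm.relax.dpl import is_op, wildcard


def _conv2d_reshape_add_relu_pattern():
    # Free inputs: data, weight, the 1-D bias tensor, and the target bias shape.
    data = wildcard()
    weight = wildcard()
    bias = wildcard()
    shape = wildcard()
    # conv2d -> reshape(bias) -> add -> relu
    conv = is_op("relax.nn.conv2d")(data, weight)
    reshaped_bias = is_op("relax.reshape")(bias, shape)
    add = is_op("relax.add")(conv, reshaped_bias)
    return is_op("relax.nn.relu")(add)
```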
   The relax.transform.FuseOpsByPattern pass identifies this pattern in the 
input IRModule. Upon detection, the operation sequence is encapsulated into a 
new Relax function with {"Composite": "dnnl.conv2d_reshape_add_relu", 
"Primitive": True} attributes, marking it as a logical "composite" unit.
   Composite Function Merging and Codegen Attribute Assignment 
(MergeCompositeFunctions Pass):
   
   Following the FuseConv2dReshapeAddRelu pass, the MergeCompositeFunctions 
pass is applied via tvm.ir.transform.Sequential.
   This pass groups functions marked with the Composite attribute and wraps 
them in outer functions bearing the {"Codegen": "dnnl"} attribute. The Codegen 
attribute indicates that the composite operation should be offloaded to an 
external backend, in this case DNNL.
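   Put together, the two-stage pipeline looks roughly like this (a sketch, 
reusing the pass and module sketched above):

```python
import tvm
from tvm import relax

seq = tvm.ir.transform.Sequential(
    [
        FuseConv2dReshapeAddRelu(),                  # stage 1: mark composite regions
        relax.transform.MergeCompositeFunctions(),   # stage 2: attach Codegen="dnnl"
    ]
)
mod = seq(ConvBiasRelu)
```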
   Consequently, during graph execution, the fused function carrying the 
Codegen attribute is mapped to and executed by a single optimized DNNL kernel, 
for instance dnnl_fused_conv2d_bias_relu (defined in 
src/runtime/contrib/dnnl/dnnl.cc:199-207).
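   To actually dispatch to that kernel, the usual Relax BYOC steps can follow 
the pipeline above. The snippet below is a sketch and assumes TVM was built 
with DNNL support:

```python
import numpy as np
import tvm
from tvm import relax

# Replace the Codegen="dnnl" functions with calls into the DNNL runtime,
# then build the remaining module and run it on CPU.
mod = relax.transform.RunCodegen()(mod)
ex = relax.build(mod, target="llvm")
vm = relax.VirtualMachine(ex, tvm.cpu())

data = tvm.nd.array(np.random.rand(1, 64, 56, 56).astype("float32"))
weight = tvm.nd.array(np.random.rand(64, 64, 3, 3).astype("float32"))
bias = tvm.nd.array(np.random.rand(64).astype("float32"))
out = vm["main"](data, weight, bias)
```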
   This implementation enables fusion of the conv2d + reshape + add + relu 
pattern, so the common convolution + bias + activation pattern coming from 
frontends such as PyTorch is optimized and executed as a single, efficient DNNL 
kernel within TVM.
   
   To verify this fusion, you can directly run the specific test case:
   
   python tests/python/relax/test_conv2d_reshape_add_relu.py

