ysh329 opened a new issue, #18150:
URL: https://github.com/apache/tvm/issues/18150

   # Introduction
   
   The TVM community has worked since the last release to deliver the following exciting new improvements!
   
   The main tags are below (**bold text indicates areas with lots of progress**): Relax (especially the PyTorch frontend), CUDA, etc.
   
   Please visit the full listing of commits for a complete view: [v0.21.dev0...v0.21.0.rc0](https://github.com/apache/tvm/compare/v0.21.dev0...v0.21.0.rc0).
   
   ### Community
   
   None.
   
   ### RFCs
   
   None.
   
   ### Arith
    * [#18067](https://github.com/apache/tvm/pull/18067) - Add IsBound method 
to ConstIntBoundAnalyzer
    * [#18031](https://github.com/apache/tvm/pull/18031) - Canonicalize 
mul-coefficient to rhs
    * [#18025](https://github.com/apache/tvm/pull/18025) - Fix canonical 
simplify for LE with incorrect range assumptions
   
   ### BugFix
    * [#18115](https://github.com/apache/tvm/pull/18115) - [Fix][Serialization] 
Add support for NaN value serialization
    * [#18103](https://github.com/apache/tvm/pull/18103) - [Fix] Replace 
dmlc::Error with std::exception in VerifyGPUCode
    * [#18092](https://github.com/apache/tvm/pull/18092) - [Fix] Fix 
ExecBuilderDeclareFunction method name in exec_builder.py
    * [#18087](https://github.com/apache/tvm/pull/18087) - Fix exception when TVM not built with LLVM support
    * [#18035](https://github.com/apache/tvm/pull/18035) - [CUDA] Increase FloatImm precision when printing 64-bit values in CUDA codegen
    * [#17968](https://github.com/apache/tvm/pull/17968) - [Relax][PyTorch] Fix conv_transpose1d and conv_transpose2d
    * [#17950](https://github.com/apache/tvm/pull/17950) - [Fix][Relax] Fix 
dangling reference in GetTargetFunctions()
    * [#17902](https://github.com/apache/tvm/pull/17902) - Fix off-by-one error 
in the type index range check within Object::IsInstance()
    * [#17882](https://github.com/apache/tvm/pull/17882) - [Relax][PyTorch] Fix incorrect behaviour of % (mod) operator in TVM frontend
    * [#17875](https://github.com/apache/tvm/pull/17875) - [Relax][PyTorch] Fix incorrect handling of in-place ops in FX-based TVM frontend
    * [#17838](https://github.com/apache/tvm/pull/17838) - [TIR] Support reverse-inline of reduction blocks in Schedule
   
   ### CI
    * [#18071](https://github.com/apache/tvm/pull/18071) - Update Windows CI image to 2025
    * [#18058](https://github.com/apache/tvm/pull/18058) - [TEST] Move temp 
files into tempdir
    * [#18037](https://github.com/apache/tvm/pull/18037) - Further robustify 
is_last_build check
    * [#17981](https://github.com/apache/tvm/pull/17981) - Update images to 
`20250513-063354-70aa3797`
    * [#17891](https://github.com/apache/tvm/pull/17891) - Update images to 
20250428-080833-03eadc65
    * [#17905](https://github.com/apache/tvm/pull/17905) - Install PyTorch 2.7 
compatible with CUDA 11.8
    * [#17887](https://github.com/apache/tvm/pull/17887) - Upgrade PyTorch to 2.7.0, torchvision to 0.22.0, and Vulkan SDK to 1.4.309
    * [#17846](https://github.com/apache/tvm/pull/17846) - Upgrade ubuntu 
runner image for GitHub CI
   
   ### Docker
    * [#17955](https://github.com/apache/tvm/pull/17955) - [CI] Reintroduce 
NNEF to CI images
   
   ### Docs
    * [#18056](https://github.com/apache/tvm/pull/18056) - Update installation instructions based on FFI refactor
   
   ### Frontend
    * [#18090](https://github.com/apache/tvm/pull/18090) - [Relax][ONNX] Update 
Reduce ops to support axes as input
    * [#18072](https://github.com/apache/tvm/pull/18072) - [Relax][ONNX] Update 
ReduceL1 to opset 18
    * [#18016](https://github.com/apache/tvm/pull/18016) - [Relax][ONNX] 
Replace deprecated `mapping.TENSOR_TYPE_TO_NP_TYPE` usage
    * [#18001](https://github.com/apache/tvm/pull/18001) - [Relax][ONNX] Fix: 
bitwise_not misclassified as binary (is …
    * [#17990](https://github.com/apache/tvm/pull/17990) - [Relax]Fix: Output 
tensor with zero dimension after torch.u…
    * [#17925](https://github.com/apache/tvm/pull/17925) - [Relax][PyTorch] 
Re-enable test_subgraph_capture in dynamo test
    * [#17980](https://github.com/apache/tvm/pull/17980) - [ONNX] Make bias 
input optional in LayerNormalization
    * [#17918](https://github.com/apache/tvm/pull/17918) - [Relax][PyTorch] Add 
ReLU6 Op Support for Exported Program and FX graph
    * [#17930](https://github.com/apache/tvm/pull/17930) - [Relax][PyTorch] Add 
torch.outer Op Support for Exported Program and FX graph 
    * [#17932](https://github.com/apache/tvm/pull/17932) - [Relax][PyTorch] Add 
UpSample Bicubic Op Support for Exported Program and FX graph
    * [#17921](https://github.com/apache/tvm/pull/17921) - [Relax][PyTorch] Add 
AvgPool 1D and 3D Op Support for Exported Program and FX graph
    * [#17922](https://github.com/apache/tvm/pull/17922) - [Relax][PyTorch] Add 
Adaptive AvgPool 1D and 3D Op Support for Exported Program and FX graph
    * [#17863](https://github.com/apache/tvm/pull/17863) - [Relax][PyTorch] 
CrossEntropyLoss
    * [#17919](https://github.com/apache/tvm/pull/17919) - [Relax][PyTorch] Add 
MaxPool 1D and 3D Op Support for Exported Program and FX graph
    * [#17926](https://github.com/apache/tvm/pull/17926) - [Relax][PyTorch] Add 
tests for all the dtypes supported in the PyTorch frontend
    * [#17924](https://github.com/apache/tvm/pull/17924) - [Relax][PyTorch] Add 
div.Tensor_mode and trunc Op Support for Exported Program and FX graph
    * [#17904](https://github.com/apache/tvm/pull/17904) - [Relax][PyTorch] Add 
Meshgrid Op Support for Exported Program and FX graph
    * [#17915](https://github.com/apache/tvm/pull/17915) - [Relax][PyTorch] Add 
support for linspace op in fx graph
    * [#17886](https://github.com/apache/tvm/pull/17886) - [Relax][PyTorch] Add 
Pixel Shuffle Op Support for Exported Program and FX graph
    * [#17908](https://github.com/apache/tvm/pull/17908) - [Relax][PyTorch] Add 
support for eye op in fx graph
    * [#17893](https://github.com/apache/tvm/pull/17893) - [Relax][PyTorch] Add fmod support
    * [#17894](https://github.com/apache/tvm/pull/17894) - [Relax][PyTorch] 
Support torch.bfloat16 dtype in pytorch frontend
    * [#17878](https://github.com/apache/tvm/pull/17878) - [Relax][PyTorch] Add 
torch.isin Op Support for Exported Program and FX graph
    * [#17889](https://github.com/apache/tvm/pull/17889) - [Relax][PyTorch] 
Support linspace op for ExportedProgram importer
    * [#17868](https://github.com/apache/tvm/pull/17868) - [Relax][PyTorch] Add support for ones_like, zero_, zeros, type_as, item ops
    * [#17857](https://github.com/apache/tvm/pull/17857) - [Relax][PyTorch] 
Refactor norm op for ExportedProgram importer
    * [#17852](https://github.com/apache/tvm/pull/17852) - [Relax][PyTorch] 
Sort.default
    * [#17871](https://github.com/apache/tvm/pull/17871) - [Relax][PyTorch] Add bitwise_or op support
    * [#17836](https://github.com/apache/tvm/pull/17836) - [Relax][PyTorch] Add support for index.Tensor
    * [#17864](https://github.com/apache/tvm/pull/17864) - [Relax][PyTorch] 
Support eye op for ExportedProgram importer
    * [#17858](https://github.com/apache/tvm/pull/17858) - [Relax][PyTorch] Add 
copy_ op support in fxGraph
    * [#17851](https://github.com/apache/tvm/pull/17851) - [Relax][PyTorch] 
Support `leaky_relu_.default` and `reshape_as.default` in ExportedProgram 
frontend
    * [#17843](https://github.com/apache/tvm/pull/17843) - [Relax][PyTorch] Add 
mul_.Tensor, max.default, min.default and pow.Scalar Op Support into Exported 
Program Frontend
    * [#17821](https://github.com/apache/tvm/pull/17821) - [Relax][PyTorch] Add 
Pad Op Support for Exported Program and FX graph
    * [#17819](https://github.com/apache/tvm/pull/17819) - [Relax][PyTorch] Add 
Stack Op Support for Exported Program 
    * [#17849](https://github.com/apache/tvm/pull/17849) - [Relax][PyTorch] Add 
RSub Op Support for Exported Program and FX graph
    * [#17850](https://github.com/apache/tvm/pull/17850) - [Relax][PyTorch] Add masked_fill op support in ExportedProgram
    * [#17816](https://github.com/apache/tvm/pull/17816) - [Relax][PyTorch] Add 
PReLU Op Support for Exported Program and FX graph
    * [#17803](https://github.com/apache/tvm/pull/17803) - [Relax][PyTorch] Add 
Logaddexp op support for exported program 
    * [#17841](https://github.com/apache/tvm/pull/17841) - [Relax][PyTorch] Add 
support for norm op
    * [#17832](https://github.com/apache/tvm/pull/17832) - [Relax][PyTorch] 
full.default, full_like.default, ones.default 
    * [#17830](https://github.com/apache/tvm/pull/17830) - [Relax][PyTorch] 
Support narrow and broadcast_to ops for ExportedProgram importer
   
   ### LLVM
    * [#17859](https://github.com/apache/tvm/pull/17859) - [Codegen] Enable 
SVE/VLA for RISCV targets
    * [#17958](https://github.com/apache/tvm/pull/17958) - Fix unknown JIT relocation issue for RISC-V
    * [#17954](https://github.com/apache/tvm/pull/17954) - [FFI] Fix compilation errors with clang20
   
   ### Metal
    * [#18034](https://github.com/apache/tvm/pull/18034) - Fix `GetFunction` of 
metal runtime
   
   ### ROCm
    * [#18029](https://github.com/apache/tvm/pull/18029) - Fix ROCm build after 
FFI refactor
   
   ### Relax
    * [#18102](https://github.com/apache/tvm/pull/18102) - Fix rotary embedding 
buffer size calculation
    * [#17928](https://github.com/apache/tvm/pull/17928) - [KVCache] Per Layer 
Sliding Window
    * [#17840](https://github.com/apache/tvm/pull/17840) - Refactor missing op 
check into shared utility for Torch frontends
    * [#17826](https://github.com/apache/tvm/pull/17826) - Fix Torch frontends 
to report all the missing ops
   
   ### Runtime
    * [#18097](https://github.com/apache/tvm/pull/18097) - CutensorMap support
   
   ### TIR
    * [#18068](https://github.com/apache/tvm/pull/18068) - Extend address_of to 
support Buffer objects
    * [#18069](https://github.com/apache/tvm/pull/18069) - Fix block access 
region detection for nested let bindings
    * [#18057](https://github.com/apache/tvm/pull/18057) - Phase out 
ProducerStore, ProducerRealize and Prefetch
   
   ### TOPI
    * [#18039](https://github.com/apache/tvm/pull/18039) - [Relax] Support InstanceNorm and fix InstanceNorm bug
    * [#18063](https://github.com/apache/tvm/pull/18063) - [NN][Layer_Norm] Fix 
layer_norm error with reduce-only axes
    * [#18006](https://github.com/apache/tvm/pull/18006) - Fix index handling 
in expand_like operator for axis expansion
    * [#18015](https://github.com/apache/tvm/pull/18015) - Support integer type 
input for log10
    * [#17942](https://github.com/apache/tvm/pull/17942) - Add shape validation 
to prevent negative dimensions in conv operations
   
   ### Vulkan
    * [#18005](https://github.com/apache/tvm/pull/18005) - Add TIR unary 
trigonometric/hyperbolic intrinsic definitions
   
   ### CUDA & CUTLASS & TensorRT
    * [#18064](https://github.com/apache/tvm/pull/18064) - [CUTLASS] Fix 
CUTLASS kernel build on Hopper
    * [#18033](https://github.com/apache/tvm/pull/18033) - [CUTLASS] Add GeMM 
kernels for Blackwell GPUs
    * [#18024](https://github.com/apache/tvm/pull/18024) - [CUDA] Fix thrust 
with latest FFI refactor
    * [#18118](https://github.com/apache/tvm/pull/18118) - Bump cutlass_fpA_intB_gemm
    * [#18113](https://github.com/apache/tvm/pull/18113) - [CMake] Refine 
C++/CUDA standard settings in CMakeLists.txt
   
   ### FFI
    * [#18076](https://github.com/apache/tvm/pull/18076) - [FFI][REFACTOR] Stabilize container ABI and implementation
    * [#18091](https://github.com/apache/tvm/pull/18091) - [FFI] Provide Field 
Visit bridge so we can do gradual transition
    * [#18095](https://github.com/apache/tvm/pull/18095) - [FFI][REFACTOR] 
Migrate attrs to use new reflection
    * [#18083](https://github.com/apache/tvm/pull/18083) - [FFI] Update typeinfo to speed up parent reflection
    * [#18077](https://github.com/apache/tvm/pull/18077) - [FFI] Optimize 
atomic decref in Object
    * [#18065](https://github.com/apache/tvm/pull/18065) - [FFI] Introduce FFI 
reflection support in python
    * [#18062](https://github.com/apache/tvm/pull/18062) - [FFI][REFACTOR] 
Update registry to have complete meta-data
    * [#18059](https://github.com/apache/tvm/pull/18059) - [FFI][REFACTOR] 
Enhance reflection
    * [#18050](https://github.com/apache/tvm/pull/18050) - [FFI] Enhance FFI 
Object exception safety during init
    * [#18121](https://github.com/apache/tvm/pull/18121) - Revert "[FFI] 
Replace `Arg2Str` with a more powerful `for_each`"
    * [#18117](https://github.com/apache/tvm/pull/18117) - [FFI] Replace 
`Arg2Str` with a more powerful `for_each`
    * [#18116](https://github.com/apache/tvm/pull/18116) - [FFI] Use fold 
expression to simplify for_each
    * [#18114](https://github.com/apache/tvm/pull/18114) - [FFI] Replace 
`__attribute__` with C++ standard attributes
    * [#18112](https://github.com/apache/tvm/pull/18112) - [FFI] Cleanup 
visit_attrs attribute after refactor
    * [#18111](https://github.com/apache/tvm/pull/18111) - [FFI] Introduce 
GlobalDef for function registration
    * [#18106](https://github.com/apache/tvm/pull/18106) - [REFACTOR][FFI] 
Phase out old VisitAttrs mechanism
    * [#18042](https://github.com/apache/tvm/pull/18042) - [REFACTOR][FFI] 
Update symbol name for library module
    * [#18023](https://github.com/apache/tvm/pull/18023) - [FFI] More strict 
tuple constructor checking
    * [#18022](https://github.com/apache/tvm/pull/18022) - [REFACTOR][FFI] 
Cleanup PackedFunc redirections
    * [#18020](https://github.com/apache/tvm/pull/18020) - [REFACTOR][PYTHON] Phase out `tvm._ffi` and Limited API support
    * [#17979](https://github.com/apache/tvm/pull/17979) - [FFI][REFACTOR] 
Update to distinguish as and cast
    * [#17983](https://github.com/apache/tvm/pull/17983) - [FFI][JVM] Upgrade 
tvm4j to latest FFI
    * [#18010](https://github.com/apache/tvm/pull/18010) - [REFACTOR][FFI] 
Phase out legacy C API
    * [#17943](https://github.com/apache/tvm/pull/17943) - [FFI] Variant 
specialize for all ObjectRef
    * [#17939](https://github.com/apache/tvm/pull/17939) - [REFACTOR] Phase out 
legacy rust ffi
    * [#17940](https://github.com/apache/tvm/pull/17940) - [REFACTOR] Phase out 
legacy go ffi
    * [#17931](https://github.com/apache/tvm/pull/17931) - [REFACTOR][FFI][RPC] 
Migrate RPC to use the latest FFI ABI
    * [#17929](https://github.com/apache/tvm/pull/17929) - [REFACTOR][FFI] 
Cleanup container redirections
    * [#17927](https://github.com/apache/tvm/pull/17927) - [FFI][FEAT] 
AutoDLPack for taking external tensor objects
    * [#17923](https://github.com/apache/tvm/pull/17923) - [REFACTOR][FFI] 
Cleanup PackedFunc related redirection
    * [#17920](https://github.com/apache/tvm/pull/17920) - [REFACTOR] Introduce 
and modernize ffi system
   
   ### web
    * [#17946](https://github.com/apache/tvm/pull/17946) - [REFACTOR][FFI] Upgrade Web Runtime to new FFI
    * [#17917](https://github.com/apache/tvm/pull/17917) - [WebGPU][CodeGen] 
Override PrintVecElemLoad and Store for WebGPU
   
   ### Misc
    * [#18104](https://github.com/apache/tvm/pull/18104) - Add LLVM 
Legalization for tir.erf
    * [#18107](https://github.com/apache/tvm/pull/18107) - Fix: guard TensorMap with CUDA version check
    * [#18101](https://github.com/apache/tvm/pull/18101) - [REFACTOR] Formalize 
namespace for all objects
    * [#18040](https://github.com/apache/tvm/pull/18040) - Add support for 
bucketize
    * [#18098](https://github.com/apache/tvm/pull/18098) - [REFACTOR] 
Transition VisitAttrs to new reflection mechanism
    * [#18096](https://github.com/apache/tvm/pull/18096) - [REFACTOR] 
Transition VisitAttrs to new reflection mechanism in 
tir/ir_builder/meta_schedule
    * [#18093](https://github.com/apache/tvm/pull/18093) - [NVSHMEM] Extend 
CUDA backend to compile and link TIR modules with NVSHMEM
    * [#18088](https://github.com/apache/tvm/pull/18088) - [Script] Enhance 
alloc buffer handling in nested frames
    * [#18086](https://github.com/apache/tvm/pull/18086) - [SCRIPT] Bump Python 
minimum version to 3.9 and update AST compatibility
    * [#18075](https://github.com/apache/tvm/pull/18075) - Add support for softsign op
    * [#18079](https://github.com/apache/tvm/pull/18079) - [Script] Add support 
for merging block annotations
    * [#18080](https://github.com/apache/tvm/pull/18080) - [REFACTOR] Phase out 
LegacyReprPrinter and improve CommonSubExprElim
    * [#18078](https://github.com/apache/tvm/pull/18078) - [REFACTOR] Phase out 
the RelaxExpr.checked_type in favor of struct_info
    * [#18073](https://github.com/apache/tvm/pull/18073) - [NVSHMEM] Update 
NDArray allocation
    * [#18066](https://github.com/apache/tvm/pull/18066) - [Script] Remove 
deprecated attributes from Constant AST node
    * [#18060](https://github.com/apache/tvm/pull/18060) - Add Python functor 
support for TIR expressions and statements
    * [#18054](https://github.com/apache/tvm/pull/18054) - [Pytest] Remove 
obsolete test suite entries
    * [#18036](https://github.com/apache/tvm/pull/18036) - Add support for 
hamming_window op
    * [#18049](https://github.com/apache/tvm/pull/18049) - [Refactor] Rename 
`relax_vm` to `vm`
    * [#18046](https://github.com/apache/tvm/pull/18046) - [3rdparty] Phase out FlashInfer AOT from 3rdparty
    * [#18047](https://github.com/apache/tvm/pull/18047) - [Backend] JIT 
compile FlashInfer kernel with FFI header
    * [#18041](https://github.com/apache/tvm/pull/18041) - [DTYPE] Fix dtype 
functions after dtype refactor
    * [#18043](https://github.com/apache/tvm/pull/18043) - [REFACTOR] Phase out 
the relax tuning_api
    * [#18038](https://github.com/apache/tvm/pull/18038) - Resolve inconsistency between attention/attention_bias
    * [#18027](https://github.com/apache/tvm/pull/18027) - [Dtype] 
Low-precision Blackwell Datatype Support
    * [#17985](https://github.com/apache/tvm/pull/17985) - [Codegen] Resolve 
issue #17965 where the same model produces different outputs on the LLVM (CPU) 
and CUDA (GPU) backends
    * [#17978](https://github.com/apache/tvm/pull/17978) - Fix IR generation 
conflict in topi.nn.simplify by separating Tensor and PrimExpr handling
    * [#18026](https://github.com/apache/tvm/pull/18026) - [Python] Fix library 
lookup path for pip installed packages
    * [#18019](https://github.com/apache/tvm/pull/18019) - Add op support for 
slice_scatter
    * [#17974](https://github.com/apache/tvm/pull/17974) - Fix FLOP estimation 
for EvaluateNode by implementing VisitStmt_ handler
    * [#18013](https://github.com/apache/tvm/pull/18013) - Fix RuntimeError in parallel_for_dynamic
    * [#18014](https://github.com/apache/tvm/pull/18014) - Fix division 
truncation in window size calculation for small dtypes in average_pool
    * [#17995](https://github.com/apache/tvm/pull/17995) - Fix zero-extent 
loops in PerStoreFeature to prevent crashes
    * [#17969](https://github.com/apache/tvm/pull/17969) - Add registration for the operators asinh, acosh, atanh in LLVM
    * [#17972](https://github.com/apache/tvm/pull/17972) - Fix g.costs
    * [#17953](https://github.com/apache/tvm/pull/17953) - Fix sqrt/rsqrt 
Compatibility with Integer Data Types
    * [#17961](https://github.com/apache/tvm/pull/17961) - Fix basic FLOP 
estimation for WhileNode
    * [#17945](https://github.com/apache/tvm/pull/17945) - Add registration for the operators asin and acos in LLVM
    * [#17951](https://github.com/apache/tvm/pull/17951) - [NODE] Fix 
structural equality for Array<Any> specialization
    * [#17913](https://github.com/apache/tvm/pull/17913) - [Triton] Support 
latest `triton.compile` interface
    * [#17911](https://github.com/apache/tvm/pull/17911) - Add op support for 
new_zeros op in Exported Program and fx graph frontend
    * [#17909](https://github.com/apache/tvm/pull/17909) - Add 
masked_fill_.scalar, logical_not.default in Exported Program frontend
    * [#17910](https://github.com/apache/tvm/pull/17910) - [RPC] Fix bug that changes dict while iterating over its keys
    * [#17896](https://github.com/apache/tvm/pull/17896) - Add op support for 
zeros_like and fill_
    * [#17900](https://github.com/apache/tvm/pull/17900) - Fix ONNX expand op
    * [#17865](https://github.com/apache/tvm/pull/17865) - Add support for 
index_put_ op
    * [#17839](https://github.com/apache/tvm/pull/17839) - Add op support for 
roll op
    * [#17844](https://github.com/apache/tvm/pull/17844) - Fix incorrect 
docstring in topi softmax 
    * [#17831](https://github.com/apache/tvm/pull/17831) - [3rdparty] Bump 
DLPack to v1.1 for float8/6/4 dtype supports
    * [#17848](https://github.com/apache/tvm/pull/17848) - Fix docstring in 
batch_to_space_nd and bitpack
    * [#17845](https://github.com/apache/tvm/pull/17845) - Fix incorrect docstring in upsampling.py
    * [#17808](https://github.com/apache/tvm/pull/17808) - [Install] Fix error 
during python/tvm installation

