ysh329 created an issue (apache/tvm#18391)

# Introduction

The TVM community has worked since the last release to deliver the following 
exciting new improvements!

The main tags are below (**bold text indicates areas with lots of progress**): 
**Relax** (especially the PyTorch frontend), **FFI**, etc.

Please visit the full listing of commits for a complete view: 
[v0.22.dev0...v0.22.0.rc0](https://github.com/apache/tvm/compare/v0.22.dev0...v0.22.0.rc0).

### Community

None.

### RFCs

None.

### BugFix
 * [#18352](https://github.com/apache/tvm/pull/18352) - [Fix] Update ShapeView 
use in nccl.cc
 * [#18324](https://github.com/apache/tvm/pull/18324) - Fix binding for BERT
 * [#18296](https://github.com/apache/tvm/pull/18296) - [Fix] Add libxml2 
dependency to fix Windows CI build failure
 * [#18294](https://github.com/apache/tvm/pull/18294) - [Fix] Set DRefObj and 
CUDAIPCMemoryObj as mutable
 * [#18285](https://github.com/apache/tvm/pull/18285) - [FFI] Enable 
`load_inline` on macOS
 * [#18287](https://github.com/apache/tvm/pull/18287) - [Hotfix] Fix the 
conflicts about ffi-related updated names
 * [#18281](https://github.com/apache/tvm/pull/18281) - [FFI] Fix bug of 
`ffi.cpp.load_inline` on Windows
 * [#18262](https://github.com/apache/tvm/pull/18262) - [NNAPI] Use kind() 
instead of type_key() after FFI refactor
 * [#18244](https://github.com/apache/tvm/pull/18244) - [Fix] Update FlashInfer 
JIT header lookup
 * [#18237](https://github.com/apache/tvm/pull/18237) - [FFI] Fix type_traits on 
DataType after SmallStr update
 * [#18232](https://github.com/apache/tvm/pull/18232) - [LLVM][Fix] Do not emit 
debuginfo on vscale or other unknown types
 * [#18219](https://github.com/apache/tvm/pull/18219) - [Fix] Resolve deadlock 
in PopenPoolExecutor and LocalBuilder
 * [#18207](https://github.com/apache/tvm/pull/18207) - [Fix][ONNX] No 
precision widening for numpy binary operations
 * [#18209](https://github.com/apache/tvm/pull/18209) - [ONNX][FRONTEND][Fix] 
Update Resize to accept ShapeExpr
 * [#18210](https://github.com/apache/tvm/pull/18210) - [Bug] Fix core dump in 
InferLayoutRMSNorm and fix typo
 * [#18208](https://github.com/apache/tvm/pull/18208) - [FFI][Fix] Update 
datatype registry calls to the new paths
 * [#18190](https://github.com/apache/tvm/pull/18190) - [Fix] Codegen fix for 
relax cutlass
 * [#18170](https://github.com/apache/tvm/pull/18170) - [Fix] Fix the wrong 
check for tuple node in #18163
 * [#18174](https://github.com/apache/tvm/pull/18174) - [Misc] Fix missing 
PadAttrs registration in op_attrs.py
 * [#18158](https://github.com/apache/tvm/pull/18158) - Fix NCCL build with 
GlobalDef registration
 * [#18140](https://github.com/apache/tvm/pull/18140) - [NNAPI] Fix type 
mismatch and test_mean annotation
 * [#18138](https://github.com/apache/tvm/pull/18138) - [Fix][ONNX] Fixed 
constant ROI handling in resize2d when loading onnx models
 * [#18137](https://github.com/apache/tvm/pull/18137) - [Fix][ONNX] Fix CumSum 
conversion when loading ONNX model

### CI
 * [#18245](https://github.com/apache/tvm/pull/18245) - [LLVM][MSWIN] Fix LLVM 
module build with latest CI update
 * [#18227](https://github.com/apache/tvm/pull/18227) - Exit the build for 
AbortException
 * [#18145](https://github.com/apache/tvm/pull/18145) - [Test] Use roi_list 
variable instead of hardcoded values in ROI tensor creation

### Docs
 * [#18279](https://github.com/apache/tvm/pull/18279) - [FFI] Initial bringup of 
cpp docs
 * [#18264](https://github.com/apache/tvm/pull/18264) - Misc docs fix
 * [#18263](https://github.com/apache/tvm/pull/18263) - [FFI] Initial docs 
scaffolding
 * [#18261](https://github.com/apache/tvm/pull/18261) - [FFI] Add missing files 
in packaging example
 * [#18256](https://github.com/apache/tvm/pull/18256) - [FFI] Wheel packaging
 * [#18128](https://github.com/apache/tvm/pull/18128) - [Doc] Visualize the 
architecture using a UML sequence diagram

### Frontend
 * [#18143](https://github.com/apache/tvm/pull/18143) - [ONNX] Extend axes for 
layer_norm when gamma/beta are multi-dimensional

### LLVM
 * [#18204](https://github.com/apache/tvm/pull/18204) - Fixes up to the latest 
LLVM21
 * [#18202](https://github.com/apache/tvm/pull/18202) - [CPPTEST] Small fixes 
for LLVM >= 20

### MetaSchedule
 * [#18243](https://github.com/apache/tvm/pull/18243) - [LLVM] Add RISC-V 
V-extension v1.0 kernels to MetaSchedule

### Metal
 * [#18290](https://github.com/apache/tvm/pull/18290) - Fix MetalModuleCreate
 * [#18283](https://github.com/apache/tvm/pull/18283) - [Fix] Fix type for 
device array in Metal API

### ROCm
 * [#18225](https://github.com/apache/tvm/pull/18225) - Minor fixes for latest 
refactor

### Relax
 * [#18374](https://github.com/apache/tvm/pull/18374) - [PyTorch] Improve the 
check for the no-bias case
 * [#18358](https://github.com/apache/tvm/pull/18358) - [Frontend][ONNX] Fix 
`FastGelu` when bias is not set
 * [#18360](https://github.com/apache/tvm/pull/18360) - [PyTorch] Support GRU 
op for ExportedProgram importer
 * [#18359](https://github.com/apache/tvm/pull/18359) - [PyTorch] Fix the 
segfault in from_exported_program when model returns (Tensor, None) tuple
 * [#18321](https://github.com/apache/tvm/pull/18321) - [ONNX] Support 
AllClassNMS Operator for ONNX Frontend
 * [#18346](https://github.com/apache/tvm/pull/18346) - [PyTorch] Support LSTM 
op for ExportedProgram importer
 * [#18351](https://github.com/apache/tvm/pull/18351) - [Frontend][Torch] Fix 
parsing error when input dimension of unbind is 1
 * [#18331](https://github.com/apache/tvm/pull/18331) - Update BasePyModule 
with faster DLPack converter for tensor conversion
 * [#18343](https://github.com/apache/tvm/pull/18343) - [PyTorch] Support 
MatrixMultiply op for ExportedProgram importer
 * [#18336](https://github.com/apache/tvm/pull/18336) - Operator and RoPE 
support for Llama4
 * [#18329](https://github.com/apache/tvm/pull/18329) - [Frontend][ONNX] Fix 
Expand conversion error ("broadcast_to expects the input tensor shape is 
broadcastable to the target shape")
 * [#18326](https://github.com/apache/tvm/pull/18326) - [Backend] Implement 
R.call_py_func operator for calling Python functions from compiled TVM
 * [#18313](https://github.com/apache/tvm/pull/18313) - Introduce 
R.call_py_func operator for calling Python functions from Relax IR
 * [#18301](https://github.com/apache/tvm/pull/18301) - Fix 
RelaxToPyFuncConverter compatibility and improve fallback handling
 * [#18288](https://github.com/apache/tvm/pull/18288) - Add symbolic shape 
support to BasePyModule for dynamic tensor operations
 * [#18269](https://github.com/apache/tvm/pull/18269) - Add Relax to Python 
Function Converter
 * [#18253](https://github.com/apache/tvm/pull/18253) - Building TVMScript 
printer for IRModules with Python functions
 * [#18229](https://github.com/apache/tvm/pull/18229) - Add Python function 
support and BasePyModule for PyTorch integration
 * [#18242](https://github.com/apache/tvm/pull/18242) - ONNX frontend using 
relax softplus operator
 * [#18180](https://github.com/apache/tvm/pull/18180) - [ONNX] Parse ONNX 
Upsample to Relax resize2d
 * [#18179](https://github.com/apache/tvm/pull/18179) - Support Relax Operator 
PReLU
 * [#18163](https://github.com/apache/tvm/pull/18163) - Fix issue in fuse 
concat ops by pattern
 * [#18120](https://github.com/apache/tvm/pull/18120) - [Fix] Fix potential 
out-of-bounds access in `TupleRewriterNode`
 * [#18061](https://github.com/apache/tvm/pull/18061) - [ONNX][Transform] Add 
mode choice, new mode, and warning for take()
 * [#18122](https://github.com/apache/tvm/pull/18122) - [KVCache] Fix kernel 
dispatch based on attention kinds

### TIR
 * [#18319](https://github.com/apache/tvm/pull/18319) - Refactor division 
simplification in RewriteSimplifier
 * [#18341](https://github.com/apache/tvm/pull/18341) - Support sequence 
comparisons in TVMScript
 * [#18323](https://github.com/apache/tvm/pull/18323) - Add support for 
conditional expressions in TVMScript
 * [#18199](https://github.com/apache/tvm/pull/18199) - Fix host/device 
function check for build
 * [#18154](https://github.com/apache/tvm/pull/18154) - Fix trivial index map 
[] -> [0]
 * [#18151](https://github.com/apache/tvm/pull/18151) - Decouple DeepEqual from 
StructuralEqual
 * [#18134](https://github.com/apache/tvm/pull/18134) - Add `T.thread_return()` 
for early thread exit in CUDA kernels

### TVMScript
 * [#17804](https://github.com/apache/tvm/pull/17804) - Support `continue` and 
`break` in TVMScript

### cuda & cutlass & tensorrt
 * [#18353](https://github.com/apache/tvm/pull/18353) - [CUDA] Update 
FlashInfer JIT integration
 * [#18320](https://github.com/apache/tvm/pull/18320) - [TIR][CUDA] Preserve 
float precision in codegen with hexfloat output
 * [#18300](https://github.com/apache/tvm/pull/18300) - [CUDA] Support NVTX in 
CUDA 13
 * [#18238](https://github.com/apache/tvm/pull/18238) - [CUTLASS] Fix CUTLASS 
kernel compilation
 * [#18144](https://github.com/apache/tvm/pull/18144) - [CodeGen][CUDA] Add 
sinhf CUDA Math API for CodeGen

### web
 * [#18327](https://github.com/apache/tvm/pull/18327) - [CMake] Install `web/` 
directory in cmake for Python package
 * [#18168](https://github.com/apache/tvm/pull/18168) - Fix incompatible part 
after FFI updates

### Misc
 * [#18376](https://github.com/apache/tvm/pull/18376) - [FFI] Bump tvm-ffi to 
0.1.0rc2
 * [#18330](https://github.com/apache/tvm/pull/18330) - [Analyzer] Enhance 
ConstIntBoundAnalyzer and IntervalSet with modular set analysis
 * [#18372](https://github.com/apache/tvm/pull/18372) - Upgrade to CUTLASS 4.2.1
 * [#18375](https://github.com/apache/tvm/pull/18375) - [TE] [FFI] Fix broken 
axis/reduce_axis properties in BaseComputeOp and ScanOp after FFI refactoring
 * [#18370](https://github.com/apache/tvm/pull/18370) - [FFI] Bump tvm-ffi 
dependency
 * [#18354](https://github.com/apache/tvm/pull/18354) - [FFI][ABI] Bump tvm-ffi 
to latest
 * [#18349](https://github.com/apache/tvm/pull/18349) - [FFI][ABI] Bump tvm-ffi 
to latest
 * [#18348](https://github.com/apache/tvm/pull/18348) - [Python] Add library 
lookup path for tvm installed as a package
 * [#18345](https://github.com/apache/tvm/pull/18345) - [FFI][ABI] Bump tvm-ffi 
version to reflect RC ABI Update
 * [#18332](https://github.com/apache/tvm/pull/18332) - [FFI][ABI] Bump ffi 
version to latest
 * [#18334](https://github.com/apache/tvm/pull/18334) - Fix conflicting parameter 
name promote_dtype in FP8ComputeLegalize
 * [#18325](https://github.com/apache/tvm/pull/18325) - [flashinfer] Support 
directing JIT to FlashInfer GroupedGemm kernels
 * [#18328](https://github.com/apache/tvm/pull/18328) - Fix datatype error 
for GPT-2
 * [#18318](https://github.com/apache/tvm/pull/18318) - [3rdparty] Remove 
dlpack/libbacktrace from 3rdparty
 * [#18317](https://github.com/apache/tvm/pull/18317) - [FlashInfer] Update 
include path and interface
 * [#18314](https://github.com/apache/tvm/pull/18314) - [REFACTOR][FFI] Split 
tvm-ffi into a separate repo
 * [#18312](https://github.com/apache/tvm/pull/18312) - [FFI][REFACTOR] Update 
TVM_FFI_STATIC_INIT_BLOCK to fn style
 * [#18311](https://github.com/apache/tvm/pull/18311) - [FFI][ABI] Better 
String and Nested Container handling
 * [#18308](https://github.com/apache/tvm/pull/18308) - [FFI][ABI] Refactor the 
naming of DLPack speed converter
 * [#18307](https://github.com/apache/tvm/pull/18307) - [FFI] Update 
`load_inline` interface
 * [#18306](https://github.com/apache/tvm/pull/18306) - [FFI][ABI][REFACTOR] 
Enhance DLPack Exchange Speed and Behavior
 * [#18304](https://github.com/apache/tvm/pull/18304) - Clear ext_lib_dll_names 
for macOS platform
 * [#18302](https://github.com/apache/tvm/pull/18302) - [FFI][REFACTOR] 
Refactor python ffi call mechanism for perf
 * [#18299](https://github.com/apache/tvm/pull/18299) - [Python] Fix runtime 
tensor import
 * [#18298](https://github.com/apache/tvm/pull/18298) - [FFI] Fix system 
library symbol lookup
 * [#18297](https://github.com/apache/tvm/pull/18297) - [FFI] Temp skip windows 
tests
 * [#18295](https://github.com/apache/tvm/pull/18295) - [FFI][ABI] Introduce 
generic stream exchange protocol
 * [#18289](https://github.com/apache/tvm/pull/18289) - [FFI][REFACTOR] 
Streamline Object Declare Macros
 * [#18291](https://github.com/apache/tvm/pull/18291) - [3rdparty] Bump 
cutlass_fpA_intB_gemm to fix SM90 build
 * [#18284](https://github.com/apache/tvm/pull/18284) - [FFI][REFACTOR] 
Introduce UnsafeInit and enhance ObjectRef null safety
 * [#18282](https://github.com/apache/tvm/pull/18282) - [FFI] Relax default 
alignment and contiguous requirement
 * [#18280](https://github.com/apache/tvm/pull/18280) - [FFI][REFACTOR] Cleanup 
namespace
 * [#18278](https://github.com/apache/tvm/pull/18278) - [FFI] Temp skip 
load_inline tests nonlinux
 * [#18277](https://github.com/apache/tvm/pull/18277) - [FFI][REFACTOR] Cleanup 
tvm_ffi python API and types
 * [#18276](https://github.com/apache/tvm/pull/18276) - [FFI] Add 
ffi::Tensor.strides()
 * [#18275](https://github.com/apache/tvm/pull/18275) - [FFI][REFACTOR][ABI] 
Rename NDArray to Tensor
 * [#18274](https://github.com/apache/tvm/pull/18274) - [FFI] Update the 
interface of `ffi.load_inline` to match torch
 * [#18273](https://github.com/apache/tvm/pull/18273) - [FFI][ABI] Append 
symbol prefix for ffi exported functions
 * [#18272](https://github.com/apache/tvm/pull/18272) - [FFI] Construct 
NDArray.strides by default
 * [#18271](https://github.com/apache/tvm/pull/18271) - [FFI] Support inline 
module
 * [#18270](https://github.com/apache/tvm/pull/18270) - [FFI] Support Opaque 
PyObject
 * [#18266](https://github.com/apache/tvm/pull/18266) - [FFI] Update torch 
stream getter to use native torch C API
 * [#18252](https://github.com/apache/tvm/pull/18252) - [Build] Complete TVM 
wheel building migration
 * [#18259](https://github.com/apache/tvm/pull/18259) - [FFI][ABI] Introduce 
weak rc support
 * [#18258](https://github.com/apache/tvm/pull/18258) - [FFI] Fix two 
migration issues
 * [#18254](https://github.com/apache/tvm/pull/18254) - [FFI][ABI] ABI updates 
for future metadata and complex ordering
 * [#18236](https://github.com/apache/tvm/pull/18236) - Upgrade CUTLASS to 
v4.2.0 with CUDA 13 support
 * [#18251](https://github.com/apache/tvm/pull/18251) - [Python] Complete 
Python packaging with scikit-build-core
 * [#18248](https://github.com/apache/tvm/pull/18248) - [Python] Update 
version.py to bump pyproject.toml automatically
 * [#18249](https://github.com/apache/tvm/pull/18249) - [FFI][CMAKE] Revert 
cmake libbacktrace URL and update submodule
 * [#18239](https://github.com/apache/tvm/pull/18239) - [Build] Migrate Python 
packaging to pyproject.toml with scikit-build-core
 * [#18246](https://github.com/apache/tvm/pull/18246) - [FFI][CMAKE] Add 
missing download path for libbacktrace
 * [#18234](https://github.com/apache/tvm/pull/18234) - [FFI] Misc fixup for 
windows
 * [#18233](https://github.com/apache/tvm/pull/18233) - [FFI] Robustify the 
pyproject setup
 * [#18226](https://github.com/apache/tvm/pull/18226) - [FFI][REFACTOR] 
Establish tvm_ffi python module
 * [#18221](https://github.com/apache/tvm/pull/18221) - [FFI] Fix JSON 
parser/writer for the fast-math flag
 * [#18222](https://github.com/apache/tvm/pull/18222) - [NVSHMEM] Fix 
compatibility with CUDA code without nvshmem use
 * [#18220](https://github.com/apache/tvm/pull/18220) - [Thrust] Fix getting 
CUDA stream
 * [#18218](https://github.com/apache/tvm/pull/18218) - [FFI][REFACTOR] Cleanup 
API locations
 * [#18217](https://github.com/apache/tvm/pull/18217) - [FFI] AutoDLPack 
compatible with torch stream context
 * [#18216](https://github.com/apache/tvm/pull/18216) - [FFI][REFACTOR] 
Establish Stream Context in ffi
 * [#18214](https://github.com/apache/tvm/pull/18214) - [FFI][REFACTOR] 
Establish ffi.Module in python
 * [#18213](https://github.com/apache/tvm/pull/18213) - [FFI] Formalize 
ffi.Module
 * [#18212](https://github.com/apache/tvm/pull/18212) - [FFI] Make JSON 
Parser/Writer fastmath-safe
 * [#18211](https://github.com/apache/tvm/pull/18211) - [TARGET] Add target for 
NVIDIA RTX 5060 Ti
 * [#18206](https://github.com/apache/tvm/pull/18206) - [CODEGEN][REFACTOR] 
Remove nargs from tir.call_llvm_intrin
 * [#18205](https://github.com/apache/tvm/pull/18205) - [FFI][REFACTOR] Cleanup 
entry function to redirect
 * [#18200](https://github.com/apache/tvm/pull/18200) - [FFI][REFACTOR] Update 
Map ABI to enable flexible smallMap switch
 * [#18198](https://github.com/apache/tvm/pull/18198) - [FFI][REFACTOR] Move 
Downcast out of ffi for now
 * [#18197](https://github.com/apache/tvm/pull/18197) - [REFACTOR] Update data 
type rewriter to enable recursive rewrite in Any
 * [#18193](https://github.com/apache/tvm/pull/18193) - Bump 
cutlass_fpA_intB_gemm to latest commit
 * [#18192](https://github.com/apache/tvm/pull/18192) - [FFI] Phase out 
ObjectPath in favor of AccessPath
 * [#18191](https://github.com/apache/tvm/pull/18191) - [FFI][REFACTOR] 
Refactor AccessPath to enable full tree repr
 * [#18189](https://github.com/apache/tvm/pull/18189) - [FFI][REFACTOR] Phase 
out getattr-based attribute handling
 * [#18188](https://github.com/apache/tvm/pull/18188) - [FFI][REFACTOR] Migrate 
the Save/Load JSON to the new reflection
 * [#18187](https://github.com/apache/tvm/pull/18187) - [FFI][EXTRA] 
Serialization To/From JSONGraph
 * [#18186](https://github.com/apache/tvm/pull/18186) - [FFI] Lightweight json 
parser/writer
 * [#18185](https://github.com/apache/tvm/pull/18185) - [FFI] Introduce small 
string/bytes
 * [#18184](https://github.com/apache/tvm/pull/18184) - [FFI][REFACTOR] Hide 
StringObj/BytesObj into details
 * [#18183](https://github.com/apache/tvm/pull/18183) - [FFI][REFACTOR] Cleanup 
to align to latest ffi
 * [#18181](https://github.com/apache/tvm/pull/18181) - [REFACTOR] Upgrade 
NestedMsg<T> to use new ffi::Any mechanism
 * [#18178](https://github.com/apache/tvm/pull/18178) - [FFI] Fix SmallMapInit 
with duplicated keys
 * [#18177](https://github.com/apache/tvm/pull/18177) - [FFI][REFACTOR] Isolate 
out extra API
 * [#18176](https://github.com/apache/tvm/pull/18176) - [FFI] Improve string 
equal/hash handling
 * [#18172](https://github.com/apache/tvm/pull/18172) - [REFACTOR][FFI] Phase 
out SEqualReduce/SHashReduce
 * [#18166](https://github.com/apache/tvm/pull/18166) - [FFI][REFACTOR] Migrate 
StructuralEqual/Hash to new reflection
 * [#18165](https://github.com/apache/tvm/pull/18165) - [FFI][REFACTOR] Enable 
custom s_hash/equal
 * [#18160](https://github.com/apache/tvm/pull/18160) - [FFI][REFACTOR] 
Introduce TypeAttr in reflection
 * [#18156](https://github.com/apache/tvm/pull/18156) - [FFI] Structural equal 
and hash based on reflection
 * [#18153](https://github.com/apache/tvm/pull/18153) - Fix Release Package 
Test Script
 * [#18149](https://github.com/apache/tvm/pull/18149) - [FFI] Log and throw in 
function dup registration
 * [#18148](https://github.com/apache/tvm/pull/18148) - [FFI][REFACTOR] Phase 
out TVM_FFI_REGISTER_GLOBAL in favor of GlobalDef
 * [#18147](https://github.com/apache/tvm/pull/18147) - [FFI][REFACTOR] 
Modularize reflection
 * [#18141](https://github.com/apache/tvm/pull/18141) - [FFI][PYTHON] Improve 
the traceback generation in python
 * [#18142](https://github.com/apache/tvm/pull/18142) - [REFACTOR] Migrate 
TVM_FFI_REGISTER_GLOBAL to new reflection style
 * [#18130](https://github.com/apache/tvm/pull/18130) - Fix compilation 
warnings of unnecessary `std::move()` calls
 * [#18129](https://github.com/apache/tvm/pull/18129) - Delete redundant imports
 * [#18055](https://github.com/apache/tvm/pull/18055) - [Target] Support CUDA 
device function calls
 * [#18127](https://github.com/apache/tvm/pull/18127) - Revert "[Refactor] 
Build cython with isolate environment"
 * [#18125](https://github.com/apache/tvm/pull/18125) - Phase out StackVM 
runtime support
 * [#18124](https://github.com/apache/tvm/pull/18124) - [Refactor] Build cython 
with isolate environment
 * [#18123](https://github.com/apache/tvm/pull/18123) - [Codegen] Update LLVM 
version requirement for `insertDeclare`
