[I] [Release] v0.16.0 Release Candidate Notes [tvm]

via GitHub Sun, 21 Apr 2024 05:02:02 -0700


ysh329 opened a new issue, #16911:
URL: https://github.com/apache/tvm/issues/16911


   # Introduction
   
   The TVM community has worked since the v0.15.0 release to deliver the 
following exciting new improvements! Highlights of this release:
   
   - **Initial support for Relax**, including dynamic shapes and pipelines
   - Dlight module for optimizing LLM TIR workloads on GPU
   - Disco module for initial SPMD multi-GPU support
   
   The main tags are below (**bold text marks areas with lots of progress**):
   
   - Community, RFCs
   - Adreno, ArmComputeLibrary, Metal, cuda & cutlass & tensorrt, microNPU, 
Runtime
   - **Relax**, **Dlight**, **Disco**
   - Arith, **TIR**, TVMScript
   - Docs, CI, **Misc**, **BugFix**
   
   Please visit the full listing of commits for a complete view: 
[v0.16.dev0...v0.16.0.rc0](https://github.com/apache/tvm/compare/v0.16.dev0...v0.16.0.rc0).
   
   ### Community
   
    * [#16695](https://github.com/apache/tvm/pull/16695) - Add new key for 
release signing
    * [#16419](https://github.com/apache/tvm/pull/16419) - Add new key for 
release signing
   
   ### RFCs
   
   This new RFC explores how TVM can generate code for the Scalable Matrix 
Extension (SME) ISA to improve inference performance on Arm®-based hardware 
that implements SME.
   
    * [#107](https://github.com/apache/tvm-rfcs/pull/107) - [RFC] Scalable 
Matrix Extension enablement
   ----
   
   ### Arith
    * [#16735](https://github.com/apache/tvm/pull/16735) - [Fixup] Require 
feature flag for tighter inequality bounds
    * [#16588](https://github.com/apache/tvm/pull/16588) - Provide tighter 
ConstIntBounds for special cases
    * [#16704](https://github.com/apache/tvm/pull/16704) - [Fix] Fix canonical 
simplification of LE
   
   ### BYOC
    * [#16567](https://github.com/apache/tvm/pull/16567) - Skip processed 
functions in FuseOpsByPattern and RunCodegen
   
   ### BugFix
    * [#16766](https://github.com/apache/tvm/pull/16766) - [Target] Added null 
check to fix segfault at ->defined() in cpu.cc DetectSystemTriple()
    * [#16739](https://github.com/apache/tvm/pull/16739) - [Ansor] Fixing Ansor 
Gradient Bug
    * [#16820](https://github.com/apache/tvm/pull/16820) - [Fix] PAPI docs
    * [#16793](https://github.com/apache/tvm/pull/16793) - [Fix] fix for numpy 
2.0 compatibility
    * [#16790](https://github.com/apache/tvm/pull/16790) - [Fix] Fix build 
errors with VS2022
    * [#16780](https://github.com/apache/tvm/pull/16780) - [Fix] Fix numpy 
dtype map
    * [#16773](https://github.com/apache/tvm/pull/16773) - [Fix] Fix the purity 
flag of "vm.call_tir_dyn" and "kill" ops
    * [#16770](https://github.com/apache/tvm/pull/16770) - [Hotfix] Revert 
driver API pass ordering that breaks MLC, mark failing test
    * [#16771](https://github.com/apache/tvm/pull/16771) - [Fix] Remove 
redundant "remove_all_unused" in IPC memory lowering
    * [#16746](https://github.com/apache/tvm/pull/16746) - [Fix][Builtin] Fix 
"GetQueryPosition" of PagedKVCache
    * [#16728](https://github.com/apache/tvm/pull/16728) - [Fix] Introduce 
TVM_DEBUG_WITH_ABI_CHANGE to warn ABI changes in debug mode
    * [#16714](https://github.com/apache/tvm/pull/16714) - [Fix] PagedKVCache 
fetching compute stream when copy stream is needed
    * [#16684](https://github.com/apache/tvm/pull/16684) - [SLM] Produce 
well-formed Relax for nn.modules.KVCache
    * [#16659](https://github.com/apache/tvm/pull/16659) - add the default 
value for DFT in ONNX frontend
    * [#16637](https://github.com/apache/tvm/pull/16637) - [Transform] Preserve 
symbolic variables in FuseOps
    * [#16649](https://github.com/apache/tvm/pull/16649) - [FFI] Add a missing 
default for datatype lanes
    * [#16492](https://github.com/apache/tvm/pull/16492) - [Executor] fix 
debug_executor function debug_get_output
    * [#16598](https://github.com/apache/tvm/pull/16598) - [Transform] Handle 
non-composite lambda functions in FuseOps
    * [#16565](https://github.com/apache/tvm/pull/16565) - [Transform] Keep 
private non-primitive functions in FuseTIR
    * [#16518](https://github.com/apache/tvm/pull/16518) - Use x*x*x instead of 
pow(x,3)
    * [#16436](https://github.com/apache/tvm/pull/16436) - Ensure that bf16 
arrays are created as expected
    * [#16361](https://github.com/apache/tvm/pull/16361) - Disable 
SingleEnvThreadVerifier
    * [#16289](https://github.com/apache/tvm/pull/16289) - [AUTOTVM][FIX] Typo 
fixes and add a warning in the Droplet Search
   
   ### CI
    * [#16837](https://github.com/apache/tvm/pull/16837) - Disable flaky unit 
test
    * [#16765](https://github.com/apache/tvm/pull/16765) - [AOT][Testing] 
Improve output mismatch information on test failure
    * [#16661](https://github.com/apache/tvm/pull/16661) - add merge_with_main 
in unity
    * [#16611](https://github.com/apache/tvm/pull/16611) - [AOT][Testing] Print 
output values on test failure
    * [#16546](https://github.com/apache/tvm/pull/16546) - Disable testing that 
downloads from mxnet
    * [#16521](https://github.com/apache/tvm/pull/16521) - Fix CI Script and 
Broken Tests
    * [#16502](https://github.com/apache/tvm/pull/16502) - Support tvm-bot 
rerun for tvm-unity task
    * [#16435](https://github.com/apache/tvm/pull/16435) - Update image tag to 
20240126-070121-8ade9c30e
    * [#16420](https://github.com/apache/tvm/pull/16420) - [WASM] Update emsdk 
and nodejs version
    * [#16384](https://github.com/apache/tvm/pull/16384) - Remove 
NVIDIA_DISABLE_REQUIRE
    * [#16382](https://github.com/apache/tvm/pull/16382) - In 
jenkins.cmd_utils.Sh.tee, check for failing subprocess
    * [#16366](https://github.com/apache/tvm/pull/16366) - Upgrade sccache 
version to 0.7.*
    * [#16369](https://github.com/apache/tvm/pull/16369) - Upgrade Unity ci 
images
    * [#16344](https://github.com/apache/tvm/pull/16344) - Update docker images 
tag to 20240105-165030-51bdaec6
    * [#16340](https://github.com/apache/tvm/pull/16340) - [Unity][UnitTest] 
Increase atol to resolve flaky CI failure
    * [#16337](https://github.com/apache/tvm/pull/16337) - [Hexagon][UnitTest] 
Disable flaky quantization test
    * [#16336](https://github.com/apache/tvm/pull/16336) - Upgrade cmake 
version to 3.24.0
   
   ### Docker
    * [#16755](https://github.com/apache/tvm/pull/16755) - [SME] Add Fixed 
Virtual Platform (FVP) and toolchain install
    * [#16348](https://github.com/apache/tvm/pull/16348) - Upgrade pip in i386 
container
   
   ### Dlight
    * [#16775](https://github.com/apache/tvm/pull/16775) - [Fix][Dlight] 
(Low-batched-)GeMV on small spatial loops
    * [#16429](https://github.com/apache/tvm/pull/16429) - [Unity][Dlight][Fix] 
Reduction rule support dyn-shape epilogue
    * [#16351](https://github.com/apache/tvm/pull/16351) - [Unity] Add 
dlight.gpu.Fallback in DispatchSortScan, add argsort, topk, and cumprod
    * [#16338](https://github.com/apache/tvm/pull/16338) - [Unity][DLight] 
Introduce Specific Rule for RMSNorm
    * [#16251](https://github.com/apache/tvm/pull/16251) - [Unity][Dlight] 
Support dlight gemv rule on nested inner block
    * [#16878](https://github.com/apache/tvm/pull/16878) - [Dlight] Enhance 
vectorization loading weight for gemv
    * [#16848](https://github.com/apache/tvm/pull/16848) - [DLight] Fix a 
corner case for reduction rule
    * [#16701](https://github.com/apache/tvm/pull/16701) - [Dlight] Add 
fallback for low batch gemv with outer reduction
    * [#16678](https://github.com/apache/tvm/pull/16678) - [Dlight] 
LowBatchGemv rule only apply to function with spatial symbolic var
    * [#16665](https://github.com/apache/tvm/pull/16665) - [Dlight] Skip GeMV 
when normalization fails
    * [#16579](https://github.com/apache/tvm/pull/16579) - [Dlight] Scheduling 
Low batch GEMM using GEMV-like rule
    * [#16321](https://github.com/apache/tvm/pull/16321) - [DLight] Skip rule 
if target is not suitable
    * [#16731](https://github.com/apache/tvm/pull/16731) - [Dlight] Fix GeMV 
shared memory estimation
   
   ### Docs
    * [#16792](https://github.com/apache/tvm/pull/16792) - [Doc] Fix 
set_axis_separator example
    * [#16610](https://github.com/apache/tvm/pull/16610) - [Doc] Fixed 
Docstring usage example in `tvm.ir.make_node`
    * [#16572](https://github.com/apache/tvm/pull/16572) - [Doc] Remove MxNet 
related tutorials
    * [#16514](https://github.com/apache/tvm/pull/16514) - [Unity][Doc] 
Document passes that depend on `DataflowBlock`s and encourage using 
`ConvertToDataflow`
    * [#16482](https://github.com/apache/tvm/pull/16482) - [Doc] Fix Docstring 
in `extern.py` for Sphinx
    * [#16346](https://github.com/apache/tvm/pull/16346) - [Doc] Fix minor 
error in "Expressions in Relay"
   
   ### Frontend
    * [#16001](https://github.com/apache/tvm/pull/16001) - [ONNX] Fix 
interpreting auto_pad parameters in ConvTranspose operator
    * [#16651](https://github.com/apache/tvm/pull/16651) - [PaddlePaddle] 
PaddlePaddle model with NCHW data format that supports quantization
    * [#16616](https://github.com/apache/tvm/pull/16616) - [PaddlePaddle] 
Support conv2d when data_format is NHWC
    * [#16526](https://github.com/apache/tvm/pull/16526) - [Keras] Enable Dense 
operator for any input dims
    * [#16478](https://github.com/apache/tvm/pull/16478) - [PaddlePaddle] Fixed 
the bug that prevented the model from being successfully converted to microTVM 
on MacOS
   
   ### Hexagon
    * [#16762](https://github.com/apache/tvm/pull/16762) - [VM] Cache operations 
when bypass mode is enabled
    * [#16706](https://github.com/apache/tvm/pull/16706) - [VM] Add buffers to 
`dma_wait` builtin
    * [#16448](https://github.com/apache/tvm/pull/16448) - [VM] Implement 
dma_copy and dma_wait builtin for hexagon
   
   ### LLVM
    * [#16782](https://github.com/apache/tvm/pull/16782) - [SVE] Support 
scalable vectors in LoopVectorizer
    * [#16812](https://github.com/apache/tvm/pull/16812) - Fix compilation 
failure due to minor change
    * [#16808](https://github.com/apache/tvm/pull/16808) - [Runtime] Fix errors 
during loading of target tags
    * [#16748](https://github.com/apache/tvm/pull/16748) - Lack of DWARF type 
is not an error
    * [#16696](https://github.com/apache/tvm/pull/16696) - [SVE] Add codegen 
support for scalable buffer accesses
    * [#15964](https://github.com/apache/tvm/pull/15964) - [RUNTIME] Add 
optional LLVM ORCJIT runtime executor
    * [#16612](https://github.com/apache/tvm/pull/16612) - [SVE] Add support 
for scalable data type strings
    * [#16523](https://github.com/apache/tvm/pull/16523) - [SVE] Change the 
dtype of Ramp and Broadcast lanes to PrimExpr
    * [#16484](https://github.com/apache/tvm/pull/16484) - [SVE] Add vscale 
builtin
    * [#16373](https://github.com/apache/tvm/pull/16373) - Update Host.h path
   
   ### MetaSchedule
    * [#16725](https://github.com/apache/tvm/pull/16725) - Make the `opt_level` 
of `tune_relay()` adjustable
   
   ### Metal
    * [#16713](https://github.com/apache/tvm/pull/16713) - [RUNTIME] Provide 
richer runtime when error happens
    * [#16605](https://github.com/apache/tvm/pull/16605) - [RUNTIME] Fix 
multithreading access of metal runtime
    * [#16438](https://github.com/apache/tvm/pull/16438) - Dispatch numerically 
stable tanh for metal
   
   ### OpenCL & CLML
    * [#16854](https://github.com/apache/tvm/pull/16854) - [OpenCL] Add OpenCL 
device for automatic target detection
    * [#16846](https://github.com/apache/tvm/pull/16846) - 
[Meta-Schedule][OpenCL] Enable MS tuning for Android OpenCL
    * [#16768](https://github.com/apache/tvm/pull/16768) - [RUNTIME][OPENCL] 
Bugfix for ciImage create with host ptr
    * [#16672](https://github.com/apache/tvm/pull/16672) - [CLML] Fix build TVM 
with CLML on MacOS
    * [#16328](https://github.com/apache/tvm/pull/16328) - [RUNTIME][CLML] Fix 
for Softmax op for 4D tensors
    * [#16394](https://github.com/apache/tvm/pull/16394) - [OpenCL][CMake] Fix 
OpenCL tests compilation
   
   ### ROCm
    * [#16441](https://github.com/apache/tvm/pull/16441) - [WebGPU] Intrin 
Dispatch: `tanh`, `erf`, `log`
    * [#16404](https://github.com/apache/tvm/pull/16404) - Some fixes of ROCm 
codegen
   
   ### Relax
    * [#16872](https://github.com/apache/tvm/pull/16872) - Enhance symbolic 
expr estimation in memory planning
    * [#16867](https://github.com/apache/tvm/pull/16867) - Dispatch sort/scan 
for non-cuda gpu backends
    * [#16852](https://github.com/apache/tvm/pull/16852) - Fix 
EliminateCommonSubexpr removing alloc tensor
    * [#16851](https://github.com/apache/tvm/pull/16851) - [Relax,Topi] Allow 
passing workspace to thrust to avoid allocations
    * [#16841](https://github.com/apache/tvm/pull/16841) - Provide well-formed 
output in `transform.LazyGetInput`
    * [#16798](https://github.com/apache/tvm/pull/16798) - [Transform] Provide 
callback versions of LazyTransformParams
    * [#16801](https://github.com/apache/tvm/pull/16801) - Allow 
DeadCodeElimination within ApplyPassToFunction
    * [#16834](https://github.com/apache/tvm/pull/16834) - Capture symbolic 
vars in struct info of weights
    * [#16830](https://github.com/apache/tvm/pull/16830) - Share storage allocs 
among functions after cuda graph rewriting
    * [#16823](https://github.com/apache/tvm/pull/16823) - [VM] Refactor CUDA 
graph builtins as VM extension
    * [#16828](https://github.com/apache/tvm/pull/16828) - [Bugfix] Provide the 
full Expr to pattern-match rewriter
    * [#16805](https://github.com/apache/tvm/pull/16805) - [Bugfix] BlockBuilder 
may not assume unique input functions
    * [#16815](https://github.com/apache/tvm/pull/16815) - Enable capturing 
symbolic shapes in cuda graph
    * [#16642](https://github.com/apache/tvm/pull/16642) - Allow R.Prim('bool') 
in relax::If and assert_op
    * [#16796](https://github.com/apache/tvm/pull/16796) - Unit-test for 
structural equal of recursive function
    * [#16732](https://github.com/apache/tvm/pull/16732) - Allow composition of 
DFPattern replacements
    * [#16783](https://github.com/apache/tvm/pull/16783) - Improve 
CanonicalizeBindings in DataflowVar edge case
    * [#16721](https://github.com/apache/tvm/pull/16721) - Implement operators 
to inspect DLTensor::strides and offset
    * [#16730](https://github.com/apache/tvm/pull/16730) - Refactor 
PatternRewriter into separate Block/Expr mutators
    * [#16756](https://github.com/apache/tvm/pull/16756) - [IR] Improve 
highlighting in assert_structural_equal
    * [#16779](https://github.com/apache/tvm/pull/16779) - Improve malform 
error msg
    * [#16569](https://github.com/apache/tvm/pull/16569) - [Unity][Parser] 
Check well-formedness in the parser
    * [#16759](https://github.com/apache/tvm/pull/16759) - [Pass] Lowering 
passes for GPU IPC memory and allreduce
    * [#16697](https://github.com/apache/tvm/pull/16697) - Implement 
relax.transform.TopologicalSort
    * [#16658](https://github.com/apache/tvm/pull/16658) - Normalize use of 
void-type variable to inline R.tuple()
    * [#16711](https://github.com/apache/tvm/pull/16711) - [Frontend] Add op 
`tanh`, `exp`, `negative`, and `permute`
    * [#16703](https://github.com/apache/tvm/pull/16703) - [Fix] Fix top-p/top-k 
sampling kernel
    * [#16669](https://github.com/apache/tvm/pull/16669) - [Frontend][Onnx] add 
sum and globalavgpool 1d/3d op
    * [#16691](https://github.com/apache/tvm/pull/16691) - CUDA graph rewrite 
treating StringImm as static
    * [#16685](https://github.com/apache/tvm/pull/16685) - Implement 
StructInfoPattern for dataflow pattern matching
    * [#16681](https://github.com/apache/tvm/pull/16681) - [Frontend][Onnx] 
support MaxPool1/2/3D and AveragePool1/2/3D
    * [#16584](https://github.com/apache/tvm/pull/16584) - [Unity][TIR] Clear 
struct info when specializing PrimFunc
    * [#16676](https://github.com/apache/tvm/pull/16676) - Remove the 
legalization of cumsum/cumprod
    * [#16654](https://github.com/apache/tvm/pull/16654) - [Frontend][NN] Add 
support for Conv3D
    * [#16674](https://github.com/apache/tvm/pull/16674) - Eager free original 
weights in transform_params
    * [#16675](https://github.com/apache/tvm/pull/16675) - add sample_indices 
in sampling
    * [#16648](https://github.com/apache/tvm/pull/16648) - [Runtime] Support 
Unpack API for NDArrayCache
    * [#16591](https://github.com/apache/tvm/pull/16591) - [Unity][Transform] 
Handle dynamic shapes in CombineParallelMatmul
    * [#16594](https://github.com/apache/tvm/pull/16594) - [Transform] Preserve 
param names in LiftTransformParams
    * [#16575](https://github.com/apache/tvm/pull/16575) - [Unity] GPU sampling
    * [#16574](https://github.com/apache/tvm/pull/16574) - Additional unit 
tests for RemoveUnusedParameters
    * [#16585](https://github.com/apache/tvm/pull/16585) - [Unity][Analysis] 
Include impure call in VerifyWellFormed errors
    * [#16421](https://github.com/apache/tvm/pull/16421) - [Unity][Transform] 
Raise error in FuseOpsByPattern for SSA violation
    * [#16629](https://github.com/apache/tvm/pull/16629) - Fix error message in 
BlockBuilder
    * [#16592](https://github.com/apache/tvm/pull/16592) - Handle dynamic 
arguments in legalization of nn.attention
    * [#16590](https://github.com/apache/tvm/pull/16590) - [Unity][Transform] 
Check for permute_dims in ExpandMatmulOfSum
    * [#16604](https://github.com/apache/tvm/pull/16604) - [Frontend][Onnx] fix 
clip unsqueeze opset implement
    * [#16568](https://github.com/apache/tvm/pull/16568) - [Runtime] RNNState 
for State Space Models
    * [#16563](https://github.com/apache/tvm/pull/16563) - Implement operators 
to read runtime DLTensor* information
    * [#16581](https://github.com/apache/tvm/pull/16581) - 
[Unity][MSC][M4.2][Step2] Enable plugin with manager, test plugins in compile 
pipeline
    * [#16600](https://github.com/apache/tvm/pull/16600) - Expose name_hint 
field for BlockBuilder.match_cast
    * [#16601](https://github.com/apache/tvm/pull/16601) - [Transform] 
Canonicalize `let var = R.const` bindings
    * [#16583](https://github.com/apache/tvm/pull/16583) - [Unity][VM] 
Recursively visit match bindings in VMShapeLowerMutator
    * [#16586](https://github.com/apache/tvm/pull/16586) - Ignore non-relax 
functions in relax.transform.RunCodegen
    * [#16573](https://github.com/apache/tvm/pull/16573) - [VM] 
Re-implementation of callback functions
    * [#16561](https://github.com/apache/tvm/pull/16561) - [Bugfix] Remove call 
to tvm.build for empty TIR module
    * [#16564](https://github.com/apache/tvm/pull/16564) - [Unity] Check for 
symbolic vars in PrimValue when lowering to TIR
    * [#16558](https://github.com/apache/tvm/pull/16558) - Minor updates for NN 
frontend
    * [#16542](https://github.com/apache/tvm/pull/16542) - Support callback as 
argument
    * [#16487](https://github.com/apache/tvm/pull/16487) - [Unity][Transform] 
Handle `call_tir_inplace` in `FuseTIR` and `FuseOps`
    * [#16355](https://github.com/apache/tvm/pull/16355) - [Unity] Infer struct 
info for relax.op.split on dynamic-sized index
    * [#16465](https://github.com/apache/tvm/pull/16465) - [Redo][Unity] Split 
DecomposeOpsForTraining into two steps
    * [#16495](https://github.com/apache/tvm/pull/16495) - 
[Unity][MSC][M4.2][Step1] Enable plugin with manager, test plugins in compile 
pipeline
    * [#16498](https://github.com/apache/tvm/pull/16498) - [Frontend] 
"tensor_ir_inplace" op
    * [#16500](https://github.com/apache/tvm/pull/16500) - [Unity] Support 
storage reuse for dynamic shapes
    * [#16493](https://github.com/apache/tvm/pull/16493) - [Pass] Skip data 
type node for CSE pass
    * [#16467](https://github.com/apache/tvm/pull/16467) - 
[Unity][MSC][Refactor] Reconstruct BYOC and runner
    * [#16422](https://github.com/apache/tvm/pull/16422) - [Unity][CodeGen] 
RunCodegen based on externally-exposed functions
    * [#16483](https://github.com/apache/tvm/pull/16483) - [Unity][Frontend] 
Add Sigmoid and Square Op
    * [#16472](https://github.com/apache/tvm/pull/16472) - [Unity] Improved 
error message in tvm::relax::UpdateStructInfo
    * [#16473](https://github.com/apache/tvm/pull/16473) - [Unity] Improve 
error message in tensor_to_shape struct inference
    * [#16466](https://github.com/apache/tvm/pull/16466) - Memory planning for 
"partially dynamic" shapes
    * [#16464](https://github.com/apache/tvm/pull/16464) - NDArray Cache Update 
with DLTensor Support
    * [#16315](https://github.com/apache/tvm/pull/16315) - [Unity][Transform] 
Implement relax.transform.ReorderTakeAfterMatmul
    * [#16313](https://github.com/apache/tvm/pull/16313) - [Unity][Transform] 
Implement relax.transform.ExpandMatmulOfSum
    * [#16411](https://github.com/apache/tvm/pull/16411) - [Unity][Transform] 
Handle symbolic variables in LambdaLift
    * [#16443](https://github.com/apache/tvm/pull/16443) - [Unity][FIX] fix 
thread dtype mismatch
    * [#16442](https://github.com/apache/tvm/pull/16442) - Revert "[Unity] 
Split DecomposeOpsForTraining into two steps"
    * [#16437](https://github.com/apache/tvm/pull/16437) - [Unity] Improve 
buffer allocation for handling duplicated buffer names.
    * [#16439](https://github.com/apache/tvm/pull/16439) - [Unity]  Support 
cumsum with pure int32
    * [#16432](https://github.com/apache/tvm/pull/16432) - [Unity] downgrade 
cmake version requirement
    * [#16427](https://github.com/apache/tvm/pull/16427) - 
[Unity][Frontend][NN] Better support for dynamic convolutions
    * [#16418](https://github.com/apache/tvm/pull/16418) - [Unity][Fix] Fix 
mismatched intrinsic name
    * [#16129](https://github.com/apache/tvm/pull/16129) - [Unity][Transform] 
Replace eligible operators with in-place versions in dataflow blocks
    * [#16414](https://github.com/apache/tvm/pull/16414) - [Bugfix][Unity] 
Recover MSVC/NVCC/ROCm/Vulkan
    * [#15954](https://github.com/apache/tvm/pull/15954) - [Unity] Split 
DecomposeOpsForTraining into two steps
    * [#16111](https://github.com/apache/tvm/pull/16111) - [Unity][Transform] 
Memory planning for dynamic-shape func return
    * [#16396](https://github.com/apache/tvm/pull/16396) - [Unity] PagedKVCache 
supporting on-the-fly RoPE calculation
    * [#16395](https://github.com/apache/tvm/pull/16395) - [Frontend][ONNX] Fix 
ONNX frontend parse
    * [#16385](https://github.com/apache/tvm/pull/16385) - [Unity][Op] Add 
Conv3D Operator
    * [#16284](https://github.com/apache/tvm/pull/16284) - [Unity][nnModule] 
Dynamic shape support in nn Module
    * [#16378](https://github.com/apache/tvm/pull/16378) - 
[Unity][BlockBuilder] Restore bb.get()
    * [#16374](https://github.com/apache/tvm/pull/16374) - [Unity] Support TIR 
kernel for PagedKVCache
    * [#16314](https://github.com/apache/tvm/pull/16314) - [Unity][Transform] 
Implement relax.transform.AdjustMatmulOrder
    * [#16349](https://github.com/apache/tvm/pull/16349) - [Unity][MSC] Avoid 
depending on trivial bindings in Relax intermediate
    * [#16376](https://github.com/apache/tvm/pull/16376) - [Unity][Contrib] Fix 
a bug due to typo in vllm `reconstruct_from_cache` kernel and add test
    * [#16388](https://github.com/apache/tvm/pull/16388) - [Unity] Update 
dispatch test cases following the merge from main
    * [#16335](https://github.com/apache/tvm/pull/16335) - [Unity] Set 
CMAKE_CUDA_ARCHITECTURES default to native
    * [#16306](https://github.com/apache/tvm/pull/16306) - [Unity][Transform] 
Update LambdaLift to use name of lifted lambda
    * [#16310](https://github.com/apache/tvm/pull/16310) - [Unity][Analysis] 
Show objects instead of names in WellFormedChecker
    * [#16362](https://github.com/apache/tvm/pull/16362) - [Unity][Fix] Memory 
planning check value type of 'tir_var_upper_bound'
    * [#16367](https://github.com/apache/tvm/pull/16367) - [Unity][Transform] 
Handle replacement at both var binding and usage
    * [#16309](https://github.com/apache/tvm/pull/16309) - [Unity][Transform] 
Use parameter name in BundleModelParams
    * [#16307](https://github.com/apache/tvm/pull/16307) - [Unity] Improved 
error message in ExprMutator::ReEmitBinding
    * [#16308](https://github.com/apache/tvm/pull/16308) - [Unity] Improved 
error message for matmul shape mismatch
    * [#16360](https://github.com/apache/tvm/pull/16360) - [Unity] Enhance 
Torch-consistency in reshape
    * [#16350](https://github.com/apache/tvm/pull/16350) - [Unity][Contrib] Add 
vLLM paged attention kernel
    * [#16303](https://github.com/apache/tvm/pull/16303) - [Unity][NN] Use 
Linear name for nn.op.permute_dims
    * [#16325](https://github.com/apache/tvm/pull/16325) - 
[Unity][MSC][Legalize] legalize codes and mute logging
    * [#16312](https://github.com/apache/tvm/pull/16312) - [Unity][Analysis] 
Add utility for collecting compile-time bindings
    * [#16330](https://github.com/apache/tvm/pull/16330) - [Unity][WEBGPU] 
Enable wasm exception propagation
    * [#16304](https://github.com/apache/tvm/pull/16304) - [Unity][Analysis] 
Handle PrimStructInfo in EraseToWellDefined
    * [#16305](https://github.com/apache/tvm/pull/16305) - [Unity][Transform] 
Implement UpdateParamStructInfo
    * [#16331](https://github.com/apache/tvm/pull/16331) - [Unity] Alter op 
impl handling empty transform for output
    * [#16254](https://github.com/apache/tvm/pull/16254) - [Unity] Dispatch 
cumsum and sort
    * [#16120](https://github.com/apache/tvm/pull/16120) - [Unity][Transform] 
Extract partial-tuple-usage from FuseTIR
    * [#16311](https://github.com/apache/tvm/pull/16311) - [Unity] Validate 
struct info in relax::Call constructor
    * [#16333](https://github.com/apache/tvm/pull/16333) - [Unity] Fix 
nn.op.tensor_ir_op signature
    * [#16302](https://github.com/apache/tvm/pull/16302) - [Unity] Cutlass 
kernel compatibility with cmake 3.18+
   
   ### Relay
    * [#16622](https://github.com/apache/tvm/pull/16622) - [ONNX] Fix the 
attribute mode parse of operator Upsample
    * [#16626](https://github.com/apache/tvm/pull/16626) - [ONNX] Fix the 
Resize operator in ONNX frontend
    * [#16624](https://github.com/apache/tvm/pull/16624) - [ONNX] fix the wrong 
default value about dtype in Multinomial converter
    * [#16417](https://github.com/apache/tvm/pull/16417) - [Frontend][Torch] 
fix pytorch frontend linspace op
    * [#16400](https://github.com/apache/tvm/pull/16400) - [Frontend][Torch] 
fix pytorch frontend not support logical or
    * [#16390](https://github.com/apache/tvm/pull/16390) - [Frontend][Torch] 
fix a typo in nonzero_numpy
    * [#16324](https://github.com/apache/tvm/pull/16324) - make "ToScalar" 
support directly obtaining "int64_t"
   
   ### Runtime
    * [#16804](https://github.com/apache/tvm/pull/16804) - Introduce MSCCLPP 
with NCCL equivalent interface
    * [#16809](https://github.com/apache/tvm/pull/16809) - Add "TVM_DLL" to 
NVTX header
    * [#16750](https://github.com/apache/tvm/pull/16750) - CUDA IPC Memory 
support and custom allreduce kernels
    * [#16738](https://github.com/apache/tvm/pull/16738) - [Refactor] Always 
specify device in allocator interface
    * [#16716](https://github.com/apache/tvm/pull/16716) - Ensure 
NDArray.CopyTo(Device) always sync
    * [#16705](https://github.com/apache/tvm/pull/16705) - Add TVM_DLL to 
memory manager functions
    * [#16692](https://github.com/apache/tvm/pull/16692) - PagedKVCache execute 
data copy on a separate stream
    * [#16647](https://github.com/apache/tvm/pull/16647) - [RPC] Fix FreeObject 
in minrpc server
    * [#16667](https://github.com/apache/tvm/pull/16667) - [Builtin] Using 
float32 accumulation in attention kernel
    * [#16635](https://github.com/apache/tvm/pull/16635) - [RPC] Enable 
RPCObjectRef over multi-hop RPC
    * [#16630](https://github.com/apache/tvm/pull/16630) - Add TVM_DLL to 
threading backend funcs
    * [#16541](https://github.com/apache/tvm/pull/16541) - Add "TVM_DLL" to 
NDArray cache load func
    * [#16550](https://github.com/apache/tvm/pull/16550) - [ROCM] Properly 
align rocm parameter buffer
    * [#16545](https://github.com/apache/tvm/pull/16545) - Fix dtype conversion 
for bf16 and fp8
    * [#16508](https://github.com/apache/tvm/pull/16508) - ParallelFor skipping 
thread backend for unit extent
    * [#16486](https://github.com/apache/tvm/pull/16486) - KV cache providing 
workspace for attn kernel
    * [#16456](https://github.com/apache/tvm/pull/16456) - [KVCache] 
AttentionWithFusedQKV and RoPE mode
    * [#16415](https://github.com/apache/tvm/pull/16415) - [Memory] Implement 
support for non-zero offset within a storage object in AllocNDArr…
    * [#16387](https://github.com/apache/tvm/pull/16387) - [RPC] Enable 
RPCObjectRef return in RPC
    * [#16377](https://github.com/apache/tvm/pull/16377) - Use 
cudaGetDeviceCount to check if device exists
   
   ### TIR
    * [#16832](https://github.com/apache/tvm/pull/16832) - Use constructor for 
new PrimFunc in TransformLayout
    * [#16543](https://github.com/apache/tvm/pull/16543) - Fix segfaults from 
ordering of Let/Assert in MakePackedAPI
    * [#16795](https://github.com/apache/tvm/pull/16795) - Ramp and Broadcast 
lanes fixed to int32 dtype
    * [#16767](https://github.com/apache/tvm/pull/16767) - [Driver] Use 
`BindTarget` to specify target for FP8 legalization
    * [#16742](https://github.com/apache/tvm/pull/16742) - [Bugfix] Fix 
cache_read update buffer region
    * [#16726](https://github.com/apache/tvm/pull/16726) - [Bugfix] Avoid 
overwrite of unmanaged buffer allocations
    * [#16548](https://github.com/apache/tvm/pull/16548) - [CUDA] Add native 
FP8 support to codegen
    * [#16723](https://github.com/apache/tvm/pull/16723) - Implement 
max/min_value for fp8 data types
    * [#16655](https://github.com/apache/tvm/pull/16655) - Improve well-formed 
check's handling of match buffer
    * [#16673](https://github.com/apache/tvm/pull/16673) - Support Vector 
Reinterpret Calls
    * [#16682](https://github.com/apache/tvm/pull/16682) - [Bugfix] Handle 
AttrStmt of upcoming tir.Var in ConvertSSA
    * [#16560](https://github.com/apache/tvm/pull/16560) - Enhance and fix 
tensorize schedule for some case
    * [#16660](https://github.com/apache/tvm/pull/16660) - [Bugfix] Fix 
duplicate AllocateConst in CacheReadWrite schedule primitive
    * [#16544](https://github.com/apache/tvm/pull/16544) - Expand debug symbol 
output for CodeGenLLVM
    * [#16553](https://github.com/apache/tvm/pull/16553) - Fix 
get_block_access_region for let bindings
    * [#16515](https://github.com/apache/tvm/pull/16515) - Require exactly 
same-dtype matching for Vulkan smem reuse
    * [#16406](https://github.com/apache/tvm/pull/16406) - Fix of inter thread 
reduction with shared memory prefetch
    * [#16293](https://github.com/apache/tvm/pull/16293) - Extend DP4A tensor 
intrin
    * [#16345](https://github.com/apache/tvm/pull/16345) - Allow sync threads 
inside condition
    * [#16250](https://github.com/apache/tvm/pull/16250) - In SplitHostDevice, 
check for variables in thread extents
    * [#16184](https://github.com/apache/tvm/pull/16184) - [Transform] 
Implement InlinePrivateFunctions
   
   ### TOPI
    * [#16652](https://github.com/apache/tvm/pull/16652) - improve 
inclusive_scan for thrust
    * [#16383](https://github.com/apache/tvm/pull/16383) - [Target] Add fp16 
SIMD support for conv2d on `arm_cpu` targets
   
   ### TVMC
    * [#16261](https://github.com/apache/tvm/pull/16261) - Add tvmc flag to 
print ir before and print ir after named pass
   
   ### TVMScript
    * [#16864](https://github.com/apache/tvm/pull/16864) - Add parser and 
printer support for e4m3/e5m2 fp8
    * [#16844](https://github.com/apache/tvm/pull/16844) - Produce empty 
DictAttrs when R.func_attrs is absent
    * [#16811](https://github.com/apache/tvm/pull/16811) - Do not throw error 
for duplicate definitions
    * [#16641](https://github.com/apache/tvm/pull/16641) - Allow use of 
relax.Expr with void type as a statement
    * [#16663](https://github.com/apache/tvm/pull/16663) - Infer T.reads() for 
DeclBuffer nodes
    * [#16640](https://github.com/apache/tvm/pull/16640) - Represent 
tir::builtin::ret() using python "return"
    * [#16562](https://github.com/apache/tvm/pull/16562) - [Bugfix]Handle 
R.match_cast as last binding in if/else
    * [#16593](https://github.com/apache/tvm/pull/16593) - [Unity]Parse 
R.Object return type from call_pure_packed
    * [#16356](https://github.com/apache/tvm/pull/16356) - [Unity]Optionally 
hide StructInfo that can be inferred
    * [#16379](https://github.com/apache/tvm/pull/16379) - [Unity]Update 
`call_packed` semantics to support empty sinfo_args
   
   ### Vulkan
    * [#16858](https://github.com/apache/tvm/pull/16858) - Fix CLZ support for 
Vulkan
   
   ### cuda & cutlass & tensorrt
    * [#16865](https://github.com/apache/tvm/pull/16865) - [Codegen, CUDA] Add 
handling of fp8 broadcast / const
    * [#16818](https://github.com/apache/tvm/pull/16818) - [Cutlass] Fix usage 
of cuda stream for group gemm
    * [#16788](https://github.com/apache/tvm/pull/16788) - [Cutlass] Add check 
for group gemm param shapes
    * [#16789](https://github.com/apache/tvm/pull/16789) - [Bugfix][Cutlass] 
Remove a typo in cutlass build
    * [#16787](https://github.com/apache/tvm/pull/16787) - [Codegen, Cuda] Add 
overload for fp8x4 e5m2 <-> half4 conversion
    * [#16751](https://github.com/apache/tvm/pull/16751) - [Cutlass] Add group 
gemm kernels
    * [#16736](https://github.com/apache/tvm/pull/16736) - [Target][CUDA] Allow 
non-numeric arch as needed for latest gpu
    * [#16619](https://github.com/apache/tvm/pull/16619) - [Bugfix][Cutlass] 
Check if function attributes is None
    * [#16342](https://github.com/apache/tvm/pull/16342) - [CUDA] Simple extend 
to optimize reuse for static shared memory.
   
   ### microNPU
    * [#16266](https://github.com/apache/tvm/pull/16266) - [microNPU][ETHOSU] 
Add fixed point for tanh
    * [#16680](https://github.com/apache/tvm/pull/16680) - [microNPU][ETHOSU] 
Fix LUT size for int16 activations
    * [#16401](https://github.com/apache/tvm/pull/16401) - [microNPU][ETHOSU] 
Add fixed point for matmul
   
   ### web
    * [#16733](https://github.com/apache/tvm/pull/16733) - Support web IndexedDB 
cache for larger model storage
    * [#16810](https://github.com/apache/tvm/pull/16810) - Support building 
tvm/web on Windows
    * [#16825](https://github.com/apache/tvm/pull/16825) - Allow custom bc 
files in emcc making
    * [#16791](https://github.com/apache/tvm/pull/16791) - Add `kv_state` and 
`rnn_state` to wasm_runtime
    * [#16722](https://github.com/apache/tvm/pull/16722) - Implement linear 
congruential generator, make runtime seedable
    * [#16650](https://github.com/apache/tvm/pull/16650) - Separate parallel 
shard download and iterative shard loading
    * [#16694](https://github.com/apache/tvm/pull/16694) - Initial support for 
asyncify
    * [#16631](https://github.com/apache/tvm/pull/16631) - Fix NDArrayCache 
loading report callback
    * [#16525](https://github.com/apache/tvm/pull/16525) - Move ArtifactCache 
to Interface, Support Cache delete and Batch Delete, Remove typo
    * [#16554](https://github.com/apache/tvm/pull/16554) - Compatibility with 
PagedKVCache in WebGPU
    * [#16527](https://github.com/apache/tvm/pull/16527) - Revert "[Unity]Temp 
disable wasm exception (#16444)"
    * [#16504](https://github.com/apache/tvm/pull/16504) - [Relax]Add 
ApplyPresenceAndFrequencyPenalty
    * [#16485](https://github.com/apache/tvm/pull/16485) - [wasm] Enlarge 
initial memory for emcc
    * [#16444](https://github.com/apache/tvm/pull/16444) - [Unity]Temp disable 
wasm exception
   
   ### Misc
    * [#16873](https://github.com/apache/tvm/pull/16873) - [Thrust] Fix thrust 
workspace allocation
    * [#16868](https://github.com/apache/tvm/pull/16868) - [3rdparty] Bump 
flashinfer
    * [#16871](https://github.com/apache/tvm/pull/16871) - [PageKV] allow PopN 
to pop all the tokens in last block
    * [#16866](https://github.com/apache/tvm/pull/16866) - [3rdparty] Bump 
FlashInfer
    * [#16863](https://github.com/apache/tvm/pull/16863) - [Picojson] Let the 
key of objects in json be ordered by default
    * [#16856](https://github.com/apache/tvm/pull/16856) - [Thrust] Use pointer 
to tls pool to prevent creating new pool
    * [#16850](https://github.com/apache/tvm/pull/16850) - Fixing probability 
comment
    * [#16849](https://github.com/apache/tvm/pull/16849) - [KVCache] Initialize 
one extra page than specified
    * [#16843](https://github.com/apache/tvm/pull/16843) - [IR] Provide 
well-formed intermediate in ApplyPassToFunction
    * [#16772](https://github.com/apache/tvm/pull/16772) - [MSC][M5.3] Support 
torch.dynamo for dynamic models
    * [#16839](https://github.com/apache/tvm/pull/16839) - Bump pillow from 
10.2.0 to 10.3.0 in /apps/microtvm/cmsisnn
    * [#16838](https://github.com/apache/tvm/pull/16838) - Bump pillow from 
10.2.0 to 10.3.0 in /apps/microtvm/ethosu
    * [#16831](https://github.com/apache/tvm/pull/16831) - [KVCache] Reducing 
CacheAuxDataManager copy size
    * [#16794](https://github.com/apache/tvm/pull/16794) - [SME] Target parser 
support for SME
    * [#16824](https://github.com/apache/tvm/pull/16824) - [KVCache] 
Introducing auxiliary data manager
    * [#16800](https://github.com/apache/tvm/pull/16800) - [BugTIR]fix error 
merging shared memory for ptx_cp_async
    * [#16822](https://github.com/apache/tvm/pull/16822) - [VM] Recycle VMFrame
    * [#16813](https://github.com/apache/tvm/pull/16813) - [KVCache] Support 
forking sequence at specific position
    * [#16786](https://github.com/apache/tvm/pull/16786) - [Codegen] Add check 
to disable invalid reinterpret
    * [#16816](https://github.com/apache/tvm/pull/16816) - [Cmake] Allow using 
custom CCCL path for thrust
    * [#16784](https://github.com/apache/tvm/pull/16784) - [SLM] Add unit tests 
for SLM to Relax exporter
    * [#16814](https://github.com/apache/tvm/pull/16814) - Fix includes of 
custom allreduce kernel
    * [#16806](https://github.com/apache/tvm/pull/16806) - [Debug] Improve 
error message in VMShapeLower
    * [#16802](https://github.com/apache/tvm/pull/16802) - [Debug] Improve 
error messages in LiftTransformParams
    * [#16425](https://github.com/apache/tvm/pull/16425) - [Target] Use LLVM 
target parser for determining Arm(R) A-Profile Architecture features
    * [#16797](https://github.com/apache/tvm/pull/16797) - [3rdparty] AUTO mode 
for custom all-reduce strategy
    * [#16761](https://github.com/apache/tvm/pull/16761) - [SME] Add support 
for inserting processor state annotations
    * [#16778](https://github.com/apache/tvm/pull/16778) - [Analysis] Allow 
calls to GlobalVar in @R.function
    * [#16745](https://github.com/apache/tvm/pull/16745) - [IR] Default to 
empty attributes, instead of NULL
    * [#16777](https://github.com/apache/tvm/pull/16777) - Revert "[SLM] Allow 
modules to define pre-processing of weights"
    * [#16776](https://github.com/apache/tvm/pull/16776) - [Contrib] Remove 
thrust "built but not used" warning
    * [#16757](https://github.com/apache/tvm/pull/16757) - [SLM] Allow modules 
to define pre-processing of weights
    * [#16763](https://github.com/apache/tvm/pull/16763) - [CONTRIB] Add nm 
symbol dump
    * [#16717](https://github.com/apache/tvm/pull/16717) - Enable Shared 
Function in LiftTransformParam Pass
    * [#16729](https://github.com/apache/tvm/pull/16729) - [Builtin] Sliding 
window and sink support for PagedKVCache
    * [#16724](https://github.com/apache/tvm/pull/16724) - Fix cpp_rtvm cmake 
build on Windows
    * [#16513](https://github.com/apache/tvm/pull/16513) - [Target] 
Automatically detect system triple when not specified by the user
    * [#16710](https://github.com/apache/tvm/pull/16710) - [CMake] Add 
"USE_FLASHINFER" to libinfo
    * [#16702](https://github.com/apache/tvm/pull/16702) - [MSC][M5.2] Enable 
quantize && prune with gym by wrapper
    * [#16699](https://github.com/apache/tvm/pull/16699) - [Transform] Remove 
R.Object parameters after LazyTransformParams
    * [#16668](https://github.com/apache/tvm/pull/16668) - [MSC][M5.1] Build 
wrapper to support compression
    * [#16693](https://github.com/apache/tvm/pull/16693) - [Contrib] Support 
NDArray cache taking generator
    * [#16412](https://github.com/apache/tvm/pull/16412) - [Lint] Add check to 
prevent usage of #include <regex>
    * [#16689](https://github.com/apache/tvm/pull/16689) - [DeviceAPI] Support 
"GetCurrentStream"
    * [#16690](https://github.com/apache/tvm/pull/16690) - Use target name 
instead of node name as function name
    * [#16683](https://github.com/apache/tvm/pull/16683) - [skip ci] Fix wasm 
exception flag
    * [#16609](https://github.com/apache/tvm/pull/16609) - Minor update docs 
instructions
    * [#16656](https://github.com/apache/tvm/pull/16656) - Simplify Windows 
CMake Command
    * [#16666](https://github.com/apache/tvm/pull/16666) - [KVCache] Fix the 
reference counter in sequence fork
    * [#16662](https://github.com/apache/tvm/pull/16662) - Fixing workload 
comment
    * [#16595](https://github.com/apache/tvm/pull/16595) - [Transform] Check 
for zero-param operators in LiftTransformParams
    * [#16599](https://github.com/apache/tvm/pull/16599) - [Transform] 
De-duplicate MatchCast nodes in EliminateCommonSubexpr
    * [#16596](https://github.com/apache/tvm/pull/16596) - [Transform] 
Implement relax.transform.ReorderPermuteDimsAfterConcat
    * [#16597](https://github.com/apache/tvm/pull/16597) - [Transform] Allow 
explicit name of bundled model parameters
    * [#16602](https://github.com/apache/tvm/pull/16602) - [Transform] 
Improvements to LazyTransformParams
    * [#16606](https://github.com/apache/tvm/pull/16606) - [KVCache] Support 
passing in attn_score_scaling_factor into KV cache
    * [#16608](https://github.com/apache/tvm/pull/16608) - Extend gpu memory 
bandwidth test to work through RPC
    * [#16587](https://github.com/apache/tvm/pull/16587) - [Debug] Improve 
error message for codegen pattern mismatches
    * [#16570](https://github.com/apache/tvm/pull/16570) - [Marvell BYOC]: 
Marvell AI Accelerator Integration - Phase 1
    * [#16576](https://github.com/apache/tvm/pull/16576) - Update the 
3rdparty/libflash_attn submodule
    * [#16580](https://github.com/apache/tvm/pull/16580) - [KVCache] Support 
mode "None" for Rotary Embedding
    * [#16578](https://github.com/apache/tvm/pull/16578) - [KVCache] Support 
returning query positions
    * [#16571](https://github.com/apache/tvm/pull/16571) - Fix compile warnings
    * [#16540](https://github.com/apache/tvm/pull/16540) - [Upd] Enable lld 
search to include /opt/rocm/llvm/bin for rocm
    * [#16539](https://github.com/apache/tvm/pull/16539) - Improve error 
message in NDArray::CopyFromTo
    * [#16524](https://github.com/apache/tvm/pull/16524) - [Build] Improving 
debug and build-dir options
    * [#16551](https://github.com/apache/tvm/pull/16551) - [KVCache] Fix 
attention kernel for ROCm
    * [#16512](https://github.com/apache/tvm/pull/16512) - Cut 
pytest-lazy-fixture
    * [#16506](https://github.com/apache/tvm/pull/16506) - Bump 
3rdparty/cutlass_fpA_intB_gemm version
    * [#16511](https://github.com/apache/tvm/pull/16511) - [Minor] Fix Clang 
compilation warning in fuse_tir.cc and codegen_c_host.cc
    * [#16516](https://github.com/apache/tvm/pull/16516) - Add Relax, Unity 
Tags in make_notes.py
    * [#16497](https://github.com/apache/tvm/pull/16497) - [Instrument] Add 
default instrument to print all passes
    * [#16494](https://github.com/apache/tvm/pull/16494) - [DPL] Support 
tir_vars field in is_call_tir pattern
    * [#16453](https://github.com/apache/tvm/pull/16453) - Bump pillow from 
10.0.1 to 10.2.0 in /apps/microtvm
    * [#16454](https://github.com/apache/tvm/pull/16454) - [BugTIR] fix 
thread_sync occurs in letstmt
    * [#16468](https://github.com/apache/tvm/pull/16468) - [LINT] Fix pylint 
issues in test_dma_builtin.py
    * [#16413](https://github.com/apache/tvm/pull/16413) - [Contrib] Workspace 
for cuBLAS backend
    * [#16460](https://github.com/apache/tvm/pull/16460) - 
[Cherry-pick][MSC][M4.1] Add plugin && plugin_builder, enable build and test in 
different frameworks (#16397)
    * [#16461](https://github.com/apache/tvm/pull/16461) - [Minor] Fix 
Docstring for sphinx-build
    * [#16431](https://github.com/apache/tvm/pull/16431) - [Schedule] 
Loop-Partition Scheduling Primitive
    * [#16451](https://github.com/apache/tvm/pull/16451) - Bump pillow from 
10.0.1 to 10.2.0 in /apps/microtvm/ethosu
    * [#16452](https://github.com/apache/tvm/pull/16452) - Bump pillow from 
10.0.1 to 10.2.0 in /apps/microtvm/cmsisnn
    * [#16445](https://github.com/apache/tvm/pull/16445) - [skip ci] update 
branch rule to prepare for unity transition
    * [#16426](https://github.com/apache/tvm/pull/16426) - [CMake] Enable cuda 
lang if USE_CUDA is on
    * [#16407](https://github.com/apache/tvm/pull/16407) - Add NVIDIA Hopper 
H100 target tag
    * [#16398](https://github.com/apache/tvm/pull/16398) - [DeviceAPI] Support 
querying total global memory
    * [#16357](https://github.com/apache/tvm/pull/16357) - [RPC] Fix tuning on 
macOS and Windows (#15771)
    * [#16386](https://github.com/apache/tvm/pull/16386) - [Thrust] Use no sync 
exec policy and caching allocator
    * [#16343](https://github.com/apache/tvm/pull/16343) - [CMake][MSVC] 
Disable permissive mode for MSVC builds
    * [#16242](https://github.com/apache/tvm/pull/16242) - [Codegen] Fix 
if_then_else codegen
    * [#16341](https://github.com/apache/tvm/pull/16341) - [CMake] Use ccache 
as CMAKE_CUDA_COMPILER_LAUNCHER
    * [#16332](https://github.com/apache/tvm/pull/16332) - Change metal dtype 
of ceil_log2 to fp32


