ysh329 opened a new issue, #17860:
URL: https://github.com/apache/tvm/issues/17860

   # Introduction
   
   The TVM community has worked since the last release to deliver the following exciting new improvements!
   
   The main tags are below (**bold text marks areas with major progress**): **Relax** (especially the PyTorch frontend), CUDA, etc.
   
   Please visit the full listing of commits for a complete view: 
[v0.20.dev0...v0.20.0.rc0](https://github.com/apache/tvm/compare/v0.20.dev0...v0.20.0.rc0).
   
   ### Community
   
   None.
   
   ### RFCs
   
   None.
   
   ### Adreno
    * [#17608](https://github.com/apache/tvm/pull/17608) - [WINDOWS] Windows 
build dependencies for Adreno target
   
   ### BugFix
    * [#17761](https://github.com/apache/tvm/pull/17761) - [FIX][RELAX] Fix fusion of transpose + matmul with a constant weight
    * [#17762](https://github.com/apache/tvm/pull/17762) - [Fix] Fix OpenCL 
header in attention utils
    * [#17711](https://github.com/apache/tvm/pull/17711) - [Fix][dlight] Add an explicit reduction loop check in Reduce
    * [#17697](https://github.com/apache/tvm/pull/17697) - [Fix] Include 
`<chrono>` for `std::chrono`
    * [#17677](https://github.com/apache/tvm/pull/17677) - Declare build 
backend for python package
    * [#17598](https://github.com/apache/tvm/pull/17598) - [TIR][FIX] Update FlopEstimator to include missing nodes
    * [#17601](https://github.com/apache/tvm/pull/17601) - [Flashinfer][Fix] Fix missing args in flashinfer test
    * [#17607](https://github.com/apache/tvm/pull/17607) - [FIX][TVMC] Fix the 
mixed precision conversion pipeline
   
   ### CI
    * [#17687](https://github.com/apache/tvm/pull/17687) - Update images to 
20250226-223225-63bc315f
    * [#17680](https://github.com/apache/tvm/pull/17680) - Update images to 20250225-035137-aeadc31c
    * [#17675](https://github.com/apache/tvm/pull/17675) - [skip ci] Update GitHub tvmbot
    * [#17635](https://github.com/apache/tvm/pull/17635) - Cleanup legacy files
    * [#17634](https://github.com/apache/tvm/pull/17634) - [skip ci] Improve build time
    * [#17629](https://github.com/apache/tvm/pull/17629) - [skip ci] Robustify CI for spot-instance failures
    * [#17620](https://github.com/apache/tvm/pull/17620) - Unpin 
pytest-profiling
    * [#17621](https://github.com/apache/tvm/pull/17621) - [skip ci] Remove 
legacy CI runners protection
    * [#17619](https://github.com/apache/tvm/pull/17619) - [Refactor] Remove legacy frontend tests
   
   ### Dlight
    * [#17754](https://github.com/apache/tvm/pull/17754) - Fix general 
reduction rule to support non-last reduction axis
    * [#17663](https://github.com/apache/tvm/pull/17663) - [CPU] Add CPU Backend Support for GEMV Optimization (see the sketch after this list)
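
   The dlight entries above extend the default schedule rules. A minimal sketch of applying the GEMV rule to an IRModule, assuming the `tvm.dlight` rule-application API (the GPU rule names are shown; the CPU rule set from #17663 may be exposed differently):

   ```python
   # Minimal sketch: apply dlight schedule rules to an IRModule `mod`.
   # dl.gpu.GEMV() and dl.gpu.Fallback() are existing GPU rules; the CPU
   # GEMV support added in #17663 may live under a different namespace.
   import tvm
   from tvm import dlight as dl

   def schedule_gemv(mod: tvm.IRModule) -> tvm.IRModule:
       # ApplyDefaultSchedule tries each rule in order on every PrimFunc.
       with tvm.target.Target("cuda"):
           return dl.ApplyDefaultSchedule(
               dl.gpu.GEMV(),
               dl.gpu.Fallback(),
           )(mod)
   ```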
   
   ### Docker
    * [#17691](https://github.com/apache/tvm/pull/17691) - Fix ml_dtypes 
downgrade issue introduced by TensorFlow
    * [#17686](https://github.com/apache/tvm/pull/17686) - Update ml_dtypes to 
0.5.1+
    * [#17676](https://github.com/apache/tvm/pull/17676) - Use Torch GPU on GPU devices
    * [#17648](https://github.com/apache/tvm/pull/17648) - Tensorflow (aka 
TFLite) upgrade to 2.18.0
    * [#17643](https://github.com/apache/tvm/pull/17643) - Update ml_dtypes 
version
    * [#17638](https://github.com/apache/tvm/pull/17638) - [skip ci] Update ml_dtypes version
    * [#17617](https://github.com/apache/tvm/pull/17617) - Tensorflow upgrade 
to 2.18.0
   
   ### Docs
    * [#17650](https://github.com/apache/tvm/pull/17650) - Update README
    * [#17611](https://github.com/apache/tvm/pull/17611) - Download 3rd party 
embeds to local files
    * [#17604](https://github.com/apache/tvm/pull/17604) - Update README
   
   ### MetaSchedule
    * [#17104](https://github.com/apache/tvm/pull/17104) - Add post-optimization in MetaSchedule to improve scheduling
   
   ### OpenCL & CLML
    * [#17571](https://github.com/apache/tvm/pull/17571) - [OPENCL][TEXTURE] 
Improved texture memory planning
   
   ### Relax
    * [#17814](https://github.com/apache/tvm/pull/17814) - [PyTorch] Add stack.default and sum.default to exported programs translator (see the importer sketch after this list)
    * [#17820](https://github.com/apache/tvm/pull/17820) - [PyTorch] Add 
support for broadcast_to, narrow ops
    * [#17822](https://github.com/apache/tvm/pull/17822) - [PyTorch] Cleanup 
tests for ExportedProgram frontend
    * [#17806](https://github.com/apache/tvm/pull/17806) - [PyTorch] Add 
Softplus Op Support for Exported Program and FX graph
    * [#17817](https://github.com/apache/tvm/pull/17817) - [PyTorch] Support 
dynamic shapes in ExportedProgram frontend
    * [#17813](https://github.com/apache/tvm/pull/17813) - [PyTorch] Improve 
ExportedProgram frontend by supporting `unflatten.int`, `hardtanh_.default`, 
`dropout_.default`, `silu_.default`, `add_.Tensor` and `relu_.default`
    * [#17812](https://github.com/apache/tvm/pull/17812) - [PyTorch] Support 
argsort, topk ops for ExportedProgram importer
    * [#17810](https://github.com/apache/tvm/pull/17810) - [PyTorch] Add 
support for argsort, sort, topk ops
    * [#17809](https://github.com/apache/tvm/pull/17809) - [PyTorch] Delete 
duplicate converter function `_to`
    * [#17807](https://github.com/apache/tvm/pull/17807) - [PyTorch] Fix torch 
2.6 compatibility issues
    * [#17797](https://github.com/apache/tvm/pull/17797) - [PyTorch] Update SELU implementation using decomposed core-level ops
    * [#17802](https://github.com/apache/tvm/pull/17802) - [PyTorch] Support arange in exported programs translator
    * [#17801](https://github.com/apache/tvm/pull/17801) - [PyTorch] Support 
where, cumprod and reciprocal ops for ExportedProgram importer
    * [#17790](https://github.com/apache/tvm/pull/17790) - [PyTorch] Add 
support for index_select
    * [#17786](https://github.com/apache/tvm/pull/17786) - [PyTorch] Support 
softshrink op for ExportedProgram
    * [#17788](https://github.com/apache/tvm/pull/17788) - [PyTorch] Add 
support for where, cumprod and reciprocal ops
    * [#17785](https://github.com/apache/tvm/pull/17785) - [PyTorch] Support 
prod, std and var ops for ExportedProgram importer
    * [#17778](https://github.com/apache/tvm/pull/17778) - [PyTorch] Support 
log2, log10 and log1p ops for ExportedProgram importer
    * [#17772](https://github.com/apache/tvm/pull/17772) - [PyTorch] Add 
support for prod, std and var ops
    * [#17766](https://github.com/apache/tvm/pull/17766) - [PyTorch] Add 
support for log2, log10 and log1p ops
    * [#17760](https://github.com/apache/tvm/pull/17760) - [PyTorch] Add 
support for lerp, select and clone ops
    * [#17751](https://github.com/apache/tvm/pull/17751) - [PyTorch] Support 
one_hot, empty_like ops for ExportedProgram importer
    * [#17747](https://github.com/apache/tvm/pull/17747) - [PyTorch] Support 
flip, gather, take ops for ExportedProgram importer
    * [#17738](https://github.com/apache/tvm/pull/17738) - [PyTorch] Support 
elu, celu, selu ops for ExportedProgram importer
    * [#17726](https://github.com/apache/tvm/pull/17726) - [PyTorch] Add 
support for numel, empty_like and one_hot ops
    * [#17707](https://github.com/apache/tvm/pull/17707) - [PyTorch] Add 
support for gather, flip and take ops
    * [#17702](https://github.com/apache/tvm/pull/17702) - [PyTorch] Add 
support for celu, selu, is_floating_point ops
    * [#17694](https://github.com/apache/tvm/pull/17694) - [PyTorch] Add 
support for elu, hardtanh ops
    * [#17689](https://github.com/apache/tvm/pull/17689) - [PyTorch] Support 
several binary ops for ExportedProgram importer
    * [#17672](https://github.com/apache/tvm/pull/17672) - [PyTorch] Refactor 
binary ops tests
    * [#17679](https://github.com/apache/tvm/pull/17679) - [PyTorch] Support 
several unary ops for ExportedProgram importer
    * [#17668](https://github.com/apache/tvm/pull/17668) - [PyTorch] Add 
support for and_, lshift, min, or_, rshift, xor ops
    * [#17664](https://github.com/apache/tvm/pull/17664) - [PyTorch] Add 
support for ge, gt, le, mod, ne ops
    * [#17659](https://github.com/apache/tvm/pull/17659) - [PyTorch] Add 
support for bitwise_not, isfinite, isinf, isnan, logical_not, sign and square 
ops
    * [#17622](https://github.com/apache/tvm/pull/17622) - [PyTorch] Add 
support for abs, ceil, erf, floor, log ops and refactor unary tests
    * [#17566](https://github.com/apache/tvm/pull/17566) - [ONNX] Add prim expression support to Neg converter and update Arange converter to use relax.op.arange
    * [#17642](https://github.com/apache/tvm/pull/17642) - [ONNX] Replace topi.split with relax.op.split in the ONNX frontend
    * [#17674](https://github.com/apache/tvm/pull/17674) - [KVCache] 
PagedKVCache refactor, FlashInfer JIT and MLA integration
    * [#17618](https://github.com/apache/tvm/pull/17618) - [KVCache] TIR 
attention kernel support for MLA
    * [#17615](https://github.com/apache/tvm/pull/17615) - [KVCache] Add KV 
Cache for CPU Runtime
    * [#17616](https://github.com/apache/tvm/pull/17616) - [Runtime][KVCache] 
Initial interface setup for MLA
    * [#17782](https://github.com/apache/tvm/pull/17782) - [Frontend] Support 
max/min in frontend op interface
    * [#17758](https://github.com/apache/tvm/pull/17758) - Allow ingesting 
tensor.chunk() from exported torch program
    * [#17781](https://github.com/apache/tvm/pull/17781) - Enable bfloat16 for 
softmax struct-info inference
    * [#17752](https://github.com/apache/tvm/pull/17752) - Batch norm correctness in eval mode
    * [#17774](https://github.com/apache/tvm/pull/17774) - Check for tensor_meta in exported_program_translator
    * [#17757](https://github.com/apache/tvm/pull/17757) - Tensor.split with 
uneven tensors
    * [#17749](https://github.com/apache/tvm/pull/17749) - Move TIR backend to 
gpu_generic
    * [#17725](https://github.com/apache/tvm/pull/17725) - Ingest Tensor.clamp 
from torch export
    * [#17724](https://github.com/apache/tvm/pull/17724) - Add support to 
ingest Tensor.expand_as()
    * [#17723](https://github.com/apache/tvm/pull/17723) - Add torch exported 
program ingestion capability for Tensor.detach(), Tensor.copy_, and 
aten.lift_fresh_copy
    * [#17721](https://github.com/apache/tvm/pull/17721) - Allow ingesting 
Upsample module from torch.export either using Size or Scale Factor argument
    * [#17722](https://github.com/apache/tvm/pull/17722) - Allow ingesting 
vector_norm from torch.export
    * [#17728](https://github.com/apache/tvm/pull/17728) - Ingest Tensor.contiguous from torch export
    * [#17700](https://github.com/apache/tvm/pull/17700) - Fix tree attention 
for Qwen2-1.5 models
    * [#17682](https://github.com/apache/tvm/pull/17682) - Add support for func 
attr inheritance in SplitLayoutRewritePreproc
    * [#17654](https://github.com/apache/tvm/pull/17654) - [BYOC] OpenCLML 
offload support for Relax
    * [#17633](https://github.com/apache/tvm/pull/17633) - Pipeline file 
reorganization
    * [#17626](https://github.com/apache/tvm/pull/17626) - Initial setup of 
relax backend pipeline
    * [#17568](https://github.com/apache/tvm/pull/17568) - [PASS] Convert 
layout pass and ops enhanced to support sub indexing
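
   Many of the `[PyTorch]` entries above extend the ExportedProgram importer. A minimal usage sketch, assuming the `tvm.relax.frontend.torch.from_exported_program` entry point (exact keyword options may differ across versions):

   ```python
   # Minimal sketch: import a torch.export-ed model into Relax.
   import torch
   import tvm
   from tvm.relax.frontend.torch import from_exported_program

   class MLP(torch.nn.Module):
       def __init__(self):
           super().__init__()
           self.fc = torch.nn.Linear(64, 32)

       def forward(self, x):
           # linear/relu are among the ops the importer understands.
           return torch.nn.functional.relu(self.fc(x))

   # Export with torch.export, then translate to a Relax IRModule.
   exported = torch.export.export(MLP().eval(), (torch.randn(1, 64),))
   mod = from_exported_program(exported)
   mod.show()  # print the imported Relax module
   ```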
   
   ### Runtime
    * [#17614](https://github.com/apache/tvm/pull/17614) - [CLML] Profiling options enabled for CLML
    * [#17570](https://github.com/apache/tvm/pull/17570) - [OPENCL] Bugfix
   
   ### TIR
    * [#17799](https://github.com/apache/tvm/pull/17799) - Fix reduce buffer 
allocation position
    * [#17783](https://github.com/apache/tvm/pull/17783) - [REFACTOR] Remove legacy tir::any
    * [#17706](https://github.com/apache/tvm/pull/17706) - Minor fix for 
default GPU schedule
    * [#17579](https://github.com/apache/tvm/pull/17579) - [SoftwarePipeline] 
Ensure pipeline epilogue and prologue do not overlap
    * [#17584](https://github.com/apache/tvm/pull/17584) - [LoopPartition] Enforce loop partition control
   
   ### TVMC
    * [#17606](https://github.com/apache/tvm/pull/17606) - Bug fix
   
   ### cuda & cutlass & tensorrt
    * [#17789](https://github.com/apache/tvm/pull/17789) - [CUTLASS] Add 
blockwise scale gemm/bmm kernels
    * [#17741](https://github.com/apache/tvm/pull/17741) - [Codegen][CUDA] Fix codegen of cast among vector bfloat16, fp8 and fp4 (see the sketch after this list)
    * [#17708](https://github.com/apache/tvm/pull/17708) - [CUDA] FP4 cast and 
reinterpret support
    * [#17639](https://github.com/apache/tvm/pull/17639) - [CUDA] Remove htanh 
from unsupported math ops for CUDA 12.8
    * [#16950](https://github.com/apache/tvm/pull/16950) - [Codegen, CUDA] Add 
FP8 Tensor Core Codegen
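
   A hypothetical sketch of a TIR function touching the new FP8 paths: the dtype string follows the standard names from #17712, and building it for a CUDA target would go through the cast codegen fixed by the PRs above (the names here are assumptions, not the PRs' own test code):

   ```python
   # Sketch (assumed dtype/API names): cast float16 to the standard FP8
   # dtype "float8_e4m3fn" in TIR; compiling for CUDA exercises the
   # scalar/vector cast codegen touched by the PRs above.
   import tvm
   from tvm.script import tir as T

   @T.prim_func
   def cast_to_fp8(A: T.Buffer((128,), "float16"),
                   B: T.Buffer((128,), "float8_e4m3fn")):
       for i in range(128):
           with T.block("cast"):
               vi = T.axis.spatial(128, i)
               B[vi] = T.Cast("float8_e4m3fn", A[vi])
   ```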
   
   ### web
    * [#17695](https://github.com/apache/tvm/pull/17695) - [WASM] Update wasm include in accordance with the KV cache revamp
   
   ### Misc
    * [#17796](https://github.com/apache/tvm/pull/17796) - [Cublas] Added 
support for bfloat16 while dispatching to cublas kernels
    * [#17763](https://github.com/apache/tvm/pull/17763) - [Flashinfer] Added 
jit flow for sampling kernel
    * [#17811](https://github.com/apache/tvm/pull/17811) - [NFC] Fix `explict` 
typo
    * [#17780](https://github.com/apache/tvm/pull/17780) - [3rdparty] Enable 
bfloat16 for custom allreduce kernel
    * [#17784](https://github.com/apache/tvm/pull/17784) - [REFACTOR] Phase out 
StackVM
    * [#17750](https://github.com/apache/tvm/pull/17750) - BugFix: Relax comment
    * [#17748](https://github.com/apache/tvm/pull/17748) - [Codegen] Support 
codegen for vectorized tir.ShuffleNode
    * [#17743](https://github.com/apache/tvm/pull/17743) - Fix: Change variable 
i to x in split operation in cross_compilation_and_rpc.py
    * [#17730](https://github.com/apache/tvm/pull/17730) - [Attention] Added 
caching for flashinfer binaries during JIT
    * [#17733](https://github.com/apache/tvm/pull/17733) - [Refactor] Clean up 
Relay references in the codebase
    * [#17739](https://github.com/apache/tvm/pull/17739) - [BF16] Support 
ndarray.asnumpy() to bfloat16 tensor natively using ml_dtypes
    * [#17734](https://github.com/apache/tvm/pull/17734) - Remove Google 
Analytics
    * [#17731](https://github.com/apache/tvm/pull/17731) - [IR] Compact Functor 
vtable
    * [#17736](https://github.com/apache/tvm/pull/17736) - Fix typos in 
comments and strings
    * [#17670](https://github.com/apache/tvm/pull/17670) - [DataType] BF16 
Support
    * [#17727](https://github.com/apache/tvm/pull/17727) - [FFI] Fix dynamic 
FFI index to ensure compatibility
    * [#17718](https://github.com/apache/tvm/pull/17718) - [Refactor] Migrate 
build API to `tvm.compile`
    * [#17714](https://github.com/apache/tvm/pull/17714) - [FFI] Phase out 
ctypes fallback in favor of cython
    * [#17716](https://github.com/apache/tvm/pull/17716) - Fix the 
get_target_compute_version for sm >= 100
    * [#17710](https://github.com/apache/tvm/pull/17710) - [Refactor] Introduce base Executable class and `tvm.compile` interface (see the sketch after this list)
    * [#17713](https://github.com/apache/tvm/pull/17713) - [REFACTOR] Cleanup 
legacy relay runtime data structures
    * [#17712](https://github.com/apache/tvm/pull/17712) - [DataType] Rename 
FP8 dtypes to standard names
    * [#17703](https://github.com/apache/tvm/pull/17703) - Fix typos in 
multiple files
    * [#17693](https://github.com/apache/tvm/pull/17693) - Updated the assert in BindParams to allow tvm.relax.Constant
    * [#17701](https://github.com/apache/tvm/pull/17701) - [Refactor] Remove 
legacy TE schedule tag
    * [#17683](https://github.com/apache/tvm/pull/17683) - [MSC] Remove relay
    * [#17688](https://github.com/apache/tvm/pull/17688) - Fix 
relax.ccl.scatter_from_worker0 assert
    * [#17630](https://github.com/apache/tvm/pull/17630) - [Codegen] FP4 support
    * [#17685](https://github.com/apache/tvm/pull/17685) - [REFACTOR] Cleanup 
legacy TE-based passes
    * [#17681](https://github.com/apache/tvm/pull/17681) - [REFACTOR] Followup 
cleanup of relay phase out
    * [#17678](https://github.com/apache/tvm/pull/17678) - Bump 
3rdparty/cutlass_fpA_intB_gemm
    * [#17669](https://github.com/apache/tvm/pull/17669) - [REFACTOR] Allow target-dependent default TIR pipeline dispatch in tir.build()
    * [#17665](https://github.com/apache/tvm/pull/17665) - [REFACTOR] Move build flow from C++ to Python
    * [#17624](https://github.com/apache/tvm/pull/17624) - Added support for 
normal MLA kernel
    * [#17641](https://github.com/apache/tvm/pull/17641) - Pick up vector length from 'zvlXXXb' (RVV) mattr for RISC-V
    * [#17666](https://github.com/apache/tvm/pull/17666) - [Refactor] Improve 
TargetHasSVE function with optional target handling
    * [#17661](https://github.com/apache/tvm/pull/17661) - [Refactor] Phase out python dependency `decorator`
    * [#17662](https://github.com/apache/tvm/pull/17662) - [REFACTOR] Phase out 
te.Schedule c++ components
    * [#17660](https://github.com/apache/tvm/pull/17660) - [REFACTOR] Phase out 
relay c++ components 
    * [#17655](https://github.com/apache/tvm/pull/17655) - Upgrade onnx and onnxruntime versions
    * [#17657](https://github.com/apache/tvm/pull/17657) - Update argument 
order for relax.op.pad to make it round-trippable
    * [#17658](https://github.com/apache/tvm/pull/17658) - [REFACTOR] Phase out 
te.schedule python components 
    * [#17653](https://github.com/apache/tvm/pull/17653) - Update images to 
20250214-034537-bd1411f8
    * [#17656](https://github.com/apache/tvm/pull/17656) - [REFACTOR] Phase out 
relay python components
    * [#17649](https://github.com/apache/tvm/pull/17649) - [Refactor] Phase out python dependency `attrs`
    * [#17644](https://github.com/apache/tvm/pull/17644) - Bump rollup from 
2.79.1 to 2.79.2 in /web
    * [#17637](https://github.com/apache/tvm/pull/17637) - [PYTHON] Build 
cython by default
    * [#17631](https://github.com/apache/tvm/pull/17631) - Handle vector width 
(VLEN) for RISCV arches
    * [#17613](https://github.com/apache/tvm/pull/17613) - Bug Fix: Removed 
unused code
    * [#17585](https://github.com/apache/tvm/pull/17585) - [Relay] Disable InferType when it has already run and the previous pass made no changes
    * [#17605](https://github.com/apache/tvm/pull/17605) - [Refactor] Phase out 
legacy example apps
    * [#17603](https://github.com/apache/tvm/pull/17603) - [Refactor] Phase out 
legacy docs
    * [#17513](https://github.com/apache/tvm/pull/17513) - [GRAPH RT] 
Additional API support
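
   Several refactor entries above (#17710, #17718) converge the build entry points on a unified `tvm.compile`. A minimal sketch of the new interface, assuming it accepts an IRModule plus a target string (see the PRs for the exact signature):

   ```python
   # Minimal sketch of the unified tvm.compile interface (#17710/#17718):
   # one entry point that dispatches to the TIR or Relax build pipeline.
   import tvm
   from tvm.script import tir as T

   @T.prim_func
   def add_one(A: T.Buffer((16,), "float32"),
               B: T.Buffer((16,), "float32")):
       for i in range(16):
           with T.block("add"):
               vi = T.axis.spatial(16, i)
               B[vi] = A[vi] + T.float32(1)

   # Compiling a TIR-only module; the target keyword is assumed to
   # mirror the legacy tvm.build interface.
   lib = tvm.compile(tvm.IRModule({"add_one": add_one}), target="llvm")
   ```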
   
   
   

