ysh329 opened a new issue, #17178: URL: https://github.com/apache/tvm/issues/17178
# Introduction

The TVM community has worked since the v0.16.0 release to deliver the following new exciting improvements! This release version is v0.17.0.

The main tags are below (**bold text indicates areas with substantial progress**):

- Community, RFCs
- Adreno, ArmComputeLibrary, Metal, cuda & cutlass & tensorrt, microNPU, Runtime
- **Relax**, **Dlight**, **Disco**
- Arith, **TIR**, TVMScript
- Docs, CI, **Misc**, **BugFix**

Please visit the full listing of commits for a complete view: [v0.17.dev0...v0.17.0](https://github.com/apache/tvm/compare/v0.17.dev0...v0.17.0).

### Community

* [#17018](https://github.com/apache/tvm/pull/17018) - New committer: Balint Cristian

### RFCs

This RFC adds a frontend for NNEF, an open, standardized format for neural network exchange developed by the Khronos Group since 2018 (https://www.khronos.org/nnef). It is aimed at deploying trained neural networks from deep learning frameworks to the proprietary inference engines of neural network hardware vendors.

* [#108](https://github.com/apache/tvm-rfcs/pull/108) - [RFC] Add NNEF frontend

----

### AOT

* [#17077](https://github.com/apache/tvm/pull/17077) - Correctly calculate workspace for vector types

### Adreno

* [#16927](https://github.com/apache/tvm/pull/16927) - [SCRIPT] Fix in build config for adreno

### BYOC

* [#16895](https://github.com/apache/tvm/pull/16895) - Add layout check and update shape check for cublas FP8 BYOC

### BugFix

* [#17138](https://github.com/apache/tvm/pull/17138) - [Fix][TIR] Fix outdated call to create extern buffer in make_extern
* [#17132](https://github.com/apache/tvm/pull/17132) - Restrict CopyOnWrite to _type_final
* [#17096](https://github.com/apache/tvm/pull/17096) - Update FAttrsGetter to return Map<String, ObjectRef>
* [#17078](https://github.com/apache/tvm/pull/17078) - [NCCL] Release NCCL thread_local resources in destructor
* [#17044](https://github.com/apache/tvm/pull/17044) - [Support] Fix copy constructor for support::OrderedSet
* [#17000](https://github.com/apache/tvm/pull/17000) -
[MSC] split name_string with index by colon from the right
* [#16923](https://github.com/apache/tvm/pull/16923) - [Fix][Dlight] Fix GeneralReduction for log-sum-exp
* [#16924](https://github.com/apache/tvm/pull/16924) - [Fix] Fix SSA conversion for SizeVar retention
* [#16903](https://github.com/apache/tvm/pull/16903) - CudaDeviceAPI::GetAttr may check kExist when GPUs absent
* [#16901](https://github.com/apache/tvm/pull/16901) - rocm shared memory issue on MI250

### CI

* [#17055](https://github.com/apache/tvm/pull/17055) - [SME][Test] Add additional conv2d tests for asymmetric parameters
* [#17007](https://github.com/apache/tvm/pull/17007) - [TOPI][Testing] Enable conv2d NHWC fp16 topi testing for `arm_cpu`
* [#16930](https://github.com/apache/tvm/pull/16930) - [UnitTest] Use pytest's scope='session' for tvm.testing.parameter
* [#16948](https://github.com/apache/tvm/pull/16948) - Update image tag to 20240428-060115-0b09ed018
* [#16931](https://github.com/apache/tvm/pull/16931) - Use LLVM17 for tests on `ci_cpu`
* [#16942](https://github.com/apache/tvm/pull/16942) - Enable Conda setup v3
* [#16939](https://github.com/apache/tvm/pull/16939) - Upgrade CUDA to 12.4

### CRT

* [#17097](https://github.com/apache/tvm/pull/17097) - [Bugfix] Return error code on error from ModuleGetFunction

### Disco

* [#17035](https://github.com/apache/tvm/pull/17035) - [QoL] Implement broadcast/scatter methods for Session
* [#16992](https://github.com/apache/tvm/pull/16992) - [Bugfix] Handle NDArray larger than OS buffer for pipe
* [#16978](https://github.com/apache/tvm/pull/16978) - Implement `num_workers` property for `disco.Session`
* [#16989](https://github.com/apache/tvm/pull/16989) - Treat hangup of disco worker process as kShutdown
* [#16993](https://github.com/apache/tvm/pull/16993) - Allow allocation that only exists on worker0
* [#16979](https://github.com/apache/tvm/pull/16979) - Expose disco.Session.shutdown through the python API
* [#16919](https://github.com/apache/tvm/pull/16919) - Improve error message for CallPacked

### Dlight

* [#17082](https://github.com/apache/tvm/pull/17082) - Use 16x32 spatial x reduction thread extents in GEMV scheduling
* [#17052](https://github.com/apache/tvm/pull/17052) - Skip GEMV rules when more than one vector
* [#17026](https://github.com/apache/tvm/pull/17026) - Perf improvement for low_batch_gemv on Metal
* [#17016](https://github.com/apache/tvm/pull/17016) - Update Adreno GEMV Rules
* [#16972](https://github.com/apache/tvm/pull/16972) - [GPU] Enhance opencl thread limit for schedules
* [#16973](https://github.com/apache/tvm/pull/16973) - [GPU] Improved gemv outer fallback schedule
* [#16958](https://github.com/apache/tvm/pull/16958) - Check for target in function attributes
* [#16894](https://github.com/apache/tvm/pull/16894) - Enhance vectorization for gpu matmul
* [#16884](https://github.com/apache/tvm/pull/16884) - Add check for matmul dtype and fix reduction rule

### Docs

* [#17146](https://github.com/apache/tvm/pull/17146) - [DOC] Fix typo for the "We utilize the intermediate representation of nn.Graph to convert the OneFlow model to Reley."
* [#17015](https://github.com/apache/tvm/pull/17015) - [DOC] Update Model Links to Include Commit

### Frontend

* [#17014](https://github.com/apache/tvm/pull/17014) - [ArgParse] Pass default values to target compiler (#13264)
* [#16961](https://github.com/apache/tvm/pull/16961) - [Bugfix][ONNX] Improve broadcast and batch_matmul conversion
* [#16936](https://github.com/apache/tvm/pull/16936) - [TFLite] Add support for GELU conversion

### Hexagon

* [#17123](https://github.com/apache/tvm/pull/17123) - Add support for v75

### LLVM

* [#17046](https://github.com/apache/tvm/pull/17046) - [Arith][SVE] Add rewrite rules for indices split by scalable expressions
* [#16966](https://github.com/apache/tvm/pull/16966) - [SVE] Add support for representing and creating buffer-level predicates
* [#17001](https://github.com/apache/tvm/pull/17001) - [SVE] Use only powers of two as possible vscale values
* [#16962](https://github.com/apache/tvm/pull/16962) - [SVE] Add codegen support for `vscale_range()` function attribute
* [#16968](https://github.com/apache/tvm/pull/16968) - StringRef API deprecation fixes
* [#16965](https://github.com/apache/tvm/pull/16965) - [SVE] Add get_active_lane_mask builtin
* [#16899](https://github.com/apache/tvm/pull/16899) - [SVE][TOPI] Add conv2d NHWC hybrid SVE schedule for `arm_cpu`
* [#16893](https://github.com/apache/tvm/pull/16893) - [SVE] Check for SVE target in VectorizeLoop
* [#16862](https://github.com/apache/tvm/pull/16862) - [SVE] Support splitting by vscale in `tir::split` and `te::split`

### MetaSchedule

* [#17012](https://github.com/apache/tvm/pull/17012) - [BugFix] MultiLevelTilingTensorCore generates inconsistent thread-binding sketch for batched matmul
* [#17066](https://github.com/apache/tvm/pull/17066) - [BugFix] Fix TensorIntrin 'dot_4x4_i8i8s32_sdot' is not registered

### Metal

* [#17059](https://github.com/apache/tvm/pull/17059) - Enable Debug Label
* [#17025](https://github.com/apache/tvm/pull/17025) - Support metal device profiling
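Several of the SVE entries above (buffer-level predicates, the `get_active_lane_mask` builtin, predicated schedules) revolve around one idea: a per-lane mask replaces the scalar tail loop when the trip count is not a multiple of the vector length. A minimal plain-Python model of the concept — illustrative only, not TVM's actual codegen; the helper mirrors the semantics of the LLVM `llvm.get.active.lane.mask` intrinsic:

```python
def get_active_lane_mask(base, limit, vl):
    """Lane i is active when base + i < limit (models llvm.get.active.lane.mask)."""
    return [base + i < limit for i in range(vl)]

def predicated_add(a, b, vl=4):
    """Elementwise add processed in vector-length chunks with no scalar epilogue."""
    n = len(a)
    out = [0] * n
    for base in range(0, n, vl):          # the "vectorized" loop, including the tail
        mask = get_active_lane_mask(base, n, vl)
        for lane in range(vl):            # one predicated SIMD op in hardware
            if mask[lane]:                # inactive lanes do nothing
                out[base + lane] = a[base + lane] + b[base + lane]
    return out

print(predicated_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # [11, 22, 33, 44, 55]
```

The final chunk (element 4 with `vl=4`) is handled by the mask `[True, False, False, False]` rather than a separate tail loop, which is what lets the SVE schedules vectorize loops whose extent involves `vscale`.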
### OpenCL & CLML

* [#16933](https://github.com/apache/tvm/pull/16933) - [CLML] Fix in clml pattern check condition
* [#16929](https://github.com/apache/tvm/pull/16929) - [VM][OPENCL] Take advantage of OpenCL host ptr for improved copy

### ROCm

* [#17141](https://github.com/apache/tvm/pull/17141) - [Backend] Fix error when building TVM with LLVM 19

### Relax

* [#17139](https://github.com/apache/tvm/pull/17139) - Fix cublas dispatch for corner cases
* [#17127](https://github.com/apache/tvm/pull/17127) - [KVCache] Support fork in sliding window sink part
* [#17115](https://github.com/apache/tvm/pull/17115) - Support `input_axis_separator` to allow 2D to 1D conversion
* [#17119](https://github.com/apache/tvm/pull/17119) - [Bugfix] Set purity=false for LazySetOutput
* [#17118](https://github.com/apache/tvm/pull/17118) - [VM] Improved error messages for mismatched parameter count
* [#17110](https://github.com/apache/tvm/pull/17110) - Alloc BYOC workspace with R.builtin.alloc_tensor
* [#17089](https://github.com/apache/tvm/pull/17089) - [ONNX] Add support for HardSigmoid
* [#17100](https://github.com/apache/tvm/pull/17100) - [KVCache] Unlimited depth blocks
* [#17075](https://github.com/apache/tvm/pull/17075) - [Transform] Modify FuseTIR pass to propagate buffer attributes
* [#17088](https://github.com/apache/tvm/pull/17088) - [ONNX] Add support for HardSwish
* [#17085](https://github.com/apache/tvm/pull/17085) - [PyTorch] Add support for torch.nn.Hardsigmoid
* [#17083](https://github.com/apache/tvm/pull/17083) - [TVMScript] Preserve tir.SizeVar through TVMScript round-trip
* [#17086](https://github.com/apache/tvm/pull/17086) - Ignore dynamic parameters in RewriteDataflowReshape
* [#17084](https://github.com/apache/tvm/pull/17084) - [PyTorch] Add support for torch.nn.Hardswish
* [#17074](https://github.com/apache/tvm/pull/17074) - [KVCache][Test] Fix TIR attn kernels for uncommon group size
* [#17067](https://github.com/apache/tvm/pull/17067) - Add missing white spaces in
error messages
* [#17061](https://github.com/apache/tvm/pull/17061) - [Frontend][Onnx] Cast Op special handling for ShapeExpr input
* [#17033](https://github.com/apache/tvm/pull/17033) - [Bugfix] Apply FuseOps to nested DataflowBlock
* [#17032](https://github.com/apache/tvm/pull/17032) - [Bugfix] Annotate ComputePrimValue output as host function
* [#17034](https://github.com/apache/tvm/pull/17034) - [Bugfix] Bind symbolic variables in R.match_cast
* [#16960](https://github.com/apache/tvm/pull/16960) - [UnitTest] Validate IRModule with multiple targets
* [#16995](https://github.com/apache/tvm/pull/16995) - [KVCache] Support KVCache decode from forked sequence and pop more tokens
* [#16959](https://github.com/apache/tvm/pull/16959) - [Transform] Handle identical PrimFunc with distinct VDevice
* [#16589](https://github.com/apache/tvm/pull/16589) - [Unity] Check for transpose and dynamic shape in AdjustMatmulOrder
* [#16988](https://github.com/apache/tvm/pull/16988) - [KVCache] Fix the aux data syncing order of paged KV cache
* [#16922](https://github.com/apache/tvm/pull/16922) - [BugFix] Change FuseOpsByPattern strategy to pattern-match maximal subgraph
* [#16982](https://github.com/apache/tvm/pull/16982) - [Unity][BYOC] Use arith.Analyzer to check batch equality of matmul in cublas
* [#16955](https://github.com/apache/tvm/pull/16955) - Implement relax.op.view
* [#16971](https://github.com/apache/tvm/pull/16971) - Support nested ModuleList in nn.Module
* [#16826](https://github.com/apache/tvm/pull/16826) - Express dynamic arguments of strided_slice as arguments
* [#16476](https://github.com/apache/tvm/pull/16476) - [Unity][Cutlass] Fix C source generation of dense operation
* [#16940](https://github.com/apache/tvm/pull/16940) - Allow PrimValue as index in relax.op.take
* [#16934](https://github.com/apache/tvm/pull/16934) - [TIR] Introduce new `cumsum` op for gpu
* [#16859](https://github.com/apache/tvm/pull/16859) - [QoL] Use SeqExpr in IR types when SeqExpr is required
* [#16904](https://github.com/apache/tvm/pull/16904) - Prevent generating duplicate funcs in dispatch_sort_scan
* [#16905](https://github.com/apache/tvm/pull/16905) - [Bugfix] Raise exception for OOM allocation
* [#16827](https://github.com/apache/tvm/pull/16827) - Handle binary operations between Tensor and PrimValue
* [#16902](https://github.com/apache/tvm/pull/16902) - Allow specifying entry_funcs for BYOC
* [#16860](https://github.com/apache/tvm/pull/16860) - [QoL] Infer StructInfo for relax::Tuple on construction
* [#16861](https://github.com/apache/tvm/pull/16861) - [QoL] Return well-formed IR from relax::Function::CreateEmpty
* [#16886](https://github.com/apache/tvm/pull/16886) - [Frontend] Fix sort, argsort and topk in nn module
* [#16883](https://github.com/apache/tvm/pull/16883) - Stabilize relax pass mutation order

### Relay

* [#16983](https://github.com/apache/tvm/pull/16983) - [BugFix] Skip leaf args when matching 'path' part for dominator pattern
* [#16996](https://github.com/apache/tvm/pull/16996) - Fix to make TupleGetItem inherit the previous span

### Runtime

* [#17057](https://github.com/apache/tvm/pull/17057) - Stateless interface of PagedKVCache leaf node commit
* [#17049](https://github.com/apache/tvm/pull/17049) - Support PagedKVCache with tree attention
* [#17045](https://github.com/apache/tvm/pull/17045) - Fix PagedKVCache for PopN and enhance tests
* [#16998](https://github.com/apache/tvm/pull/16998) - Compatibility with dmlc::Stream API changes
* [#17037](https://github.com/apache/tvm/pull/17037) - [ROCm] Enable ROCm host memory support
* [#17036](https://github.com/apache/tvm/pull/17036) - Use preferred host memory (pinned memory) in KV cache
* [#16994](https://github.com/apache/tvm/pull/16994) - Allow query of available device memory through DeviceAPI
* [#16997](https://github.com/apache/tvm/pull/16997) - [Disco] Restore checks for hangup of disco pipe
* [#16938](https://github.com/apache/tvm/pull/16938) - Allow offset to be specified in
NDArray::CreateView
* [#16890](https://github.com/apache/tvm/pull/16890) - [VULKAN] Support total_global_memory
* [#16880](https://github.com/apache/tvm/pull/16880) - Implement Datatype.itemsize()

### TIR

* [#17134](https://github.com/apache/tvm/pull/17134) - [Schedule] Remove `@type_check` for `set_axis_separator`
* [#17112](https://github.com/apache/tvm/pull/17112) - [DLight] Enable SimdGroup op for Metal
* [#17098](https://github.com/apache/tvm/pull/17098) - [RPC] Allow RPC calls to compiled PrimFuncs with no arguments
* [#17039](https://github.com/apache/tvm/pull/17039) - Fix bug in VectorizeLoop
* [#17030](https://github.com/apache/tvm/pull/17030) - Fix Shuffle rewrite
* [#16947](https://github.com/apache/tvm/pull/16947) - Support narrow dtype for let binding
* [#16952](https://github.com/apache/tvm/pull/16952) - Enhance CLZ intrinsic support
* [#16945](https://github.com/apache/tvm/pull/16945) - [Compute-at] Simplify the compute-at block when the predicate can be merged
* [#16879](https://github.com/apache/tvm/pull/16879) - Make T.reinterpret a no-op when dtype is the same

### TOPI

* [#17091](https://github.com/apache/tvm/pull/17091) - Add dense schedule for fp16 and fp32 using gemm
* [#17048](https://github.com/apache/tvm/pull/17048) - [SME] Add conv2d NHWC SME fp16->fp32 schedule
* [#17040](https://github.com/apache/tvm/pull/17040) - Fix SME conv2d schedule import and intrin argument
* [#17003](https://github.com/apache/tvm/pull/17003) - [SME] Add conv2d NHWC SME fp32 schedule
* [#16977](https://github.com/apache/tvm/pull/16977) - Remove `blockIdx.z` in topi sort
* [#16951](https://github.com/apache/tvm/pull/16951) - Revert unification of conv2d NHWC hybrid scheduling for `arm_cpu` targets

### TVMScript

* [#17107](https://github.com/apache/tvm/pull/17107) - Better Type Annotation for TIR OP
* [#16967](https://github.com/apache/tvm/pull/16967) - Fix error reporting inside Macro func
* [#16916](https://github.com/apache/tvm/pull/16916) - Support
`T.launch_thread` with i64 dtype
* [#16876](https://github.com/apache/tvm/pull/16876) - Optionally use `ruff format` instead of `black`
* [#16877](https://github.com/apache/tvm/pull/16877) - [Bug] Add test case for missing symbolic bounds

### cuda & cutlass & tensorrt

* [#16980](https://github.com/apache/tvm/pull/16980) - [Cuda] Skip FreeDataSpace when CUDA driver is in inconsistent state

### web

* [#17031](https://github.com/apache/tvm/pull/17031) - Fix string to uint8 array for special characters
* [#17028](https://github.com/apache/tvm/pull/17028) - Add dtype and offset for CreateView in runtime
* [#16910](https://github.com/apache/tvm/pull/16910) - Support string[] in setPackedFunc() and exceptionally long arrays

### Misc

* [#17135](https://github.com/apache/tvm/pull/17135) - [QoL][IR] Provide default constructor for NameSupply/GlobalVarSupply
* [#17125](https://github.com/apache/tvm/pull/17125) - [Utils] Define line-length for "ruff format"
* [#17152](https://github.com/apache/tvm/pull/17152) - GraphExecutor: Fix wild pointer assign when input and output are reshape
* [#17150](https://github.com/apache/tvm/pull/17150) - [WebGPU] Fall back to 256MB for maxBufferSize if needed
* [#17128](https://github.com/apache/tvm/pull/17128) - [Compute-inline] Prefer T.where for reverse compute-inlined block with predicate
* [#16976](https://github.com/apache/tvm/pull/16976) - [WebGPU] Implement `tir.dp4a` with WGSL built-in function `dot4I8Packed`
* [#17124](https://github.com/apache/tvm/pull/17124) - [WebGPU] Add `tir.dp4a`
* [#17113](https://github.com/apache/tvm/pull/17113) - [CudaGraph] Handle exceptions thrown while capturing cuda graph
* [#17094](https://github.com/apache/tvm/pull/17094) - [Utility][Container] Support non-nullable types in Array::Map
* [#17101](https://github.com/apache/tvm/pull/17101) - [RPC] Raise error if server process terminated
* [#17092](https://github.com/apache/tvm/pull/17092) - [UnitTests] Use tvm.ir.assert_structural_equal whenever possible
* [#17054](https://github.com/apache/tvm/pull/17054) - [SME] Utilize predication in fp32 matmul and conv2d schedules
* [#17079](https://github.com/apache/tvm/pull/17079) - [CMake] Show NVCC include directories in compile_commands.json
* [#17076](https://github.com/apache/tvm/pull/17076) - [SME] Extract gemm block correctly when fused with bias
* [#17071](https://github.com/apache/tvm/pull/17071) - [WebGPU] Translate `int8x4` into `u32`
* [#17065](https://github.com/apache/tvm/pull/17065) - [FP8][Codegen] Add make_fp8 vector constructors
* [#17064](https://github.com/apache/tvm/pull/17064) - Add docs of v0.15.0 and v0.16.0
* [#16985](https://github.com/apache/tvm/pull/16985) - [CODEGEN] Vector-Codegen support for llvm-pure-intrin
* [#17058](https://github.com/apache/tvm/pull/17058) - Introduce outer reduction for metal
* [#17051](https://github.com/apache/tvm/pull/17051) - Use adapter.info when available instead of requestAdapterInfo
* [#16981](https://github.com/apache/tvm/pull/16981) - [SME] Add scalable fp16->fp32 dense schedule
* [#17029](https://github.com/apache/tvm/pull/17029) - [Contrib] Implement NDArray cache update
* [#17027](https://github.com/apache/tvm/pull/17027) - [picojson] Let objects be ordered when serializing
* [#17021](https://github.com/apache/tvm/pull/17021) - [WebGPU] Update error messages to be more user-friendly
* [#17010](https://github.com/apache/tvm/pull/17010) - Support multinomial_from_uniform dispatch
* [#16999](https://github.com/apache/tvm/pull/16999) - [USMP] Add missing const specifier for global_const_workspace
* [#17005](https://github.com/apache/tvm/pull/17005) - [WebGPU] Handle device OOM in createBuffer
* [#16921](https://github.com/apache/tvm/pull/16921) - [SME] Introduce scalable fp32 dense schedule
* [#16957](https://github.com/apache/tvm/pull/16957) - chore: remove repetitive words
* [#16909](https://github.com/apache/tvm/pull/16909) - [QoL][IR] Provide std::hash and std::equal_to for IR Variable types
* [#16987](https://github.com/apache/tvm/pull/16987) - [JVM] Automatic Compatibility of JVM AttachCurrentThread
* [#16974](https://github.com/apache/tvm/pull/16974) - [CUBLAS][FP8] Enable R.matmul + R.multiply offloading
* [#16896](https://github.com/apache/tvm/pull/16896) - [CUBLAS] Enable offloading of R.matmul + R.dequantize
* [#16956](https://github.com/apache/tvm/pull/16956) - Add script for testing release package
* [#16908](https://github.com/apache/tvm/pull/16908) - Override StructuralEqual() for easy usage
* [#16932](https://github.com/apache/tvm/pull/16932) - Enable gemv schedule for adreno
* [#16935](https://github.com/apache/tvm/pull/16935) - [3rdparty] Bump FlashInfer for sampling functions
* [#16937](https://github.com/apache/tvm/pull/16937) - [Thrust] Increase static workspace size
* [#16915](https://github.com/apache/tvm/pull/16915) - [Marvell BYOC] Marvell AI Accelerator Integration - Phase 2
* [#16741](https://github.com/apache/tvm/pull/16741) - Restore "pytest.mark.gpu" for RELAX tests
* [#16914](https://github.com/apache/tvm/pull/16914) - [CMAKE] Make LOG_BEFORE_THROW explicit
* [#16913](https://github.com/apache/tvm/pull/16913) - Enhance Release Note Script and Remove Useless File
* [#16907](https://github.com/apache/tvm/pull/16907) - [Upd] Fix lld search in rocm
* [#16900](https://github.com/apache/tvm/pull/16900) - [CMAKE] Misc improvement of Util
* [#16897](https://github.com/apache/tvm/pull/16897) - [Target] Don't register AArch64 target tags without LLVM compiler support
* [#16892](https://github.com/apache/tvm/pull/16892) - [CUBLAS] Set fp32 compute and scale dtypes in fp16 matmul
* [#16888](https://github.com/apache/tvm/pull/16888) - [CUBLAS][FP8] Support e4m3 gemm in cuBLAS BYOC
* [#16887](https://github.com/apache/tvm/pull/16887) - [Contrib] Enable fp16 for thrust sort
* [#16881](https://github.com/apache/tvm/pull/16881) - [release][Dont Squash] Update version to 0.16.0 and 0.17.0.dev on main branch
