This is an automated email from the ASF dual-hosted git repository.

ruihangl pushed a change to branch unity-staging
in repository https://gitbox.apache.org/repos/asf/tvm.git


    from 2dbab710e8 Merge branch 'main' into 'unity'
     add 44d80e5107 [Unity][Bugfix] Reset window cache current pos when 
clearing (#16132)
     add 8a2ffee38e [Unity] Do not import SciPy by default (#16136)
     add 90edf76716 [Unity][LLM] Add NaN checks during sampling for better 
error reporting (#16141)
     add 5e61adc0a3 [Unity][DistIR] Enhance PropagateSharding pass (#16094)
     add 165b84bc7f Always use int64 in JSON parser (#16145)
     add 4c07f6af43 [Runtime] Introduce Type-Checked `TVMArgs::At<T>(i)` 
(#16147)
     add 4e70c28217 [Runtime] Allowing Packed Arguments in TVM Module VTable 
(#16148)
     add 29450b927e [Unity][MSC] Enable add attributes while fuse ops (#16128)
     add bafd49d4b8 [Unity] Flash infer integration (#16146)
     add 9a985714f3 [Unity] Migrate Relax Executable/VM to `TVM_MODULE_VTABLE` 
Convention (#16149)
     add 925cb2bbaa [Unity][BYOC] Add cutlass finegrained decode matmul (#16144)
     add aae1112a65 [Unity] Support constant args in `nn.ExternModule` (#16130)
     add 756ce9917f [Unity][3rdparty] Remove TVM in 3rdparty of FlashInfer 
(#16155)
     add 1de8b347d1 [Unity][DistIR] LowerGlobalViewToLocalView (#16095)
     add 2dcb8716e8 [Unity][BlockBuilder] Depracate `BlockBuilder.get()` and 
change it to `BlockBuilder.finalize()` (#16090)
     add 8f24a272a0 [Unity][MSC][M2.1] Add Manager for compile pipeline (#16163)
     add af803cf7b4 [Unity][DLight] Fix `general_reduction` for GroupNorm 
(#16161)
     add 64fe5a8a89 [Unity][DistIR] Add DTensor struct info propagation rule 
for stop_lift_params (#16170)
     add c640d0a3c9 [Unity][Web] Fix missing function NVTXScopedRange for web 
(#16177)
     add 8a6184ccfa [Unity, BYOC] Add check for leaking intemediate variables 
for cublas and cudnn (#16175)
     add a6adaae5ef [Unity][DistIR] LowerDistIR (#16169)
     add 85389efa2c [Unity][BYOC] Fix Flash var_len attention with sliding 
window (#16185)
     add 68443482c9 [Unity][Bugfix] Handle symbolic matching with 
non-structural match (#15994)
     add d52a9bf388 [Unity][Transform] Implement RemoveUnusedOutputs (#16117)
     add d6015c5643 [Unity][BugFix] Fix a bug in relax gelu_tanh computation 
(#16188)
     add fe9d2fe57d [Unity][Transform] Implement ExpandTupleArguments (#16115)
     add fc324d0f2c [Unity][Transform] Implement RemoveUnusedParameters (#16116)
     add 74667b97f0 [Unity] Enable ccache for `nn.SourceModule` (#16189)
     add ed2772f9c8 [Unity][MSC][M2.1] Add pruner for model pruning (#16186)
     add 9e4e17ca88 [Unity][WebGPU] Get params from cache by name (#16198)
     add a2f55a8812 [WEBGPU] Update to latest compilationHints API (#16197)
     add 8f95f6147a [Unity] [Transform] Remove iteration over functions in 
function pass (#16173)
     add 3c7067d6ed [Unity] Minor: Remove debug logging (#16200)
     add 34fd234f55 [Unity] Check usage location when canonicalizing trivial 
bindings (#16193)
     add 4e8c975700 [Unity][Bugfix] Fix 
`tests/python/topi/test_topi_transform.py::test_relax_dynamic_strided_slice` 
(#16205)
     add d0504027bb [Unity] Update FlashInfer (#16208)
     add ebbad09cd5 [Unity] Upgrade cutlass_fpA_intB_gemm (#16206)
     add 03fc4f6f03 [Dlight] Change max_threads on CUDA (#16203)
     add 58e622b74d [Unity][Transform] Implement Relax function inlining 
(#16194)
     add e0518da2a5 [Unity][MSC][M2.3] Add tracker for track layer datas 
(#16207)
     add 35e8404f17 [Disco] Expose `DiscoWorker` and `ndarray_cache_support` in 
header (#16153)
     add f18d186559 [Unity] Speed up NormalizeGlobalVar (#16219)
     add b5b980e33a [Unity] Support out dtype for nn.Linear and nn.MultiLinear 
(#16220)
     add 8241385f59 [Unity] De-duplicate calls to TensorStructInfo constructor 
(#16209)
     add 2772fb072a [Unity] Fix upstream tests that fail on unity branch 
(#16196)
     add c6d4926529 [Dlight] Fix NormalizePrimFunc with scalar block (#16156)
     add af14fbbbe1 [Relax] Fix to enable emit_te of topi scan/sort kernels 
(#16226)
     add 943508a295 [Unity] Fix typo in dlight fallback rule (#16230)
     add cbcb67c047 [Unity][Frontend] Add the `sum` op to frontend ops (#16225)
     add fe89ccc360 [Unity][Transform] Pass for automatically extracting 
DataflowBlocks (#16204)
     add f7b0193f9d [Unity] Fix IndexDataTypeNormalizer so that it correctly 
handles corner case (#16235)
     add e100a13737 [Unity] Fix legalizing strided slice (#16232)
     add 674167805c Revert "[Unity] Fix IndexDataTypeNormalizer so that it 
correctly handles corner case" (#16241)
     add 6118b770b1 [Unity] Improved error checking for DataflowBlock in nested 
SeqExpr (#16195)
     add e1964eceb5 [Unity] Add runtime debugging method to RelaxVM (#16238)
     add cd9445d63b [Unity][lm_support] window kvcache sink (#16240)
     add a2e19d21eb [Unity] Fix IndexDataTypeNormalizer so that it correctly 
handles corner case (#16245)
     add 8edfee8574 [Unity][MSC][M2.4] Add quantizer for quantize model (#16228)
     add 5b1fa29838 [Unity][VM] Allow `pipeline=None` in `relax.build` (#16246)
     add f794db4373 [Unity] Avoid to use `std::regex` (#16249)
     add 2d0d4e46a9 [Unity] Enable spot nodes in CI (#16253)
     add a0e58987b0 [Unity][nn.Module] Refactor `ExternModule` (#16247)
     add 76e239e2e1 [Unity] Fix Cutlass Codegen for Dense (#16252)
     add 95f1b5c0e8 [Unity] Hot Fix Unity CI (#16256)
     add e98fdea654 [Unity] Bump fpA_intB_gemm (#16244)
     add 4e66690a4d [Fix] add TVM_DLL to disco functions (#16258)
     add 45eeb8c838 [Unity] Fix ccache env for `nn.SourceModule` (#16257)
     add f328e9bde3 [Unity] Add missing library import  (#16263)
     add 1c35c39264 [Unity] Add Relax multi-device e2e cases  (#15823)
     add 3de5e865df [Unity][nn.Module] Support Runtime-Calling Any PackedFunc 
via `op.extern` (#16274)
     add 5c8caa6e35 [Unity] Unified KV cache interface and PagedKVCache 
refactor (#16273)
     add 2bf3a0a428 [Unity][MSC][M3.1] Add distiller for distill model (#16264)
     add 889d2f6cef [Unity][Frontend] NNModule `tensor_ir_op` support (#16278)
     add 8946efa62e Update FlashInfer (#16281)
     add 58daeb4905 Update FlashInfer (#16292)
     add 303afdbccc [Unity][MSC][M3.2] Add gym for pruning and quantization, 
enable auto prune/quantize (#16280)
     add 2f7e0d578f [Unity] Ensure memory planning cross-function independence 
(#16318)
     add beb832616d [Unity] Update cutlass FpA IntB GeMM submodule (#16320)
     add 8867de843d [Unity][MSC][Bugfix] Use random workspace for test (#16322)
     add 9030522960 [Unity][Frontend] Introducing Object (#16316)
     add b1df4b0856 [Unity][Web][Fix] Fix fetchNDArray for f32-to-bf16 (#16294)
     add faa8a0ad46 [Unity][nn.Module] Introduce operator `empty` (#16327)
     add ac568eb30a [Unity] Fix PagedKVCache per FlashInfer update (#16317)
     add 09c44e6a93 [Unity] Upgrade flashinfer 3rdparty submodule (#16323)
     add 163c7ac436 [Unity] Cutlass kernel compatibility with cmake 3.18+ 
(#16302)
     add b3f0e55f24 Change metal dtype of ceil_log2 to fp32 (#16332)
     add 4a7e4fec37 [Unity] Fix nn.op.tensor_ir_op signature (#16333)
     add 1af82ad666 [Unity] Validate struct info in relax::Call constructor 
(#16311)
     add ec542da8cf [Unity][Transform] Extract partial-tuple-usage from FuseTIR 
(#16120)
     add 6f2fe457c2 [Unity][UnitTest] Increase atol to resolve flaky CI failure 
(#16340)
     add 0cf5f47a1e [Unity] Dispatch cumsum and sort (#16254)
     add 7dfc863df8 [Unity] Alter op impl handling empty transform for output 
(#16331)
     add d88cc4267d [Unity][Transform] Implement UpdateParamStructInfo (#16305)
     add d509661f89 [Unity][Analysis] Handle PrimStructInfo in 
EraseToWellDefined (#16304)
     add 49fc613a3c [Unity][WEBGPU] Enable wasm exception propagation (#16330)
     add c3aa71a53e [Unity][Analysis] Add utility for collecting compile-time 
bindings (#16312)
     add 8d72091b27 [DLight] Skip rule if target is not suitable (#16321)
     add 31659b617f [Unity][Dlight] Support dlight gemv rule on nested inner 
block (#16251)
     add fe5f6163e6 [Unity][MSC][Legalize] legalize codes and mute logging 
(#16325)
     add f215a417bf [Unity][NN] Use Linear name for nn.op.permute_dims (#16303)
     add ded4be4b39 enhance shared memory merge.
     add 6b6419c55f merge from unity upstream
     add 047211f1da revert the change for dyanmic test
     add 76ceff2c5e fix typo
     add e3216a6972 lint fix
     add 3fad0109bf [Unity][Contrib] Add vLLM paged attention kernel (#16350)
     add 3190f284a3 [DOC] Add v0.14.0 docs to site (#16152)
     add bce82432f9 [Relay][Pytorch] Add support for `aten::unflatten` (#16131)
     add 1a2cc18091 [Relay] conv3d depthwise bug fix (#16151)
     add f38dc146e4 [TOPI][Relay] Add conv2d NHWC hybrid schedule for `arm_cpu` 
(#16106)
     add 26aeaee046 [Community] Shuai Yuan -> Committers (#16162)
     add 3fd3a63652 [Relay][Pytorch] Add support for `aten::linalg_vector_norm` 
(#16123)
     add e5c6f74460 [CI][ADRENO] Enhancements to Adreno specific CI utils 
(#15991)
     add f33722e59a [Community] Ruihang Lai -> PMC (#16165)
     add d9d3bc585f [Community] Bohan Hou -> PMC (#16166)
     add 79052574be [Community] Qiang Zhang -> Committer (#16164)
     add 3eec10f16a [COMMUNITY] New Reviewer: Yixin Dong (#16172)
     add 604b263dd5 [BugFix] [Relay][Pytorch] Fix missing `.dtype`  (#16167)
     add 3136ff4bb6 [FRONTEND][KERAS] Fix bug concat convert for NCHW (#16159)
     add 97ddd667c8 [Relay][Pytorch] Add support for 
`aten::scaled_dot_product_attention` (#16143)
     add 1994f402e6 Enable ccache to accelerate contrib compilation (#16176)
     add e9a3b60f49 [Relay][Pytorch] Fix bug when converting models with 
torch.nn.ParameterList (#16180)
     add 2eb17fa87f [Device][Metal] Fix metal warp size (#16192)
     add 37329bf8c3 [BugFix] Fix the error of reloading the model library on 
the ROCm platform: "MIOpen Error: No invoker was registered for convolution 
forward.” (#16190)
     add 71081a8616 Fix IRModule initialization with attrs (#16202)
     add fe27973da0 Bump cryptography from 37.0.2 to 41.0.6 in /docker/python 
(#16174)
     add bf071dea54 [Relay][Frontend] Preserve Pytorch Span Names (#16171)
     add 65121c878a [Relay][Frontend] Add support for aten::concat (#16199)
     add a9fcac1a47 [Python] Fix setup.py for inplace build (#16214)
     add a59db03777 remove deprecated np.int in slice converter (pytorch) 
(#16221)
     add b0e146f767 [TIR] ConvertSSA process entry func first (#16236)
     add c8bfdb21ab [BugFix][TIR] Fix dynamic smem merge leaf alloc (#16216)
     add 37c38acc46 [Target] Add Jetson AGX Orin tags (#16231)
     add b3eec91ee6 [TFLite] Add support for quantized mirror pad (#16243)
     add 870246a369 [LoopPartition] Fix a bug of LoopPartition in single point 
scenarioes (#16104)
     add 799e81036d [ARITH] Simplify nested if_then_else when constant is 
appearing in then_expr (#16227)
     add 3df798d422 [Relay][TOPI] Add support for group_conv1d_transpose_ncw 
for generic (#16248)
     add 943861c85c [TIR][Schedule] TileWithTensorIntrin skip incorrect 
ComputeInline for input-padding (#16239)
     add 759ee1236a [Support] Add Interrupt Handling in Pipe (#16255)
     add 3a57a40c1b [RUNTIME][CLML] Fix for CLML ops and enable more test case 
(#15896)
     add f36a093c20 Update conv2d.py (#16262)
     add d310d7db59 [TOPI] Add support for group_conv3d_transpose_ncdhw for 
generic (#16259)
     add a906504389 [TVMScript] Disable concise scoping when the scope stmt is 
explicitly annotated (#16271)
     add a050696ca5 replace deprecated np.int with int to avoid crash (#16279)
     add 1c4538947b [BugFix] Fixed Inappropriate Logical Expression (#16272)
     add 506eff23b0 [Relay][Frontend][QNN] fix access `param_debug_name_map` to 
node output name in fx-quantized graph node replacement (#16217)
     add 2da3798dd1 [Relay][Frontend][Torch] add aten:broadcast_to  (#16319)
     add 8b2cff8ab9 [Doc] Fix minor error in doc (Add an operator to Relay) 
(#16282)
     add c157ca1a52 [BugFix] Update pillow usage (#16269)
     add 7445b8723c [release] Update version to 0.15.0 on main branch
     add e69363ab1e [release] Update version to 0.16.dev0 on main branch
     add 97f6e6507f [CI] Upgrade cmake version to 3.24.0 (#16336)
     add 8eec0bff6b [TIR][Transform] Implement InlinePrivateFunctions (#16184)
     add eb15d04c3b [TIR] In SplitHostDevice, check for variables in thread 
extents (#16250)
     add 42b4f213a7 [Hexagon][UnitTest] Disable flaky quantization test (#16337)
     add 380147d059 [Doc] Fix minor error in "Expressions in Relay" (#16346)
     add 56bdcee750 [CMake] Use ccache as CMAKE_CUDA_COMPILER_LAUNCHER (#16341)
     add 5308739741 [TIR] Allow sync threads inside condition (#16345)
     add ae7d9dbe06 [Codegen] Fix if_then_else codegen (#16242)
     add e3d031bc7c [CMake][MSVC] Disable permissive mode for MSVC builds 
(#16343)
     add 51bdaec6e3 [Docker] Upgrade pip in i386 container (#16348)
     new b47280b1fa Merge branch 'main' into 'unity'

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .gitmodules                                        |    3 +
 3rdparty/cutlass_fpA_intB_gemm                     |    2 +-
 3rdparty/flashinfer                                |    1 +
 3rdparty/picojson/picojson.h                       |    5 +-
 CMakeLists.txt                                     |   29 +
 CONTRIBUTORS.md                                    |    8 +-
 ci/jenkins/unity_jenkinsfile.groovy                |  100 +-
 cmake/modules/CUDA.cmake                           |   22 +
 cmake/modules/contrib/CUTLASS.cmake                |    3 +
 .../modules/contrib/vllm.cmake                     |   15 +-
 cmake/utils/CCache.cmake                           |   23 +-
 conda/recipe/meta.yaml                             |    2 +-
 docker/install/ubuntu2004_install_python.sh        |    2 +-
 docker/install/ubuntu_install_cmake_source.sh      |    3 +-
 docker/python/bootstrap-requirements.txt           |   26 +-
 docs/conf.py                                       |    1 +
 docs/dev/how_to/relay_add_op.rst                   |    4 +-
 docs/reference/langref/relay_expr.rst              |    2 +-
 include/tvm/relax/analysis.h                       |   15 +
 .../op.cc => include/tvm/relax/attrs/sort.h        |   42 +-
 include/tvm/relax/block_builder.h                  |    9 +
 include/tvm/relax/distributed/axis_group_graph.h   |  181 +-
 include/tvm/relax/distributed/transform.h          |   13 +
 include/tvm/relax/struct_info.h                    |    4 +-
 include/tvm/relax/transform.h                      |   50 +-
 include/tvm/relay/attrs/nn.h                       |   10 +-
 include/tvm/runtime/c_runtime_api.h                |    2 +-
 {src => include/tvm}/runtime/disco/builtin.h       |   42 +-
 .../tvm/runtime/disco/disco_worker.h               |   73 +-
 include/tvm/runtime/disco/session.h                |   20 +-
 include/tvm/runtime/packed_func.h                  |   47 +-
 include/tvm/runtime/relax_vm/executable.h          |   24 +-
 .../tvm}/runtime/relax_vm/ndarray_cache_support.h  |   39 +-
 include/tvm/runtime/relax_vm/vm.h                  |    2 -
 include/tvm/tir/analysis.h                         |   29 +-
 include/tvm/tir/builtin.h                          |    6 +
 include/tvm/tir/transform.h                        |   11 +-
 include/tvm/topi/transform.h                       |   25 +-
 licenses/LICENSE.vllm.txt                          |  201 ++
 python/setup.py                                    |    7 +-
 python/tvm/_ffi/libinfo.py                         |    2 +-
 python/tvm/autotvm/tuner/__init__.py               |    7 +-
 python/tvm/contrib/cc.py                           |   52 +-
 python/tvm/contrib/cutlass/attention_operation.py  |    3 +-
 python/tvm/contrib/cutlass/build.py                |    4 +
 python/tvm/contrib/cutlass/gemm_operation.py       |   16 +-
 python/tvm/contrib/cutlass/gen_tensor_op.py        |   10 +-
 python/tvm/contrib/emcc.py                         |    1 +
 python/tvm/contrib/msc/core/codegen/sources.py     |   56 +-
 python/tvm/contrib/msc/core/frontend/translate.py  |   25 +-
 .../backend => contrib/msc/core/gym}/__init__.py   |    7 +-
 .../msc/core/gym/agent}/__init__.py                |    6 +-
 .../tvm/contrib/msc/core/gym/agent/base_agent.py   |  314 ++
 python/tvm/contrib/msc/core/gym/agent/method.py    |   80 +
 .../tvm/contrib/msc/core/gym/agent/search_agent.py |  180 ++
 .../msc/core/gym/control}/__init__.py              |    6 +-
 .../tvm/contrib/msc/core/gym/control/configer.py   |   97 +
 .../tvm/contrib/msc/core/gym/control/controller.py |  107 +
 .../tvm/contrib/msc/core/gym/control/namespace.py  |   28 +-
 python/tvm/contrib/msc/core/gym/control/service.py |  816 ++++++
 python/tvm/contrib/msc/core/gym/control/worker.py  |  216 ++
 .../msc/core/gym/environment}/__init__.py          |    7 +-
 .../contrib/msc/core/gym/environment/base_env.py   |  368 +++
 .../tvm/contrib/msc/core/gym/environment/method.py |  202 ++
 .../contrib/msc/core/gym/environment/prune_env.py  |   95 +
 .../msc/core/gym/environment/quantize_env.py       |   99 +
 python/tvm/contrib/msc/core/ir/graph.py            |  242 +-
 python/tvm/contrib/msc/core/runtime/runner.py      |  508 +++-
 .../backend => contrib/msc/core/tools}/__init__.py |   10 +-
 .../msc/core/tools/distill}/__init__.py            |    6 +-
 .../contrib/msc/core/tools/distill/distiller.py    |  261 ++
 .../tvm/contrib/msc/core/tools/distill/method.py   |   72 +
 python/tvm/contrib/msc/core/tools/execute.py       |  386 +++
 .../msc/core/tools/prune}/__init__.py              |    6 +-
 python/tvm/contrib/msc/core/tools/prune/method.py  |  118 +
 python/tvm/contrib/msc/core/tools/prune/pruner.py  |  522 ++++
 .../msc/core/tools/quantize}/__init__.py           |    6 +-
 .../tvm/contrib/msc/core/tools/quantize/method.py  |  472 ++++
 .../contrib/msc/core/tools/quantize/quantizer.py   |  249 ++
 python/tvm/contrib/msc/core/tools/tool.py          | 1499 ++++++++++
 .../msc/core/tools/track}/__init__.py              |    6 +-
 python/tvm/contrib/msc/core/tools/track/method.py  |  102 +
 python/tvm/contrib/msc/core/tools/track/tracker.py |  185 ++
 python/tvm/contrib/msc/core/transform/pattern.py   |  115 +-
 python/tvm/contrib/msc/core/utils/dataset.py       |  415 ++-
 python/tvm/contrib/msc/core/utils/expr.py          |   20 +
 python/tvm/contrib/msc/core/utils/file.py          |  119 +-
 python/tvm/contrib/msc/core/utils/info.py          |  161 +-
 python/tvm/contrib/msc/core/utils/log.py           |   25 +-
 python/tvm/contrib/msc/core/utils/message.py       |   71 +-
 python/tvm/contrib/msc/core/utils/namespace.py     |   14 +-
 python/tvm/contrib/msc/core/utils/register.py      |  306 +-
 .../msc/framework/tensorflow/runtime/runner.py     |   36 +-
 .../msc/framework/tensorflow/tools}/__init__.py    |    8 +-
 .../tensorflow/tools/distill}/__init__.py          |    5 +-
 .../tensorflow/tools/distill/distiller.py          |   55 +
 .../framework/tensorflow/tools/prune}/__init__.py  |    5 +-
 .../msc/framework/tensorflow/tools/prune/pruner.py |   55 +
 .../tensorflow/tools/quantize}/__init__.py         |    5 +-
 .../tensorflow/tools/quantize/quantizer.py         |   55 +
 .../framework/tensorflow/tools/track}/__init__.py  |    5 +-
 .../framework/tensorflow/tools/track/tracker.py    |   55 +
 .../msc/framework/tensorrt/codegen/codegen.py      |   59 +-
 .../msc/framework/tensorrt/codegen/sources.py      |  172 +-
 .../msc/framework/tensorrt/frontend/translate.py   |   38 +-
 .../msc/framework/tensorrt/runtime/runner.py       |   90 +-
 .../msc/framework/tensorrt/tools}/__init__.py      |    8 +-
 .../framework/tensorrt/tools/distill}/__init__.py  |    5 +-
 .../framework/tensorrt/tools/distill/distiller.py  |   55 +
 .../framework/tensorrt/tools/prune}/__init__.py    |    5 +-
 .../msc/framework/tensorrt/tools/prune/pruner.py   |   55 +
 .../framework/tensorrt/tools/quantize}/__init__.py |    6 +-
 .../framework/tensorrt/tools/quantize/method.py    |  149 +
 .../framework/tensorrt/tools/quantize/quantizer.py |  366 +++
 .../framework/tensorrt/tools/track}/__init__.py    |    5 +-
 .../msc/framework/tensorrt/tools/track/tracker.py  |  159 ++
 .../msc/framework/tensorrt/transform/pattern.py    |   83 +-
 .../msc/framework/torch/frontend/translate.py      |    2 +-
 .../contrib/msc/framework/torch/runtime/runner.py  |    2 +
 .../msc/framework/torch/tools}/__init__.py         |    8 +-
 .../msc/framework/torch/tools/distill}/__init__.py |    6 +-
 .../msc/framework/torch/tools/distill/distiller.py |  144 +
 .../msc/framework/torch/tools/distill/method.py    |  116 +
 .../msc/framework/torch/tools/prune}/__init__.py   |    5 +-
 .../msc/framework/torch/tools/prune/pruner.py      |   55 +
 .../framework/torch/tools/quantize}/__init__.py    |    6 +-
 .../msc/framework/torch/tools/quantize/method.py   |  237 ++
 .../framework/torch/tools/quantize/quantizer.py    |   55 +
 .../msc/framework/torch/tools/track}/__init__.py   |    5 +-
 .../msc/framework/torch/tools/track/tracker.py     |   55 +
 .../contrib/msc/framework/tvm/runtime/runner.py    |   34 +-
 .../msc/framework/tvm/tools}/__init__.py           |    8 +-
 .../msc/framework/tvm/tools/distill}/__init__.py   |    5 +-
 .../msc/framework/tvm/tools/distill/distiller.py   |   55 +
 .../msc/framework/tvm/tools/prune}/__init__.py     |    5 +-
 .../msc/framework/tvm/tools/prune/pruner.py        |   55 +
 .../msc/framework/tvm/tools/quantize}/__init__.py  |    6 +-
 .../msc/framework/tvm/tools/quantize/method.py     |  204 ++
 .../msc/framework/tvm/tools/quantize/quantizer.py  |  167 ++
 .../msc/framework/tvm/tools/track}/__init__.py     |    5 +-
 .../msc/framework/tvm/tools/track/tracker.py       |  155 +
 .../backend => contrib/msc/pipeline}/__init__.py   |    5 +-
 python/tvm/contrib/msc/pipeline/manager.py         |  927 ++++++
 python/tvm/dlight/base/__init__.py                 |    3 +-
 python/tvm/dlight/base/analysis.py                 |   28 +-
 python/tvm/dlight/base/schedule_rule.py            |   15 +
 .../transform/transform.py => dlight/gpu/base.py}  |   39 +-
 python/tvm/dlight/gpu/fallback.py                  |   19 +-
 python/tvm/dlight/gpu/gemv.py                      |   38 +-
 python/tvm/dlight/gpu/general_reduction.py         |   26 +-
 python/tvm/dlight/gpu/matmul.py                    |   15 +-
 python/tvm/dlight/gpu/reduction.py                 |   74 +-
 python/tvm/dlight/gpu/transpose.py                 |   11 +-
 python/tvm/dlight/gpu/utils.py                     |    2 +-
 python/tvm/driver/build_module.py                  |   27 +-
 python/tvm/ir/module.py                            |    2 +-
 python/tvm/relax/analysis/__init__.py              |    1 +
 python/tvm/relax/analysis/analysis.py              |   25 +
 python/tvm/relax/backend/__init__.py               |    1 +
 python/tvm/relax/backend/contrib/cublas.py         |    3 +
 python/tvm/relax/backend/contrib/cudnn.py          |    3 +
 python/tvm/relax/backend/contrib/cutlass.py        |   36 +-
 python/tvm/relax/backend/dispatch_sort_scan.py     |  102 +
 python/tvm/relax/backend/utils.py                  |   43 +
 python/tvm/relax/block_builder.py                  |   48 +-
 python/tvm/relax/distributed/transform/__init__.py |    7 +-
 .../tvm/relax/distributed/transform/transform.py   |   22 +
 python/tvm/relax/dpl/pattern.py                    |    4 +-
 python/tvm/relax/expr.py                           |    5 +
 python/tvm/relax/frontend/nn/__init__.py           |   14 +-
 python/tvm/relax/frontend/nn/_tensor_op.py         |   12 +
 python/tvm/relax/frontend/nn/core.py               |  370 +--
 python/tvm/relax/frontend/nn/exporter.py           |  316 +++
 python/tvm/relax/frontend/nn/extern.py             |  399 +++
 python/tvm/relax/frontend/nn/modules.py            |   83 +-
 python/tvm/relax/frontend/nn/op.py                 |  453 ++-
 python/tvm/relax/frontend/nn/spec.py               |  371 +--
 python/tvm/relax/ir/instrument.py                  |    2 +-
 python/tvm/relax/op/__init__.py                    |    1 +
 python/tvm/relax/op/distributed/__init__.py        |    7 +-
 python/tvm/relax/op/distributed/distributed.py     |   53 +-
 python/tvm/relax/op/op_attrs.py                    |    5 +
 .../transform/transform.py => op/sort.py}          |   36 +-
 python/tvm/relax/pipeline.py                       |    5 +-
 python/tvm/relax/transform/__init__.py             |    8 +
 .../tvm/relax/transform/attach_external_modules.py |   52 +
 python/tvm/relax/transform/legalize_ops/index.py   |   12 -
 python/tvm/relax/transform/legalize_ops/nn.py      |   14 +-
 .../relax/transform/optimize_layout_transform.py   |   37 +-
 .../relax/transform/remove_redundant_reshape.py    |   37 +-
 python/tvm/relax/transform/transform.py            |  108 +-
 python/tvm/relax/utils.py                          |   26 +-
 python/tvm/relax/vm_build.py                       |   81 +-
 python/tvm/relay/analysis/sparse_conv2d.py         |   11 +-
 python/tvm/relay/analysis/sparse_dense.py          |   11 +-
 python/tvm/relay/frontend/keras.py                 |    2 +-
 python/tvm/relay/frontend/pytorch.py               |  326 ++-
 python/tvm/relay/frontend/qnn_torch.py             |   11 +-
 python/tvm/relay/frontend/tensorflow_ops.py        |    6 +-
 python/tvm/relay/frontend/tflite.py                |    6 -
 python/tvm/relay/op/contrib/clml.py                |  118 +-
 python/tvm/relay/op/nn/_nn.py                      |    2 +-
 python/tvm/relay/op/nn/nn.py                       |   12 +-
 python/tvm/relay/op/strategy/arm_cpu.py            |  123 +-
 python/tvm/relay/op/strategy/generic.py            |   52 +-
 python/tvm/relay/qnn/op/legalizations.py           |   19 +-
 python/tvm/relay/testing/yolo_detection.py         |    6 +-
 python/tvm/runtime/disco/process_pool.py           |   16 +-
 python/tvm/runtime/disco/session.py                |    3 +-
 python/tvm/runtime/ndarray.py                      |   31 +-
 python/tvm/runtime/relax_vm.py                     |   19 +-
 .../tvm/script/ir_builder/relax/distributed/ir.py  |    1 +
 python/tvm/script/ir_builder/relax/ir.py           |    2 +
 python/tvm/script/ir_builder/tir/ir.py             |    2 +
 python/tvm/script/parser/relax/dist.py             |    1 +
 python/tvm/script/parser/relax/parser.py           |    3 +
 python/tvm/target/detect_target.py                 |   23 +-
 python/tvm/testing/utils.py                        |   25 +-
 python/tvm/tir/op.py                               |   17 +
 python/tvm/tir/transform/transform.py              |   17 +-
 python/tvm/topi/arm_cpu/arm_utils.py               |  105 +-
 python/tvm/topi/arm_cpu/conv2d.py                  |  111 +
 python/tvm/topi/arm_cpu/conv2d_alter_op.py         |   57 +-
 python/tvm/topi/arm_cpu/conv2d_gemm.py             |  345 ++-
 python/tvm/topi/arm_cpu/conv2d_int8.py             |   96 +-
 python/tvm/topi/arm_cpu/qnn_legalize.py            |    8 +-
 python/tvm/topi/cuda/sort.py                       |    1 -
 python/tvm/topi/cuda/sparse.py                     |    7 +-
 python/tvm/topi/generic/nn.py                      |   34 +
 python/tvm/topi/intel_graphics/conv2d.py           |    3 +-
 python/tvm/topi/math.py                            |    5 +-
 python/tvm/topi/nn/conv1d_transpose.py             |  142 +-
 python/tvm/topi/nn/conv2d.py                       |   30 +-
 python/tvm/topi/nn/conv3d_transpose.py             |   81 +-
 python/tvm/topi/testing/__init__.py                |    5 +-
 .../topi/testing/conv1d_transpose_ncw_python.py    |   12 +
 .../topi/testing/conv3d_transpose_ncdhw_python.py  |   40 +-
 rust/tvm/src/ir/module.rs                          |   14 +-
 src/arith/ir_mutator_with_analyzer.cc              |    5 +-
 src/arith/ir_visitor_with_analyzer.cc              |    2 +-
 src/contrib/msc/core/codegen/base_codegen.h        |   24 +-
 src/contrib/msc/core/codegen/code_stack.cc         |  198 +-
 src/contrib/msc/core/codegen/code_stack.h          |  495 ++--
 src/contrib/msc/core/codegen/codegen_utils.cc      |    2 +-
 src/contrib/msc/core/codegen/codegen_utils.h       |   75 +-
 src/contrib/msc/core/codegen/cpp_codegen.h         |   90 +-
 src/contrib/msc/core/codegen/py_codegen.h          |   89 +-
 src/contrib/msc/core/ir/graph.cc                   |  447 ++-
 src/contrib/msc/core/ir/graph.h                    |  201 +-
 src/contrib/msc/core/printer/cpp_printer.cc        |  139 +-
 src/contrib/msc/core/printer/cpp_printer.h         |   47 +-
 src/contrib/msc/core/printer/msc_base_printer.cc   |    8 +
 src/contrib/msc/core/printer/msc_base_printer.h    |   14 +
 src/contrib/msc/core/printer/msc_doc.cc            |   34 +
 src/contrib/msc/core/printer/msc_doc.h             |  186 +-
 src/contrib/msc/core/printer/print_utils.cc        |   43 +-
 src/contrib/msc/core/printer/print_utils.h         |  134 +-
 src/contrib/msc/core/printer/python_printer.cc     |   70 +-
 src/contrib/msc/core/printer/python_printer.h      |   11 +-
 src/contrib/msc/core/transform/fuse_tuple.cc       |    2 +
 src/contrib/msc/core/transform/layout_utils.cc     |   26 +
 src/contrib/msc/core/transform/layout_utils.h      |   21 +
 src/contrib/msc/core/transform/set_expr_layout.cc  |   78 +-
 src/contrib/msc/core/transform/set_expr_name.cc    |  110 +-
 src/contrib/msc/core/utils.cc                      |   35 +-
 src/contrib/msc/core/utils.h                       |   38 +
 src/contrib/msc/framework/tensorflow/codegen.cc    |   33 +-
 .../msc/framework/tensorflow/tf_v1_opcode.cc       |   49 +-
 src/contrib/msc/framework/tensorrt/codegen.cc      |  196 +-
 src/contrib/msc/framework/tensorrt/codegen.h       |   10 +
 src/contrib/msc/framework/tensorrt/codegen_utils.h |   17 +-
 .../msc/framework/tensorrt/tensorrt_opcode.cc      |   50 +-
 .../msc/framework/tensorrt/tensorrt_opcode.h       |    2 +-
 src/contrib/msc/framework/torch/codegen.cc         |   34 +-
 src/contrib/msc/framework/torch/codegen_utils.h    |    4 +-
 src/contrib/msc/framework/torch/torch_opcode.cc    |   67 +-
 src/contrib/msc/framework/torch/torch_opcode.h     |    5 +-
 src/contrib/msc/framework/tvm/codegen.cc           |   76 +-
 src/contrib/msc/framework/tvm/relax_opcode.cc      |   51 +-
 src/contrib/msc/framework/tvm/relax_opcode.h       |    2 +-
 src/driver/driver_api.cc                           |    3 +-
 src/ir/module.cc                                   |   18 +-
 src/meta_schedule/postproc/verify_gpu_code.cc      |    2 +-
 src/node/script_printer.cc                         |   15 +-
 src/relax/analysis/computable_at_compile_time.cc   |   99 +
 src/relax/analysis/struct_info_analysis.cc         |   32 +-
 src/relax/analysis/well_formed.cc                  |    5 +-
 src/relax/distributed/axis_group_graph.cc          |  195 +-
 src/relax/distributed/transform/lower_distir.cc    |  271 ++
 .../transform/lower_global_view_to_local_view.cc   |  442 +++
 .../distributed/transform/propagate_sharding.cc    |  285 +-
 src/relax/distributed/transform/utils.cc           |   81 +
 src/relax/distributed/transform/utils.h            |   67 +
 src/relax/ir/block_builder.cc                      |   49 +-
 src/relax/ir/dataflow_matcher.cc                   |    7 +-
 src/relax/ir/expr.cc                               |    6 +
 src/relax/ir/struct_info.cc                        |    5 +-
 src/relax/op/ccl/ccl.cc                            |   11 +-
 src/relax/op/distributed/binary.h                  |    5 +-
 src/relax/op/distributed/{op.cc => ccl.cc}         |   23 +-
 src/relax/op/distributed/distributed.cc            |   60 +-
 src/relax/op/distributed/linear_algebra.cc         |    5 +-
 src/relax/op/distributed/manipulate.cc             |   10 +-
 src/relax/op/distributed/nn.cc                     |    5 +-
 src/relax/op/distributed/op.cc                     |   10 +
 src/relax/op/distributed/statistical.cc            |    5 +-
 src/relax/op/distributed/utils.cc                  |   47 +-
 src/relax/op/distributed/utils.h                   |   13 +-
 src/relax/op/image/resize.cc                       |   10 +-
 src/relax/op/nn/attention.cc                       |    5 +-
 src/relax/op/nn/convolution.cc                     |   40 +-
 src/relax/op/nn/nn.cc                              |   50 +-
 src/relax/op/nn/pooling.cc                         |   20 +-
 src/relax/op/tensor/binary.cc                      |   20 +-
 src/relax/op/tensor/create.cc                      |    5 +-
 src/relax/op/tensor/index.cc                       |   49 +-
 src/relax/op/tensor/manipulate.cc                  |  290 +-
 src/relax/op/tensor/search.cc                      |   22 +-
 src/relax/op/tensor/set.cc                         |   39 +-
 src/relax/op/{distributed/op.cc => tensor/sort.cc} |   44 +-
 src/relax/op/{distributed/op.cc => tensor/sort.h}  |   43 +-
 src/relax/op/tensor/statistical.cc                 |   45 +-
 src/relax/transform/alter_op_impl.cc               |    1 +
 src/relax/transform/call_tir_rewrite.cc            |   39 +-
 src/relax/transform/canonicalize_bindings.cc       |  106 +-
 src/relax/transform/convert_dataflow.cc            |  151 +
 src/relax/transform/expand_tuple_arguments.cc      |  187 ++
 src/relax/transform/fuse_ops.cc                    |   37 +-
 src/relax/transform/fuse_tir.cc                    |  355 ++-
 src/relax/transform/inline_functions.cc            |  228 ++
 src/relax/transform/legalize_ops.cc                |   42 +
 src/relax/transform/meta_schedule.cc               |   10 +-
 src/relax/transform/normalize.cc                   |  106 +
 src/relax/transform/remove_unused_outputs.cc       |  326 +++
 src/relax/transform/remove_unused_parameters.cc    |  260 ++
 src/relax/transform/static_plan_block_memory.cc    |    8 +
 src/relax/transform/update_param_struct_info.cc    |  111 +
 src/relax/transform/utils.h                        |   11 +
 src/relax/utils.cc                                 |   18 +
 src/relay/backend/contrib/clml/codegen.cc          |    2 +-
 src/relay/op/nn/convolution.cc                     |   99 +-
 src/runtime/contrib/clml/clml_runtime.cc           |  521 ++--
 src/runtime/contrib/cublas/cublas_json_runtime.cc  |    1 -
 src/runtime/contrib/cudnn/cudnn_json_runtime.cc    |    3 +-
 src/runtime/contrib/cutlass/weight_preprocess.cc   |   15 +-
 src/runtime/contrib/miopen/conv_forward.cc         |   21 +
 src/runtime/contrib/msc/tensorrt_runtime.cc        |   71 +-
 src/runtime/contrib/vllm/attention_kernels.cu      |  774 +++++
 src/runtime/contrib/vllm/attention_utils.cuh       |   55 +
 src/runtime/contrib/vllm/cache_alloc.cc            |   55 +
 src/runtime/contrib/vllm/cache_kernels.cu          |  234 ++
 src/runtime/contrib/vllm/dtype_float16.h           |  697 +++++
 src/runtime/cuda/cuda_device_api.cc                |    8 +
 src/runtime/disco/bcast_session.h                  |    3 +-
 src/runtime/disco/builtin.cc                       |    9 +-
 src/runtime/disco/{worker.cc => disco_worker.cc}   |   10 +-
 src/runtime/disco/disco_worker_thread.h            |   83 +
 src/runtime/disco/loader.cc                        |   89 +-
 src/runtime/disco/nccl/nccl.cc                     |    5 +-
 src/runtime/disco/process_session.cc               |    9 +-
 src/runtime/disco/session.cc                       |    3 +-
 src/runtime/disco/threaded_session.cc              |    3 +-
 src/runtime/disco/utils.h                          |   32 +-
 src/runtime/metal/metal_device_api.mm              |    8 +-
 src/runtime/relax_vm/builtin.cc                    |   29 +
 src/runtime/relax_vm/executable.cc                 |   45 +-
 src/runtime/relax_vm/kv_cache.h                    |  183 ++
 src/runtime/relax_vm/lm_support.cc                 |   59 +-
 src/runtime/relax_vm/ndarray_cache_support.cc      |  170 +-
 src/runtime/relax_vm/paged_kv_cache.cc             | 1385 +++++----
 src/runtime/relax_vm/vm.cc                         |  302 +-
 src/script/printer/relax/call.cc                   |    7 +-
 src/script/printer/relax/utils.h                   |    1 -
 src/script/printer/tir/stmt.cc                     |   22 +-
 src/support/errno_handling.h                       |   69 +
 src/support/pipe.h                                 |   42 +-
 src/support/socket.h                               |   65 +-
 src/target/source/codegen_c.cc                     |   43 +-
 src/target/tag.cc                                  |   24 +
 src/te/operation/create_primfunc.cc                |   19 +-
 src/tir/analysis/verify_well_formed.cc             |  214 ++
 src/tir/ir/data_type_rewriter.cc                   |    7 +-
 src/tir/ir/specialize.cc                           |  106 +-
 src/tir/ir/tir_visitor_with_path.cc                |  434 +++
 src/tir/ir/tir_visitor_with_path.h                 |  210 ++
 src/tir/op/builtin.cc                              |    4 +
 src/tir/schedule/transform.cc                      |   41 +-
 src/tir/schedule/transform.h                       |    9 +
 src/tir/transforms/compact_buffer_region.cc        |    2 +-
 src/tir/transforms/default_gpu_schedule.cc         |   49 +-
 src/tir/transforms/inline_private_functions.cc     |  300 ++
 src/tir/transforms/ir_utils.cc                     |   54 +-
 src/tir/transforms/loop_partition.cc               |   35 +
 ...tions.cc => merge_shared_memory_allocations.cc} |  115 +-
 src/tir/transforms/split_host_device.cc            |    2 +-
 src/tir/transforms/storage_access.cc               |   30 +-
 src/tir/transforms/storage_rewrite.cc              |   19 +-
 src/tir/transforms/unify_thread_binding.cc         |    3 +-
 src/topi/transform.cc                              |    3 +-
 tests/lint/check_file_type.py                      |    1 +
 tests/python/codegen/test_target_codegen_cuda.py   |   46 +
 tests/python/contrib/test_ccache.py                |   79 +
 tests/python/contrib/test_clml/conftest.py         |   21 +-
 tests/python/contrib/test_clml/infrastructure.py   |  242 +-
 tests/python/contrib/test_clml/test_network.py     |  249 +-
 tests/python/contrib/test_clml/test_ops.py         |  942 ++++--
 .../test_hexagon/test_pass_fq2i_avg_pool2d.py      |  115 +-
 tests/python/contrib/test_msc/test_graph_build.py  |   25 +-
 tests/python/contrib/test_msc/test_manager.py      |  276 ++
 tests/python/contrib/test_msc/test_runner.py       |   37 +-
 tests/python/contrib/test_msc/test_tools.py        |  302 ++
 tests/python/contrib/test_msc/test_transform.py    |  156 +
 .../test_msc/test_transform_set_expr_layout.py     |   73 -
 .../test_msc/test_transform_set_expr_name.py       |  108 -
 .../contrib/test_msc/test_translate_relax.py       |    2 +-
 .../contrib/test_msc/test_translate_tensorrt.py    |   36 +
 tests/python/dlight/test_gpu_gemv.py               |  158 +-
 tests/python/dlight/test_gpu_general_reduction.py  |  256 +-
 tests/python/dlight/test_gpu_reduction.py          |  427 ++-
 tests/python/dlight/test_primitives.py             |   60 +
 tests/python/frontend/keras/test_forward.py        |   24 +
 tests/python/frontend/pytorch/test_forward.py      |  211 +-
 tests/python/frontend/pytorch/test_span_naming.py  |  106 +
 tests/python/frontend/tflite/test_forward.py       |   24 +-
 tests/python/integration/test_arm_aprofile.py      |    1 +
 .../test_distributed_transform_lower_distir.py     |  396 +++
 ...ributed_transform_lower_global_to_local_view.py | 1579 +++++++++++
 ...est_distributed_transform_propagate_sharding.py | 2988 +++++++++++++-------
 tests/python/relax/frontend_nn_extern_module.cc    |   69 +
 .../test_analysis_computable_at_compile_time.py    |  243 ++
 .../relax/test_backend_dispatch_sort_scan.py       |  253 ++
 tests/python/relax/test_bind_params.py             |    5 +-
 tests/python/relax/test_blockbuilder_core.py       |  295 +-
 tests/python/relax/test_codegen_cutlass.py         |  218 +-
 tests/python/relax/test_contrib_vllm.py            |  746 +++++
 tests/python/relax/test_dataflow_pattern.py        |    9 +-
 tests/python/relax/test_expr.py                    |   20 +
 tests/python/relax/test_frontend_nn_debug.py       |   83 +
 .../python/relax/test_frontend_nn_extern_module.py |  323 ++-
 tests/python/relax/test_frontend_nn_modules.py     |   55 +-
 tests/python/relax/test_frontend_nn_op.py          |  175 +-
 tests/python/relax/test_frontend_nn_packing.py     |   51 +-
 tests/python/relax/test_frontend_onnx.py           |    2 +
 tests/python/relax/test_frontend_stablehlo.py      |    4 +-
 tests/python/relax/test_inline_functions.py        |  404 +++
 tests/python/relax/test_op_misc.py                 |   14 +-
 tests/python/relax/test_op_sort.py                 |  102 +
 tests/python/relax/test_runtime_builtin.py         |   41 +
 ...est_runtime_builtin_paged_attention_kv_cache.py |  766 ++---
 tests/python/relax/test_transform_alter_op_impl.py |  100 +
 .../relax/test_transform_canonicalize_bindings.py  |  311 +-
 .../relax/test_transform_convert_dataflow.py       |  493 ++++
 .../relax/test_transform_expand_tuple_args.py      |   79 +
 .../relax/test_transform_fuse_ops_by_pattern.py    |   23 +
 tests/python/relax/test_transform_fuse_tir.py      |  171 +-
 .../test_transform_inline_private_functions.py     |  105 +
 ..._transform_legalize_ops_index_linear_algebra.py |   90 +-
 .../python/relax/test_transform_legalize_ops_nn.py |   75 +-
 .../relax/test_transform_normalize_global_var.py   |   98 +
 .../relax/test_transform_remove_unused_outputs.py  |  123 +
 .../test_transform_remove_unused_parameters.py     |  101 +
 .../test_transform_static_plan_block_memory.py     |   73 +-
 .../test_transform_update_param_struct_info.py     |   71 +
 tests/python/relax/test_tvmscript_parser.py        |   94 +-
 .../python/relax/test_tvmscript_parser_op_sort.py  |   54 +
 tests/python/relax/test_vm_multi_device.py         |  186 ++
 .../relay/strategy/test_select_implementation.py   |  128 +-
 tests/python/relay/test_json_compact.py            |   20 +-
 tests/python/relay/test_op_level2.py               |   24 +
 tests/python/relay/test_py_converter.py            |   16 +-
 tests/python/relay/test_vm.py                      |   34 +-
 .../test_tir_analysis_verify_well_formed.py        |  148 +-
 tests/python/tir-base/test_tir_specialize.py       |  138 +-
 .../test_tir_inline_private_functions.py           |  253 ++
 .../test_tir_transform_convert_ssa.py              |  216 ++
 .../test_tir_transform_inject_ptx_async_copy.py    |    2 +-
 .../test_tir_transform_loop_partition.py           |  228 ++
 ...form_merge_dynamic_shared_memory_allocations.py |   65 +-
 ...sform_merge_static_shared_memory_allocations.py |  203 ++
 .../tir-transform/test_tir_transform_simplify.py   |   12 +
 .../test_tir_transform_split_host_device.py        |   72 +
 .../test_transform_default_gpu_schedule.py         |   73 +
 tests/python/topi/test_topi_conv2d_nhwc.py         |   39 +
 .../topi/test_topi_group_conv1d_transpose_ncw.py   |  110 +
 .../topi/test_topi_group_conv3d_transpose_ncdhw.py |  109 +
 tests/python/topi/test_topi_transform.py           |   21 +-
 .../tvmscript/test_tvmscript_printer_annotation.py |   25 +
 tests/scripts/ci.py                                |    2 +
 tests/scripts/setup-pytest-env.sh                  |    4 +-
 tests/scripts/task_config_build_gpu.sh             |    1 +
 tests/scripts/task_python_adreno.sh                |   22 +-
 tests/scripts/task_python_integration.sh           |    2 +-
 version.py                                         |    2 +-
 web/Makefile                                       |    4 +-
 web/emcc/tvmjs_support.cc                          |   11 +-
 web/emcc/wasm_runtime.cc                           |    5 +-
 web/package-lock.json                              |   12 +-
 web/package.json                                   |    6 +-
 web/src/ctypes.ts                                  |    5 +
 web/src/runtime.ts                                 |   50 +-
 web/src/webgpu.ts                                  |    7 +-
 web/tests/node/test_packed_func.js                 |   15 +
 502 files changed, 42325 insertions(+), 8141 deletions(-)
 create mode 160000 3rdparty/flashinfer
 copy tests/python/contrib/test_clml/conftest.py => cmake/modules/contrib/vllm.cmake (71%)
 copy src/relax/op/distributed/op.cc => include/tvm/relax/attrs/sort.h (52%)
 rename {src => include/tvm}/runtime/disco/builtin.h (78%)
 rename src/runtime/disco/worker.h => include/tvm/runtime/disco/disco_worker.h (60%)
 rename {src => include/tvm}/runtime/relax_vm/ndarray_cache_support.h (77%)
 create mode 100644 licenses/LICENSE.vllm.txt
 copy python/tvm/{relax/backend => contrib/msc/core/gym}/__init__.py (87%)
 copy python/tvm/{relax/backend => contrib/msc/core/gym/agent}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/core/gym/agent/base_agent.py
 create mode 100644 python/tvm/contrib/msc/core/gym/agent/method.py
 create mode 100644 python/tvm/contrib/msc/core/gym/agent/search_agent.py
 copy python/tvm/{relax/backend => contrib/msc/core/gym/control}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/core/gym/control/configer.py
 create mode 100644 python/tvm/contrib/msc/core/gym/control/controller.py
 copy tests/python/contrib/test_clml/conftest.py => python/tvm/contrib/msc/core/gym/control/namespace.py (63%)
 create mode 100644 python/tvm/contrib/msc/core/gym/control/service.py
 create mode 100644 python/tvm/contrib/msc/core/gym/control/worker.py
 copy python/tvm/{relax/backend => contrib/msc/core/gym/environment}/__init__.py (86%)
 create mode 100644 python/tvm/contrib/msc/core/gym/environment/base_env.py
 create mode 100644 python/tvm/contrib/msc/core/gym/environment/method.py
 create mode 100644 python/tvm/contrib/msc/core/gym/environment/prune_env.py
 create mode 100644 python/tvm/contrib/msc/core/gym/environment/quantize_env.py
 copy python/tvm/{relax/backend => contrib/msc/core/tools}/__init__.py (82%)
 copy python/tvm/{relax/backend => contrib/msc/core/tools/distill}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/core/tools/distill/distiller.py
 create mode 100644 python/tvm/contrib/msc/core/tools/distill/method.py
 create mode 100644 python/tvm/contrib/msc/core/tools/execute.py
 copy python/tvm/{relax/backend => contrib/msc/core/tools/prune}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/core/tools/prune/method.py
 create mode 100644 python/tvm/contrib/msc/core/tools/prune/pruner.py
 copy python/tvm/{relax/backend => contrib/msc/core/tools/quantize}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/core/tools/quantize/method.py
 create mode 100644 python/tvm/contrib/msc/core/tools/quantize/quantizer.py
 create mode 100644 python/tvm/contrib/msc/core/tools/tool.py
 copy python/tvm/{relax/backend => contrib/msc/core/tools/track}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/core/tools/track/method.py
 create mode 100644 python/tvm/contrib/msc/core/tools/track/tracker.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tensorflow/tools}/__init__.py (85%)
 copy python/tvm/{relax/backend => contrib/msc/framework/tensorflow/tools/distill}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tensorflow/tools/distill/distiller.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tensorflow/tools/prune}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tensorflow/tools/prune/pruner.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tensorflow/tools/quantize}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tensorflow/tools/quantize/quantizer.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tensorflow/tools/track}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tensorflow/tools/track/tracker.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tensorrt/tools}/__init__.py (85%)
 copy python/tvm/{relax/backend => contrib/msc/framework/tensorrt/tools/distill}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tensorrt/tools/distill/distiller.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tensorrt/tools/prune}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tensorrt/tools/prune/pruner.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tensorrt/tools/quantize}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tensorrt/tools/quantize/method.py
 create mode 100644 python/tvm/contrib/msc/framework/tensorrt/tools/quantize/quantizer.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tensorrt/tools/track}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tensorrt/tools/track/tracker.py
 copy python/tvm/{relax/backend => contrib/msc/framework/torch/tools}/__init__.py (85%)
 copy python/tvm/{relax/backend => contrib/msc/framework/torch/tools/distill}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/torch/tools/distill/distiller.py
 create mode 100644 python/tvm/contrib/msc/framework/torch/tools/distill/method.py
 copy python/tvm/{relax/backend => contrib/msc/framework/torch/tools/prune}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/torch/tools/prune/pruner.py
 copy python/tvm/{relax/backend => contrib/msc/framework/torch/tools/quantize}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/torch/tools/quantize/method.py
 create mode 100644 python/tvm/contrib/msc/framework/torch/tools/quantize/quantizer.py
 copy python/tvm/{relax/backend => contrib/msc/framework/torch/tools/track}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/torch/tools/track/tracker.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tvm/tools}/__init__.py (85%)
 copy python/tvm/{relax/backend => contrib/msc/framework/tvm/tools/distill}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tvm/tools/distill/distiller.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tvm/tools/prune}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tvm/tools/prune/pruner.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tvm/tools/quantize}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tvm/tools/quantize/method.py
 create mode 100644 python/tvm/contrib/msc/framework/tvm/tools/quantize/quantizer.py
 copy python/tvm/{relax/backend => contrib/msc/framework/tvm/tools/track}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/framework/tvm/tools/track/tracker.py
 copy python/tvm/{relax/backend => contrib/msc/pipeline}/__init__.py (87%)
 create mode 100644 python/tvm/contrib/msc/pipeline/manager.py
 copy python/tvm/{relax/distributed/transform/transform.py => dlight/gpu/base.py} (52%)
 create mode 100644 python/tvm/relax/backend/dispatch_sort_scan.py
 create mode 100644 python/tvm/relax/backend/utils.py
 create mode 100644 python/tvm/relax/frontend/nn/exporter.py
 create mode 100644 python/tvm/relax/frontend/nn/extern.py
 copy python/tvm/relax/{distributed/transform/transform.py => op/sort.py} (57%)
 create mode 100644 python/tvm/relax/transform/attach_external_modules.py
 create mode 100644 src/relax/analysis/computable_at_compile_time.cc
 create mode 100644 src/relax/distributed/transform/lower_distir.cc
 create mode 100644 src/relax/distributed/transform/lower_global_view_to_local_view.cc
 create mode 100644 src/relax/distributed/transform/utils.cc
 create mode 100644 src/relax/distributed/transform/utils.h
 copy src/relax/op/distributed/{op.cc => ccl.cc} (59%)
 copy src/relax/op/{distributed/op.cc => tensor/sort.cc} (51%)
 copy src/relax/op/{distributed/op.cc => tensor/sort.h} (57%)
 create mode 100644 src/relax/transform/convert_dataflow.cc
 create mode 100644 src/relax/transform/expand_tuple_arguments.cc
 create mode 100644 src/relax/transform/inline_functions.cc
 create mode 100644 src/relax/transform/remove_unused_outputs.cc
 create mode 100644 src/relax/transform/remove_unused_parameters.cc
 create mode 100644 src/relax/transform/update_param_struct_info.cc
 create mode 100644 src/runtime/contrib/vllm/attention_kernels.cu
 create mode 100644 src/runtime/contrib/vllm/attention_utils.cuh
 create mode 100644 src/runtime/contrib/vllm/cache_alloc.cc
 create mode 100644 src/runtime/contrib/vllm/cache_kernels.cu
 create mode 100644 src/runtime/contrib/vllm/dtype_float16.h
 rename src/runtime/disco/{worker.cc => disco_worker.cc} (97%)
 create mode 100644 src/runtime/disco/disco_worker_thread.h
 create mode 100644 src/runtime/relax_vm/kv_cache.h
 create mode 100644 src/support/errno_handling.h
 create mode 100644 src/tir/ir/tir_visitor_with_path.cc
 create mode 100644 src/tir/ir/tir_visitor_with_path.h
 create mode 100644 src/tir/transforms/inline_private_functions.cc
 rename src/tir/transforms/{merge_dynamic_shared_memory_allocations.cc => merge_shared_memory_allocations.cc} (82%)
 create mode 100644 tests/python/contrib/test_ccache.py
 create mode 100644 tests/python/contrib/test_msc/test_manager.py
 create mode 100644 tests/python/contrib/test_msc/test_tools.py
 create mode 100644 tests/python/contrib/test_msc/test_transform.py
 delete mode 100644 tests/python/contrib/test_msc/test_transform_set_expr_layout.py
 delete mode 100644 tests/python/contrib/test_msc/test_transform_set_expr_name.py
 create mode 100644 tests/python/dlight/test_primitives.py
 create mode 100644 tests/python/frontend/pytorch/test_span_naming.py
 create mode 100644 tests/python/relax/distributed/test_distributed_transform_lower_distir.py
 create mode 100644 tests/python/relax/distributed/test_distributed_transform_lower_global_to_local_view.py
 create mode 100644 tests/python/relax/frontend_nn_extern_module.cc
 create mode 100644 tests/python/relax/test_analysis_computable_at_compile_time.py
 create mode 100644 tests/python/relax/test_backend_dispatch_sort_scan.py
 create mode 100644 tests/python/relax/test_contrib_vllm.py
 create mode 100644 tests/python/relax/test_frontend_nn_debug.py
 create mode 100644 tests/python/relax/test_inline_functions.py
 create mode 100644 tests/python/relax/test_op_sort.py
 create mode 100644 tests/python/relax/test_transform_convert_dataflow.py
 create mode 100644 tests/python/relax/test_transform_expand_tuple_args.py
 create mode 100644 tests/python/relax/test_transform_inline_private_functions.py
 create mode 100644 tests/python/relax/test_transform_normalize_global_var.py
 create mode 100644 tests/python/relax/test_transform_remove_unused_outputs.py
 create mode 100644 tests/python/relax/test_transform_remove_unused_parameters.py
 create mode 100644 tests/python/relax/test_transform_update_param_struct_info.py
 create mode 100644 tests/python/relax/test_tvmscript_parser_op_sort.py
 create mode 100644 tests/python/relax/test_vm_multi_device.py
 create mode 100644 tests/python/tir-transform/test_tir_inline_private_functions.py
 create mode 100644 tests/python/tir-transform/test_tir_transform_merge_static_shared_memory_allocations.py
 create mode 100644 tests/python/topi/test_topi_group_conv1d_transpose_ncw.py
 create mode 100644 tests/python/topi/test_topi_group_conv3d_transpose_ncdhw.py
