This is an automated email from the ASF dual-hosted git repository.
ruihangl pushed a change to branch unity-staging
in repository https://gitbox.apache.org/repos/asf/tvm.git
discard bf39afa04b Merge branch 'main' into unity
add ef2a9139c1 [Unity] Improved error message for matmul shape mismatch
(#16308)
add b22aa0f7c7 [Unity] Improved error message in
ExprMutator::ReEmitBinding (#16307)
add fab6db20e7 [Unity][Transform] Use parameter name in BundleModelParams
(#16309)
add 1e95b63fcc Merge branch 'main' of github.com:apache/tvm into unity
add 2d53e6ac63 [Unity][Transform] Handle replacement at both var binding
and usage (#16367)
add a796023342 [Unity][Fix] Memory planning check value type of
'tir_var_upper_bound' (#16362)
add 4a37cfefa0 [Unity][Analysis] Show objects instead of names in
WellFormedChecker (#16310)
add 4e05eb4e1c [CI] Upgrade Unity ci images (#16369)
add 298ad2c3f6 [Unity][Transform] Update LambdaLift to use name of lifted
lambda (#16306)
add e1d71b3720 [Unity] Add dlight.gpu.Fallback in DispatchSortScan, add
argsort, topk, and cumprod (#16351)
add 9d483d462e [LLVM] Update Host.h path (#16373)
add 3166366af6 [CI] Upgrade sccache version to 0.7.* (#16366)
add 524ec5f1b0 [Runtime] Use cudaGetDeviceCount to check if device exists
(#16377)
add 45532d791c Merge branch 'main' of github.com:apache/tvm into unity
add 474c06b8b3 [Unity] Set CMAKE_CUDA_ARCHITECTURES default to native
(#16335)
add 0b2358c2e4 [Relay] make "ToScalar" support directly obtaining
"int64_t" (#16324)
add e53a8bcfb9 [TOPI][Target] Add fp16 SIMD support for conv2d on
`arm_cpu` targets (#16383)
add 8e67e2a3a1 [TVMC] Add tvmc flag to print ir before and print ir after
named pass (#16261)
add e2e33ddd54 [Bugfix] Disable SingleEnvThreadVerifier (#16361)
add 5d4c01e0fc [Thrust] Use no sync exec policy and caching allocator
(#16386)
add c40d96b59e Merge remote-tracking branch 'upstream/main' into unity
add b8230f6e47 [Unity] Update dispatch test cases following the merge from
main (#16388)
add 81a6c51ba4 [Unity] Fix creation of disco ProcessSession (#16375)
add d1b890a4e3 [Unity][Contrib] Fix a bug due to typo in vllm
`reconstruct_from_cache` kernel and add test (#16376)
add b69d720593 [Unity][MSC] Avoid depending on trivial bindings in Relax
intermediate (#16349)
add 4c7c010513 [Unity][Transform] Implement
relax.transform.AdjustMatmulOrder (#16314)
add 7798e93529 [Unity] Support TIR kernel for PagedKVCache (#16374)
add 138cb651e0 [Unity][BlockBuilder] Restore bb.get() (#16378)
add 07d8e02367 [Unity][nnModule] Dynamic shape support in nn Module
(#16284)
add 5c87bfe09b [Unity][Relax][Op] Add Conv3D Operator (#16385)
add e9bea9d2e2 [Relax][Frontend][ONNX]fix onnx frontend parse (#16395)
add 98d5153918 [Unity] PagedKVCache supporting on-the-fly RoPE calculation
(#16396)
add cf14eddebb [Unity][Transform] Memory planning for dynamic-shape func
return (#16111)
add a2a1b53402 [Unity] Split DecomposeOpsForTraining into two steps
(#15954)
add cbe9c14879 [Bugfix][Unity] Recover MSVC/NVCC/ROCm/Vulkan (#16414)
add a763b22119 [Unity][Transform] Replace eligible operators with in-place
versions in dataflow blocks (#16129)
add c470e1a922 [Unity][Fix] Fix mismatched intrinsic name (#16418)
add 7336379dea [Unity][Frontend][NN] Better support for dynamic
convolutions (#16427)
add ae8d398b88 [CI] In jenkins.cmd_utils.Sh.tee, check for failing
subprocess (#16382)
add f1bf20a950 [RPC] Fix tuning on macOS and Windows (#15771) (#16357)
add 4258c864b9 [RUNTIME][RPC] Enable RPCObjectRef return in RPC (#16387)
add 196b413813 [Relay][Frontend][Torch] fix a typo mistake in
nonzero_numpy (#16390)
add 3e52c3dba5 [CI] Remove NVIDIA_DISABLE_REQUIRE (#16384)
add fe9814c73e [OpenCL][CMake] Fix OpenCL tests compilation (#16394)
add a7dd32cc16 [DeviceAPI] Support querying total global memory (#16398)
add 68be158d35 [ROCm] Some fixes of ROCm codegen (#16404)
add 3053f65da7 Add NVIDIA Hopper H100 target tag (#16407)
add 12ad4fbcf4 [Relay][Frontend][Torch] fix pytorch frontend not support
logical or (#16400)
add 7ef521fad6 [COMMUNITY] Add new key for release signing (#16419)
add a5e883e846 [RUNTIME][CLML] Fix for Softmax op for 4D tensors (#16328)
add e1c430c7e3 [Relay][Frontend][Torch] fix pytorch frontend linspace op
(#16417)
add 827beed0d6 [CMake] Enable cuda lang if USE_CUDA is on (#16426)
add 614a7a9e31 [CI][WASM] Update emsdk and nodejs version (#16420)
new 5a2949bd70 Merge branch 'main' of into branch 'unity'
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (bf39afa04b)
            \
             N -- N -- N   refs/heads/unity-staging (5a2949bd70)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
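
The O/N split described above can be illustrated with a small sketch. It
assumes a toy commit graph with hypothetical commit names (A1, A2, B, O1..O3,
N1..N3) that mirror the diagram; none of these are real revisions from this
push. The helper names `ancestors` and `classify_force_push` are also
illustrative and do not correspond to any git or gitbox API. The sketch only
separates revisions reachable from the old tip from those reachable from the
new tip; it does not model the "omit" versus "discard" distinction, which
depends on whether other references still point at the dropped revisions.

```python
def ancestors(parents, tip):
    """Return every commit reachable from `tip`, including `tip` itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        # Follow all parent links (merge commits have more than one).
        stack.extend(parents.get(commit, ()))
    return seen


def classify_force_push(parents, old_tip, new_tip):
    """Split history into the O revisions, the N revisions, and shared history."""
    old = ancestors(parents, old_tip)
    new = ancestors(parents, new_tip)
    return {
        "dropped": old - new,  # the O revisions: only on the old branch tip
        "added": new - old,    # the N revisions: only on the new branch tip
        "common": old & new,   # shared history up to and including B
    }


# Hypothetical history shaped like the diagram: * -- * -- B, then O and N lines.
parents = {
    "A1": [], "A2": ["A1"], "B": ["A2"],
    "O1": ["B"], "O2": ["O1"], "O3": ["O2"],
    "N1": ["B"], "N2": ["N1"], "N3": ["N2"],
}

result = classify_force_push(parents, old_tip="O3", new_tip="N3")
print(sorted(result["dropped"]))
print(sorted(result["added"]))
```

In a real clone, the same two sets are what `git rev-list NEW..OLD` and
`git rev-list OLD..NEW` would list for the old and new branch tips.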
Summary of changes:
3rdparty/cutlass_fpA_intB_gemm | 2 +-
3rdparty/flashinfer | 2 +-
CMakeLists.txt | 2 +-
KEYS | 59 +
ci/jenkins/generated/arm_jenkinsfile.groovy | 4 +-
ci/jenkins/generated/cortexm_jenkinsfile.groovy | 4 +-
ci/jenkins/generated/cpu_jenkinsfile.groovy | 4 +-
ci/jenkins/generated/docker_jenkinsfile.groovy | 4 +-
ci/jenkins/generated/gpu_jenkinsfile.groovy | 4 +-
ci/jenkins/generated/hexagon_jenkinsfile.groovy | 4 +-
ci/jenkins/generated/i386_jenkinsfile.groovy | 4 +-
ci/jenkins/generated/lint_jenkinsfile.groovy | 4 +-
.../generated/minimal_cross_isa_jenkinsfile.groovy | 4 +-
ci/jenkins/generated/minimal_jenkinsfile.groovy | 4 +-
ci/jenkins/generated/riscv_jenkinsfile.groovy | 4 +-
ci/jenkins/generated/wasm_jenkinsfile.groovy | 4 +-
ci/jenkins/templates/utils/base.groovy.j2 | 2 +-
ci/jenkins/unity_jenkinsfile.groovy | 8 +-
ci/scripts/jenkins/cmd_utils.py | 44 +-
cmake/modules/CUDA.cmake | 30 +-
cmake/modules/OpenCL.cmake | 2 +-
docker/install/ubuntu_install_boost.sh | 2 +-
docker/install/ubuntu_install_emscripten.sh | 4 +-
docker/install/ubuntu_install_nodejs.sh | 2 +-
docker/install/ubuntu_install_sccache.sh | 2 +-
include/tvm/ir/transform.h | 25 +
include/tvm/relax/attrs/nn.h | 45 +
include/tvm/relax/attrs/sort.h | 52 -
.../attrs/algorithm.h => relax/attrs/sorting.h} | 92 +-
include/tvm/relax/attrs/statistical.h | 20 +-
include/tvm/relax/block_builder.h | 6 +-
include/tvm/relax/dataflow_matcher.h | 28 +
include/tvm/relax/dataflow_pattern.h | 9 +-
include/tvm/relax/transform.h | 10 +
include/tvm/runtime/device_api.h | 1 +
include/tvm/runtime/logging.h | 18 +-
include/tvm/runtime/object.h | 4 +-
python/tvm/_ffi/runtime_ctypes.py | 14 +
python/tvm/contrib/msc/core/codegen/codegen.py | 16 +-
.../contrib/msc/framework/tvm/codegen/codegen.py | 16 +-
python/tvm/driver/tvmc/compiler.py | 31 +-
python/tvm/ir/instrument.py | 18 +
python/tvm/relax/__init__.py | 8 +-
python/tvm/relax/backend/contrib/cutlass.py | 5 +
python/tvm/relax/backend/dispatch_sort_scan.py | 125 +-
python/tvm/relax/block_builder.py | 13 +-
python/tvm/relax/frontend/nn/core.py | 15 +-
python/tvm/relax/frontend/nn/exporter.py | 21 +-
python/tvm/relax/frontend/nn/modules.py | 19 +-
python/tvm/relax/frontend/onnx/onnx_frontend.py | 50 +-
python/tvm/relax/frontend/torch/fx_translator.py | 29 +
python/tvm/relax/op/__init__.py | 4 +-
python/tvm/relax/op/nn/__init__.py | 1 +
python/tvm/relax/op/nn/nn.py | 100 ++
python/tvm/relax/op/op_attrs.py | 16 +-
python/tvm/relax/op/sort.py | 45 -
python/tvm/relax/op/sorting.py | 116 ++
python/tvm/relax/op/statistical.py | 72 +-
python/tvm/relax/testing/transform.py | 98 +-
python/tvm/relax/transform/__init__.py | 2 +
python/tvm/relax/transform/legalize_ops/nn.py | 41 +
.../relax/transform/legalize_ops/statistical.py | 11 +-
python/tvm/relax/transform/transform.py | 33 +
python/tvm/relay/frontend/pytorch.py | 11 +-
python/tvm/relay/op/contrib/clml.py | 3 +-
python/tvm/rpc/server.py | 11 +-
python/tvm/runtime/disco/session.py | 2 +-
python/tvm/script/ir_builder/relax/ir.py | 6 +
python/tvm/topi/arm_cpu/arm_utils.py | 7 +-
python/tvm/topi/arm_cpu/conv2d_gemm.py | 16 +-
src/contrib/msc/framework/tvm/codegen.cc | 11 +-
src/contrib/msc/framework/tvm/relax_opcode.cc | 3 -
src/ir/transform.cc | 31 +
src/relax/analysis/well_formed.cc | 27 +-
src/relax/ir/dataflow_matcher.cc | 14 +-
src/relax/ir/dataflow_pattern.cc | 6 +-
src/relax/ir/expr_functor.cc | 12 +-
src/relax/op/nn/convolution.cc | 176 +++
src/relax/op/nn/convolution.h | 5 +
src/relax/op/op_common.h | 25 +
src/relax/op/tensor/linear_algebra.cc | 25 +-
src/relax/op/tensor/sort.cc | 56 -
src/relax/op/tensor/sorting.cc | 155 +++
src/relax/op/tensor/{sort.h => sorting.h} | 32 +-
src/relax/op/tensor/statistical.cc | 55 +-
src/relax/op/tensor/statistical.h | 21 +-
src/relax/transform/adjust_matmul_order.cc | 176 +++
src/relax/transform/bundle_model_params.cc | 7 +-
src/relax/transform/dataflow_inplace.cc | 1040 ++++++++++++++
src/relax/transform/decompose_ops.cc | 156 +--
src/relax/transform/lambda_lift.cc | 214 ++-
src/relax/transform/static_plan_block_memory.cc | 166 ++-
src/relax/transform/utils.h | 12 +
src/relay/transforms/pattern_utils.h | 43 +-
src/relay/transforms/simplify_expr.cc | 2 +-
src/runtime/contrib/clml/clml_runtime.cc | 62 +-
src/runtime/contrib/thrust/thrust.cu | 245 ++--
src/runtime/contrib/vllm/cache_kernels.cu | 2 +-
src/runtime/cuda/cuda_common.h | 3 +
src/runtime/cuda/cuda_device_api.cc | 15 +-
src/runtime/metal/metal_device_api.mm | 4 +
src/runtime/minrpc/minrpc_server.h | 15 +-
src/runtime/minrpc/rpc_reference.h | 8 +
src/runtime/opencl/opencl_device_api.cc | 10 +-
src/runtime/relax_vm/paged_kv_cache.cc | 233 ++--
src/runtime/rocm/rocm_device_api.cc | 11 +-
src/runtime/rpc/rpc_endpoint.cc | 51 +-
src/runtime/rpc/rpc_local_session.cc | 20 +-
src/runtime/rpc/rpc_module.cc | 7 +
src/runtime/rpc/rpc_session.h | 51 +-
src/runtime/vulkan/vulkan_device_api.cc | 4 +
src/script/printer/relax/call.cc | 19 +-
src/support/pipe.h | 14 +-
src/target/llvm/codegen_llvm.cc | 2 +
src/target/llvm/intrin_rule_rocm.cc | 87 +-
src/target/llvm/llvm_instance.cc | 4 +
src/target/llvm/llvm_module.cc | 4 +
src/target/parsers/aprofile.cc | 12 +-
src/target/spirv/codegen_spirv.cc | 2 +
src/target/spirv/intrin_rule_spirv.cc | 3 +
src/target/tag.cc | 2 +
src/tir/analysis/verify_well_formed.cc | 2 -
src/tir/transforms/lower_thread_allreduce.cc | 2 +-
.../transforms/merge_shared_memory_allocations.cc | 5 +
src/tir/transforms/storage_rewrite.cc | 7 +-
tests/cpp/target/parsers/aprofile_test.cc | 27 +
tests/python/codegen/test_target_codegen_rocm.py | 53 +
tests/python/contrib/test_clml/test_ops.py | 86 +-
.../contrib/test_msc/test_translate_relax.py | 262 ++--
.../contrib/test_msc/test_translate_relay.py | 2 +
tests/python/driver/tvmc/test_command_line.py | 41 +-
tests/python/frontend/pytorch/test_forward.py | 21 +
tests/python/integration/test_auto_tensorize.py | 4 +-
.../test_meta_schedule_schedule_rule_mlt_intrin.py | 6 +-
.../relax/test_backend_dispatch_sort_scan.py | 307 +++--
tests/python/relax/test_blockbuilder_core.py | 12 +-
tests/python/relax/test_codegen_cutlass.py | 33 +-
tests/python/relax/test_contrib_vllm.py | 34 +
tests/python/relax/test_dataflow_inplace.py | 644 +++++++++
tests/python/relax/test_frontend_from_fx.py | 79 ++
tests/python/relax/test_frontend_nn_modules.py | 37 +
tests/python/relax/test_frontend_onnx.py | 15 +-
tests/python/relax/test_op_nn_convolution.py | 187 +++
tests/python/relax/test_op_sort.py | 192 +++
tests/python/relax/test_op_statistical.py | 53 +-
...builtin_paged_attention_kv_cache_flashinfer.py} | 37 +-
...runtime_builtin_paged_attention_kv_cache_tir.py | 1456 ++++++++++++++++++++
.../relax/test_transform_adjust_matmul_order.py | 351 +++++
.../relax/test_transform_bundle_model_params.py | 91 ++
.../relax/test_transform_canonicalize_bindings.py | 186 ++-
tests/python/relax/test_transform_decompose_ops.py | 71 +-
tests/python/relax/test_transform_lambda_lift.py | 97 +-
.../test_transform_static_plan_block_memory.py | 78 +-
tests/python/relax/test_tvmscript_parser.py | 36 +
.../python/relax/test_tvmscript_parser_op_sort.py | 14 +-
.../relax/test_tvmscript_parser_op_statistical.py | 8 +-
tests/python/relax/test_tvmscript_printer_relax.py | 25 +
tests/python/runtime/test_runtime_rpc.py | 31 +
.../python/tir-schedule/test_tir_schedule_error.py | 26 +-
tests/scripts/task_config_build_gpu.sh | 1 -
160 files changed, 8154 insertions(+), 1319 deletions(-)
delete mode 100644 include/tvm/relax/attrs/sort.h
copy include/tvm/{relay/attrs/algorithm.h => relax/attrs/sorting.h} (50%)
delete mode 100644 python/tvm/relax/op/sort.py
create mode 100644 python/tvm/relax/op/sorting.py
delete mode 100644 src/relax/op/tensor/sort.cc
create mode 100644 src/relax/op/tensor/sorting.cc
rename src/relax/op/tensor/{sort.h => sorting.h} (55%)
create mode 100644 src/relax/transform/adjust_matmul_order.cc
create mode 100644 src/relax/transform/dataflow_inplace.cc
create mode 100644 tests/python/relax/test_dataflow_inplace.py
rename tests/python/relax/{test_runtime_builtin_paged_attention_kv_cache.py => test_runtime_builtin_paged_attention_kv_cache_flashinfer.py} (94%)
create mode 100644 tests/python/relax/test_runtime_builtin_paged_attention_kv_cache_tir.py
create mode 100644 tests/python/relax/test_transform_adjust_matmul_order.py