This is an automated email from the ASF dual-hosted git repository.
junrushao pushed a change to branch unity-staging
in repository https://gitbox.apache.org/repos/asf/tvm.git
from 7828489c6c Merge remote-tracking branch 'apache-upstream/main' into
unity
add 9d27a6aeb8 Fix super() visit function in PyExprVisitor and
PyExprMutator (#15189)
add c07f67ceda [RPC] Disable socket SO_REUSEADDR for Windows (#15188)
add f6c138347d [Unity] Legalization for LayoutTransform (#15184)
add e571dc9262 [Unity] Add memory scope and nd allocation support in
allocators (#15178)
add 741ca41814 [Unity][Dlight] general reduction rule for gemv-decode
(#15169)
add 918fc4ecf7 [Unity][Dlight] Matmul Rules (#15191)
add b4b27fafbd [Python] Enhance Wheel Packaging (#15167)
add e1013f10f4 [CMake] Support LLVM-16 static linking (#15164)
add 34637d7ee3 [TensorIR][Visitor] Visit buffer members in
`match_buffer`'s in block visitor functions (#15153)
add e178375e1b [Frontend][Relay][Keras] Fix concatenate convert function
in axis parsing (#15175)
add 977b4b2e05 Update tvm_runtime.h (#15185)
add 683dfb0c04 [RPC] Report RPC Session Timeout to Client Instead of
"kShutdown" (#15187)
add fb64be3f78 [ARITH] Allow Analyzer to MarkGlobalNonNegValue (#15193)
add 5931cf10eb [TIR][Transform] Add LiftThreadBinding Pass (#15207)
add f14c61f0d1 [AOT] Remove workaround to help resolve test flakiness
(#15181)
add 03ef29e8ad [TIR][Schedule] Derive Nonnegative Bounds from Shape Var
(#15210)
add b160cb1a10 Update version to 0.14.dev0 on main branch (#15208)
add a60b815fe5 [TIR] Support cross-thread reduction lowering with
thread-broadcasting rewrite (#15192)
add 40300a3b92 [MERGE] Merge main into unity 2023-07-03
add 1631c3d4ce [MERGE] Fix testcase after merge
add 11db81effd [Unity] Fix dlight reduction rule (#15194)
add 05278ea77b [VM] Add repetition penalty functions to Relax VM (#15219)
add 780d6e6d12 [Unity] Allow specifying struct_info for relax constant
(#15220)
add 04f22a9257 [Dlight] Enhance Decode-GEMV Schedule (#15195)
add c45f72b4cc [Unity][TIR][Transform] Support no spatial axes cases for
DefaultGPUSchedule (#15232)
add 5a44262502 [Unity] Fix memory statistics issues in
estimate_memory_usage (#15224)
add a4fa5e7c59 [Unity][NestedMsg] Add NestedMsgTo helper function (#15223)
add 1637b1436f [Unity][Dlight] Avoid TransformBlockLayout in GEMV Rule
(#15248)
add 15a6b475bb [Unity][Dlight] Handle Epilogue Broadcasting (#15252)
add 5828f1e9ee [Unity] Add a Standalone VM Version Number (#15254)
add 5dc25afc87 [microNPU][ETHOSU] Add Vela's logic to select configuration
block (#15186)
add 1d4829e430 [Android] ndk static build (#15215)
add 516c56b46a [Web] Increase default EMCC compilation total memory size
(#15218)
add 8f9f605dd5 [ARITH] Enhance buffer shape bound deduction to include
offset (#15228)
add 23fb568521 [CMAKE] Add Vulkan header for Android (#15229)
add c928852d59 [#15157][Rust][Doc] Re-enable the Rust documentation build
(#15213)
add 0bb390b272 [UnitTest][NVPTX] Avoid cascading failures from CUDA
postproc (#15136)
add 88701dc82a [Miscs] Enhance script about make release notes (#15234)
add 2f7c097594 [TIR] Allow VerifyWellFormed to accept IRModule (#15247)
add 73a62f647f [TIR] Preserve AllocateNode::annotations (#15242)
add 81463d79c0 [TIR][Schedule] Scoped CacheRead/Write producing compact
region (#15236)
add 3a33771494 [TVMScript] Handle parsing of PrimFunc calls with non-void
return (#15239)
add d9d6a88a0a [QNN] Support Dequantize to "float16" and Quantize to
"uint16" (#15235)
add 7489ce20df [Relay] ExprMutator Return Origin Expr When All Fields
Isn't Changed (#15237)
add 916542ed77 [TVMScript] Ensure completed root block has no read/write
(#15249)
add 5a78da4f3c [TIR] Output DeclBuffer in LowerTVMBuiltin (#15243)
add 24ae0d5b05 [bugfix][frontend][keras] Fix go_backwards attribute of
LSTM in keras frontend (#15261)
add 0c1aad78f9 [Testing] Add tvm.testing.local_run (#15268)
add a60cd0fecf [TIR] Allow symbolic bounds in IndexMap analysis (#15264)
new 1b869650b0 Merge remote-tracking branch 'apache-upstream/main' into
apache-upstream-unity
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
.github/workflows/main.yml | 21 -
apps/android_rpc/app/src/main/jni/tvm_runtime.h | 1 +
cmake/utils/FindLLVM.cmake | 32 +-
cmake/utils/FindVulkan.cmake | 57 +--
conda/recipe/meta.yaml | 2 +-
docs/reference/api/links.rst | 1 +
include/tvm/arith/analyzer.h | 16 +
include/tvm/relax/expr.h | 14 +-
include/tvm/relax/nested_msg.h | 52 ++-
include/tvm/relay/qnn/attrs.h | 2 +
include/tvm/runtime/c_runtime_api.h | 2 +-
include/tvm/runtime/relax_vm/executable.h | 6 +
include/tvm/runtime/relax_vm/memory_manager.h | 7 +
include/tvm/tir/index_map.h | 12 +-
include/tvm/tir/transform.h | 6 +
include/tvm/tir/var.h | 6 +
include/tvm/topi/transform.h | 9 +-
pyproject.toml | 3 +
python/gen_requirements.py | 6 +-
python/setup.py | 39 +-
python/tvm/_ffi/libinfo.py | 2 +-
python/tvm/contrib/emcc.py | 1 +
python/tvm/contrib/ndk.py | 55 +++
python/tvm/dlight/__init__.py | 10 +-
python/tvm/dlight/base/__init__.py | 4 +-
python/tvm/dlight/base/analysis.py | 197 ++++++--
python/tvm/dlight/base/common_schedules.py | 37 +-
python/tvm/dlight/base/transform.py | 2 +-
python/tvm/dlight/gpu/__init__.py | 2 +
python/tvm/dlight/gpu/decode_gemv.py | 277 ++++++++++++
python/tvm/dlight/gpu/fallback.py | 29 +-
python/tvm/dlight/gpu/matmul.py | 366 +++++++++++++++
python/tvm/dlight/gpu/reduction.py | 45 +-
python/tvm/dlight/gpu/utils.py | 87 ++++
python/tvm/error.py | 5 +
python/tvm/exec/microtvm_debug_shell.py | 8 +-
python/tvm/relax/analysis/estimate_memory_usage.py | 28 +-
python/tvm/relax/expr.py | 8 +-
python/tvm/relax/op/vm/vm.py | 16 +-
.../tvm/relax/transform/legalize_ops/manipulate.py | 34 ++
.../tvm/relay/backend/contrib/ethosu/vela_api.py | 86 +++-
python/tvm/relay/expr_functor.py | 56 ++-
python/tvm/relay/frontend/keras.py | 13 +-
python/tvm/relay/qnn/op/qnn.py | 21 +-
python/tvm/rpc/server.py | 96 ++--
python/tvm/rpc/tracker.py | 8 +-
python/tvm/runtime/relax_vm.py | 1 +
python/tvm/te/schedule.py | 11 +-
python/tvm/testing/__init__.py | 2 +-
python/tvm/testing/aot.py | 10 +-
python/tvm/testing/{rpc_run.py => runner.py} | 85 +++-
python/tvm/tir/analysis/analysis.py | 8 +-
python/tvm/tir/function.py | 20 +-
python/tvm/tir/op.py | 9 +-
python/tvm/tir/schedule/schedule.py | 40 +-
python/tvm/tir/schedule/testing.py | 6 +-
python/tvm/tir/transform/transform.py | 11 +
src/arith/analyzer.cc | 51 +++
src/arith/const_int_bound.cc | 38 +-
src/arith/ir_mutator_with_analyzer.cc | 29 ++
src/arith/ir_mutator_with_analyzer.h | 14 +
src/arith/iter_affine_map.cc | 6 +
src/arith/product_normal_form.h | 18 +
src/driver/driver_api.cc | 1 +
src/meta_schedule/postproc/verify_gpu_code.cc | 1 +
src/relax/backend/vm/codegen_vm.cc | 2 +-
src/relax/backend/vm/vm_builtin_lower.cc | 14 +-
src/relax/ir/expr.cc | 21 +-
src/relax/ir/py_expr_functor.cc | 32 +-
src/relax/op/op.cc | 9 +-
src/relax/op/op_common.h | 3 +-
src/relay/backend/te_compiler_cache.cc | 4 +-
src/relay/op/tensor/transform.cc | 4 +-
src/relay/qnn/op/dequantize.cc | 28 +-
src/relay/qnn/op/quantize.cc | 5 +-
src/relay/qnn/utils.h | 3 +-
src/runtime/logging.cc | 3 +
src/runtime/relax_vm/builtin.cc | 11 +-
src/runtime/relax_vm/executable.cc | 4 +-
src/runtime/relax_vm/lm_support.cc | 48 ++
src/runtime/relax_vm/memory_manager.cc | 19 +
src/runtime/relax_vm/naive_allocator.h | 21 +
src/runtime/rpc/rpc_endpoint.cc | 8 +-
src/runtime/rpc/rpc_socket_impl.cc | 34 ++
src/script/ir_builder/ir/ir.cc | 4 +-
src/target/source/codegen_cuda.cc | 3 +-
src/te/schedule/message_passing.cc | 14 +-
src/te/schedule/schedule_lang.cc | 6 +-
src/tir/analysis/verify_well_formed.cc | 25 +-
src/tir/ir/expr.cc | 8 +-
src/tir/ir/index_map.cc | 64 +--
src/tir/ir/script/script_complete.cc | 20 +-
src/tir/ir/stmt_functor.cc | 32 +-
src/tir/schedule/analysis.h | 10 +
src/tir/schedule/analysis/analysis.cc | 29 ++
src/tir/schedule/primitive.h | 4 +-
src/tir/schedule/primitive/cache_read_write.cc | 323 ++++++++++---
src/tir/schedule/primitive/compute_at.cc | 1 +
src/tir/schedule/primitive/compute_inline.cc | 21 +-
.../schedule/primitive/layout_transformation.cc | 92 ++--
src/tir/schedule/transform.cc | 78 +++-
src/tir/transforms/default_gpu_schedule.cc | 22 +-
src/tir/transforms/flatten_buffer.cc | 14 +-
src/tir/transforms/inject_double_buffer.cc | 4 +-
src/tir/transforms/ir_utils.cc | 3 +-
src/tir/transforms/lift_thread_binding.cc | 195 ++++++++
src/tir/transforms/lower_cross_thread_reduction.cc | 158 ++++++-
src/tir/transforms/lower_custom_datatypes.cc | 2 +-
src/tir/transforms/lower_thread_allreduce.cc | 2 +-
src/tir/transforms/lower_tvm_builtin.cc | 2 +
src/tir/transforms/lower_warp_memory.cc | 2 +-
src/tir/transforms/make_unpacked_api.cc | 11 +-
src/tir/transforms/simplify.cc | 21 +-
src/tir/transforms/storage_flatten.cc | 7 +-
src/tir/transforms/transform_mma_buffer_layout.cc | 6 +-
src/tir/transforms/update_pointer_storage_scope.cc | 9 +-
tests/python/contrib/test_ethosu/test_networks.py | 4 +-
.../contrib/test_ethosu/test_replace_conv2d.py | 14 +-
tests/python/contrib/test_ethosu/test_vela_api.py | 50 +++
.../test_relax_2d_buffer_allocation.py | 91 ++++
tests/python/dlight/test_gpu_decode_gemv.py | 499 +++++++++++++++++++++
tests/python/dlight/test_gpu_fallback.py | 15 +-
tests/python/dlight/test_gpu_matmul.py | 252 +++++++++++
tests/python/dlight/test_gpu_reduction.py | 139 +++---
tests/python/frontend/keras/test_forward.py | 21 +
.../relax/test_analysis_estimate_memory_usage.py | 5 +-
tests/python/relax/test_expr_functor.py | 75 +++-
.../test_transform_legalize_ops_manipulate.py | 117 +++++
.../test_transform_static_plan_block_memory.py | 6 +-
tests/python/relax/test_tvmscript_parser.py | 2 +-
.../relax/test_vm_alloc_storage_with_scope.py | 74 +++
tests/python/relax/test_vm_codegen_only.py | 6 +-
tests/python/relax/test_vm_cuda_graph.py | 6 +-
tests/python/relax/test_vm_execbuilder.py | 8 +-
tests/python/relay/test_expr_functor.py | 2 +-
tests/python/relay/test_op_qnn_dequantize.py | 35 +-
tests/python/relay/test_op_qnn_quantize.py | 23 +
.../python/unittest/test_arith_const_int_bound.py | 17 +
.../python/unittest/test_arith_iter_affine_map.py | 4 +-
...e_postproc_rewrite_parallel_vectorize_unroll.py | 2 -
.../test_meta_schedule_relay_integration.py | 20 +-
...meta_schedule_schedule_cuda_layout_transform.py | 6 +-
.../test_meta_schedule_schedule_rule_mlt_tc.py | 19 +-
.../unittest/test_meta_schedule_trace_apply.py | 10 +-
.../python/unittest/test_meta_schedule_tune_tir.py | 5 +
tests/python/unittest/test_runtime_rpc.py | 31 ++
...sform_layout.py => test_te_transform_layout.py} | 0
.../test_tir_analysis_verify_well_formed.py | 1 +
.../{test_index_map.py => test_tir_index_map.py} | 11 +-
.../python/unittest/test_tir_lower_match_buffer.py | 11 +-
.../unittest/test_tir_schedule_cache_read_write.py | 156 ++++++-
.../unittest/test_tir_schedule_compute_at.py | 69 +++
.../unittest/test_tir_schedule_compute_inline.py | 24 +
.../unittest/test_tir_schedule_transform_layout.py | 112 ++++-
.../test_tir_transform_inject_ptx_async_copy.py | 97 ++--
.../test_tir_transform_lift_thread_binding.py | 139 ++++++
...t_tir_transform_lower_cross_thread_reduction.py | 174 +++++++
.../test_tir_transform_lower_tvm_builtin.py | 6 +-
...test_tir_transform_memhammer_lower_auto_copy.py | 8 -
.../python/unittest/test_tir_transform_simplify.py | 24 +
.../test_tir_transform_unify_thread_binding.py | 43 ++
.../test_transform_default_gpu_schedule.py | 89 ++--
tests/python/unittest/test_tvmscript_complete.py | 4 +
tests/python/unittest/test_tvmscript_roundtrip.py | 17 +
tests/scripts/release/README.md | 2 +
tests/scripts/release/gather_prs.py | 22 +-
tests/scripts/release/make_notes.py | 12 +-
.../scripts/task_config_build_minimal_cross_isa.sh | 1 +
tests/scripts/task_python_docs.sh | 4 +-
version.py | 2 +-
web/package.json | 2 +-
web/src/runtime.ts | 173 ++++---
172 files changed, 5421 insertions(+), 953 deletions(-)
create mode 100644 python/tvm/dlight/gpu/decode_gemv.py
create mode 100644 python/tvm/dlight/gpu/matmul.py
create mode 100644 python/tvm/dlight/gpu/utils.py
rename python/tvm/testing/{rpc_run.py => runner.py} (66%)
create mode 100644 src/tir/transforms/lift_thread_binding.cc
create mode 100644
tests/python/contrib/test_hexagon/test_relax_2d_buffer_allocation.py
create mode 100644 tests/python/dlight/test_gpu_decode_gemv.py
create mode 100644 tests/python/dlight/test_gpu_matmul.py
create mode 100644 tests/python/relax/test_vm_alloc_storage_with_scope.py
rename tests/python/unittest/{test_transform_layout.py =>
test_te_transform_layout.py} (100%)
rename tests/python/unittest/{test_index_map.py => test_tir_index_map.py} (97%)
create mode 100644
tests/python/unittest/test_tir_transform_lift_thread_binding.py