This is an automated email from the ASF dual-hosted git repository.
tqchen pushed a change to branch unity-staging
in repository https://gitbox.apache.org/repos/asf/tvm.git
from 9eafe7c022 [MERGE] Merge main into unity 2023-06-13
add 4f903d4f30 [MERGE] Fix after merge
add 990c768e9c [Unity][Analysis] Reshape TIR detection with iter-map-simplify (#15099)
add f6050de6f3 [MERGE] recover cutlass in unity
add c68a00a37d [Unity][FuseTIR] Flatten and add tuple fields to parameters / arguments only when they are used (#15113)
add 16158e7cc3 [Unity][Relax][UX] Specify function purity in the @R.function decorator (#15109)
add 121bca6027 [Unity][Relax] Make RewriteDataflowReshape only rewrite volume-preserving ops (#15112)
add 2260bba731 [Unity] Allocate workspace for all functions (#15118)
add 84ce484650 [Unity][BYOC] Integrate fp16 A - int4 B GEMM kernel from FasterTransformer into CUTLASS BYOC (#15111)
add 0a672548e8 [Unity] Minor fix to `RewriteDataflowReshape` condition (#15125)
add cbdee2ff34 [Unity] Hotfix webgpu runtime (#15135)
add 6587faaec0 [Unity][Relax] Generalize CSE to work outside DataflowBlocks (#15047)
add cb451e27d0 [Unity] Fix handling of vm builtins in cuda graph (#15145)
add 6e2889f3c5 [Unity][Bugfix] Fix purity annotation in CSE test (#15143)
add 7d47f7b92d [Unity] Scaffolding DLight (#15141)
add 7c3369c72f [Unity][Pass] FuseOps with partially accessed Tuple param (#15152)
add bd24133fce [TIR][Compute-at] Enable complex floordiv/floormod expressions in compute_at (#14854)
add 081cc2ef64 [Bugfix][Relay][Keras] Fix UpSampling2D about the wrong assertion about size (#15082)
add 7842de3aab [Docker] Migrate arm docker image to use llvm packages (#15067)
add 1ca444ebb9 [Docker] Add build.sh environment variables (#15095)
add 62a5e7acf5 [Relay] Improve the "clip" op optimization in simplify expr pass (#15068)
add 317ec5209c [Docker] Update ci-cortexm docker image to contain CMSIS-NN release v… (#15092)
add 90b5acca59 [Bugfix][Relay][Keras] Fix the wrong implementation logic about cropping2D (#15053)
add 02136b393d [Target][rocm] Replace rocm arch parsing from int to string (#15088)
add e6c9d69aaf [QNN] Implement quantized avg_pool2d (#15057)
add dd6fcccc45 [TIR] Update primfunc host attachment to include host (#15102)
add 6ef22f5631 [AOT] Avoid Var-to-Var Let binding in AOTExecutorCodegen (#15033)
add 0c09547c76 [TIR][CodeGen] Define PackedFunc error code in MakePackedAPI (#15076)
add d05715daed [TIR][TVMScript] Convert tir.op operands to PrimExpr (#15091)
add 7767de9c39 [Docker] Update docker images for llvm-16 (#15105)
add 2e441ca7ec [Bugfix][Relay][Keras] Fix a wrong variable name in keras frontend (#15107)
add 0ed6fd6e8c [TIR] Handle DeclBuffer in RemoveNoOp (#15096)
add e0dbc8773a [TIR] Handle DeclBuffer in InjectDoubleBuffer (#15045)
add cd6d7e8c07 [TIR] Handle DeclBuffer in StorageAccessInfoLower (#15093)
add 64f9b12773 [TIR] Handle DeclBuffer in MergeDynamicSharedMemoryAllocations (#15094)
add 3f2aa6817d [TIR] Handle DeclBuffer in LowerThreadAllreduce (#15078)
add fa8a9f7aaa [TIR][USMP] Preserve DeclBuffer in PoolAllocationToOffsetConverter (#15044)
add 7c4c913599 [Ethos-U][TIR] Handle DeclBuffer in Ethos-U inputs (#15098)
add dcf1b1a90e [TIR] Handle DeclBuffer in CacheReadWrite schedule primitive (#15037)
add 77c75d1ef3 [TVMScript] Avoid visiting repetition tensor in SetCommonPrefix Visitor (#15083)
add 95c171458e [RPC] Add Missing Option "port_end" to RPC Proxy (#15116)
add 0c37f5890e [#15043][Docs] Updated the copyright year from 2020 to 2023 (#15071)
add fa223b077f [Bugfix][TVMC] Fix tvmc option for printing which operators are offloaded to the Ethos-U (#14994)
add e280e01fc1 [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments (#15063)
add 31be72635e [microNPU][ETHOSU] Fix minimum buffer size (#15104)
add 54b9741934 [Bugfix][Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute (#15122)
add b37ad17ce6 [Arith][TIR] Recognize empty extents (#15129)
add c1c6d93b09 [ARITH] NormalizeToIterSum (#15120)
add 7247ddcf37 [ARITH] Hotfix flaky test in padded matmul (#15131)
add 3b26ce21c0 [TIR] Avoid duplicate GlobalVar names in SplitHostDevice (#15119)
add 6b20caee2d [Bugfix] [Relay] Insertion of "device_copy" CallNode to Resolve Device Conflict on Unconstrained Nodes (#15090)
add 45c654c573 [NDArray] Allow creating a view from a strided array (#15132)
add 2c48d7bf21 [Test] Improve check for TVMError exception in test_cast (#15138)
add bee073b0c8 [LLVM] Minor refactor to LLVMModuleNode::SaveToFile (#15139)
add 7392432c08 [LLVM] Remove the "ret_void" argument of AddFunction (#15127)
add 478b26c246 [microNPU] Upgrade Vela to v3.8.0 (#15114)
add 1257f4300d [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers (#15115)
add d26dc445bb [TIR] Block dependence analysis without schedules (#15146)
add 0a5f5f030f [TensorIR][Schedule] New schedule primitive `unsafe_hide_buffer_access` (#15144)
add 8c297a63c5 [Docker] Fix build.sh environment variables (#15149)
add 512d35ab11 [TIR] Fix typo in code example (#15150)
add 5a3523d383 [TIR][Schedule] Enhance `compute-inline` for fusion (#15142)
new 01ea438e6e [MERGE] Merge main to unity 2023-06-24
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
3rdparty/cutlass_fpA_intB_gemm | 2 +-
NOTICE | 2 +-
apps/microtvm/cmsisnn/requirements.txt | 4 +-
apps/microtvm/ethosu/requirements.txt | 4 +-
ci/jenkins/docker-images.ini | 6 +-
ci/jenkins/unity_jenkinsfile.groovy | 2 +-
cmake/modules/contrib/CUTLASS.cmake | 1 +
docker/Dockerfile.ci_arm | 4 +-
docker/build.sh | 12 +
.../install/ubuntu_install_ethosu_driver_stack.sh | 2 +-
docker/install/ubuntu_install_vela.sh | 2 +-
docs/conf.py | 6 +-
gallery/how_to/work_with_microtvm/micro_ethosu.py | 2 +-
include/tvm/arith/iter_affine_map.h | 19 +
include/tvm/relax/transform.h | 5 +-
include/tvm/script/ir_builder/relax/ir.h | 9 +-
include/tvm/tir/block_dependence_info.h | 102 ++++
include/tvm/tir/block_scope.h | 50 ++
include/tvm/tir/schedule/schedule.h | 9 +
include/tvm/tir/utils.h | 45 ++
python/gen_requirements.py | 2 +-
python/tvm/arith/__init__.py | 1 +
python/tvm/arith/iter_affine_map.py | 31 ++
python/tvm/contrib/cutlass/build.py | 53 +-
python/tvm/contrib/cutlass/gemm_operation.py | 59 +++
python/tvm/contrib/cutlass/gen_tensor_op.py | 29 +-
.../_template => python/tvm/dlight}/__init__.py | 9 +-
.../tvm/{_ffi/_ctypes => dlight/base}/__init__.py | 4 +-
python/tvm/dlight/base/schedule_rule.py | 105 ++++
python/tvm/dlight/base/transform.py | 78 +++
.../tvm/dlight/gpu}/__init__.py | 7 +-
python/tvm/dlight/gpu/fallback.py | 91 ++++
python/tvm/driver/tvmc/compiler.py | 44 +-
python/tvm/exec/rpc_proxy.py | 10 +-
.../tvm/meta_schedule/testing/space_generation.py | 2 +-
python/tvm/relax/backend/contrib/cutlass.py | 118 ++++-
python/tvm/relax/transform/transform.py | 7 +-
.../tvm/relay/analysis/operations_distribution.py | 38 +-
.../relay/backend/contrib/ethosu/tir/compiler.py | 7 +-
.../tvm/relay/backend/contrib/ethosu/tir/passes.py | 12 +-
.../backend/contrib/ethosu/tir_to_cs_translator.py | 8 +-
.../tvm/relay/backend/contrib/ethosu/vela_api.py | 15 +-
python/tvm/relay/frontend/keras.py | 22 +-
python/tvm/relay/qnn/op/_qnn.py | 3 +
python/tvm/relay/qnn/op/layout_conversions.py | 35 ++
python/tvm/relay/qnn/op/qnn.py | 66 +++
python/tvm/relay/qnn/strategy/generic.py | 39 ++
python/tvm/relay/qnn/strategy/hexagon.py | 24 +
.../transform/fake_quantization_to_integer.py | 57 +-
python/tvm/relay/transform/suffixes.py | 4 +-
python/tvm/script/ir_builder/relax/ir.py | 28 +-
python/tvm/script/parser/relax/entry.py | 33 +-
python/tvm/script/parser/relax/parser.py | 24 +-
python/tvm/te/operation.py | 2 +-
python/tvm/tir/__init__.py | 1 +
python/tvm/tir/block_dependence_info.py | 88 ++++
python/tvm/tir/op.py | 38 ++
python/tvm/tir/schedule/schedule.py | 31 ++
python/tvm/topi/hexagon/compute_poolarea.py | 143 +++++
python/tvm/topi/hexagon/qnn/__init__.py | 2 +-
python/tvm/topi/hexagon/qnn/avg_pool2d.py | 408 +++++++++++----
python/tvm/topi/hexagon/qnn/nn.py | 26 +-
python/tvm/topi/hexagon/slice_ops/__init__.py | 2 +-
python/tvm/topi/hexagon/slice_ops/avg_pool2d.py | 224 +++++---
python/tvm/topi/hexagon/utils.py | 65 ++-
src/arith/int_set.cc | 6 +
src/arith/iter_affine_map.cc | 125 ++++-
src/arith/product_normal_form.h | 9 +-
src/relax/analysis/tir_op_pattern_kind.cc | 69 ++-
src/relax/transform/allocate_workspace.cc | 16 +-
src/relax/transform/eliminate_common_subexpr.cc | 118 +++--
src/relax/transform/fuse_ops.cc | 74 ++-
src/relax/transform/fuse_tir.cc | 86 ++-
src/relax/transform/rewrite_cuda_graph.cc | 56 +-
src/relax/transform/rewrite_dataflow_reshape.cc | 56 +-
src/relay/backend/aot_executor_codegen.cc | 29 +-
src/relay/op/nn/pooling.cc | 38 +-
src/relay/op/nn/pooling.h | 4 +-
src/relay/op/nn/pooling_common.h | 78 +++
src/relay/qnn/op/avg_pool2d.cc | 223 ++++++++
src/relay/transforms/device_planner.cc | 227 +++++++-
src/relay/transforms/simplify_expr.cc | 54 +-
src/runtime/contrib/cutlass/weight_preprocess.cc | 55 ++
src/runtime/ndarray.cc | 25 +-
src/script/ir_builder/relax/ir.cc | 13 +-
src/script/printer/ir_docsifier.cc | 8 +
src/script/printer/relax/function.cc | 10 +-
src/script/printer/utils.h | 2 +-
src/target/llvm/codegen_amdgpu.cc | 2 +-
src/target/llvm/codegen_llvm.cc | 22 +-
src/target/llvm/codegen_llvm.h | 4 +-
src/target/llvm/codegen_nvptx.cc | 4 +-
src/target/llvm/llvm_module.cc | 78 +--
src/target/source/codegen_c.cc | 6 +-
src/target/source/codegen_c.h | 4 -
src/target/source/codegen_c_host.cc | 5 -
src/target/source/codegen_c_host.h | 1 -
src/target/spirv/codegen_spirv.cc | 2 -
src/target/spirv/codegen_spirv.h | 1 -
src/target/stackvm/codegen_stackvm.cc | 6 +
src/target/target_kind.cc | 30 +-
src/tir/analysis/var_use_def_analysis.cc | 63 ++-
src/tir/analysis/var_use_def_analysis.h | 13 +-
src/tir/contrib/ethosu/passes.cc | 67 ++-
src/tir/ir/block_dependence_info.cc | 99 ++++
src/tir/ir/block_scope.cc | 48 ++
src/tir/schedule/analysis/analysis.cc | 12 +
src/tir/schedule/concrete_schedule.cc | 8 +
src/tir/schedule/concrete_schedule.h | 2 +
src/tir/schedule/primitive.h | 10 +
src/tir/schedule/primitive/cache_read_write.cc | 59 ++-
src/tir/schedule/primitive/compute_at.cc | 46 +-
src/tir/schedule/primitive/compute_inline.cc | 262 ++++++----
src/tir/schedule/primitive/hide_buffer_access.cc | 171 ++++++
src/tir/schedule/schedule.cc | 2 +
src/tir/schedule/state.cc | 158 +-----
src/tir/schedule/traced_schedule.cc | 11 +
src/tir/schedule/traced_schedule.h | 2 +
src/tir/transforms/flatten_buffer.cc | 119 +++--
src/tir/transforms/inject_double_buffer.cc | 24 +-
.../transforms/lower_device_storage_access_info.cc | 10 +
src/tir/transforms/lower_thread_allreduce.cc | 108 ++--
src/tir/transforms/make_packed_api.cc | 9 +-
src/tir/transforms/make_unpacked_api.cc | 4 +-
.../merge_dynamic_shared_memory_allocations.cc | 9 +
src/tir/transforms/primfunc_utils.cc | 3 +-
src/tir/transforms/remove_no_op.cc | 14 +
src/tir/transforms/split_host_device.cc | 36 +-
src/tir/transforms/update_pointer_storage_scope.cc | 5 +
src/tir/transforms/update_pointer_storage_scope.h | 1 +
.../convert_pool_allocations_to_offsets.cc | 11 +
.../contrib/test_ethosu/test_encode_constants.py | 102 ++--
tests/python/contrib/test_ethosu/test_networks.py | 6 +-
.../test_pass_operations_distribution.py | 48 +-
.../test_ethosu/test_tir_to_cs_translator.py | 35 ++
.../test_hexagon/test_qnn_op_integration.py | 576 +++++++++++++++++++++
.../test_hexagon/test_wo_qnn_canonicalization.py | 545 -------------------
.../topi/slice_op/test_avg_pool2d_slice.py | 412 +++++++--------
tests/python/dlight/test_schedule_rule.py | 71 +++
tests/python/driver/tvmc/test_compiler.py | 123 +++--
tests/python/frontend/keras/test_forward.py | 17 +-
tests/python/relax/test_analysis.py | 122 +++++
.../relax/test_analysis_contains_impure_call.py | 9 +-
tests/python/relax/test_ast_printer.py | 3 +-
tests/python/relax/test_codegen_cutlass.py | 238 +++++++++
tests/python/relax/test_pipeline.py | 3 +-
tests/python/relax/test_relax_operators.py | 21 +-
tests/python/relax/test_transform.py | 20 +-
.../relax/test_transform_allocate_workspace.py | 36 +-
tests/python/relax/test_transform_cse.py | 61 ++-
tests/python/relax/test_transform_fuse_ops.py | 62 ++-
tests/python/relax/test_transform_fuse_tir.py | 119 +++++
tests/python/relax/test_transform_lambda_lift.py | 13 +-
.../relax/test_transform_rewrite_cuda_graph.py | 98 ++++
.../test_transform_rewrite_dataflow_reshape.py | 81 ++-
.../test_transform_static_plan_block_memory.py | 6 +-
tests/python/relax/test_tvmscript_parser.py | 80 ++-
tests/python/relax/test_tvmscript_printer_relax.py | 12 +-
tests/python/relay/test_pass_plan_devices.py | 47 ++
tests/python/relay/test_pass_simplify_expr.py | 82 ++-
.../python/unittest/test_arith_iter_affine_map.py | 68 +++
.../test_meta_schedule_schedule_rule_mlt_intrin.py | 3 +-
.../test_meta_schedule_schedule_rule_mlt_tc.py | 25 +-
.../test_meta_schedule_space_cuda_winograd.py | 57 +-
tests/python/unittest/test_runtime_dlpack.py | 13 +
..._scope.py => test_tir_block_dependence_info.py} | 173 +++----
tests/python/unittest/test_tir_host_func.py | 3 +-
tests/python/unittest/test_tir_nodes.py | 9 +-
.../python/unittest/test_tir_schedule_analysis.py | 28 +
.../unittest/test_tir_schedule_cache_read_write.py | 134 ++---
.../unittest/test_tir_schedule_compute_at.py | 52 ++
.../unittest/test_tir_schedule_compute_inline.py | 320 +++++++++++-
.../test_tir_schedule_tensorize_ldmatrix_mma.py | 24 +-
.../test_tir_transform_inject_double_buffer.py | 88 +++-
...r_transform_lower_device_storage_access_info.py | 118 +++++
.../test_tir_transform_lower_thread_all_reduce.py | 239 +++++++++
.../test_tir_transform_lower_tvm_builtin.py | 9 +-
.../unittest/test_tir_transform_make_packed_api.py | 15 +-
.../test_tir_transform_make_unpacked_api.py | 6 +
...form_merge_dynamic_shared_memory_allocations.py | 133 ++++-
.../unittest/test_tir_transform_remove_no_op.py | 16 +
.../test_tir_transform_split_host_device.py | 51 ++
.../unittest/test_tir_unsafe_hide_buffer_access.py | 102 ++++
...ransform_convert_pool_allocations_to_offsets.py | 255 +++++----
tests/python/unittest/test_tvmscript_roundtrip.py | 58 +++
tests/scripts/task_config_build_arm.sh | 2 +-
tests/scripts/unity/task_python_relax.sh | 1 +
web/emcc/webgpu_runtime.cc | 4 -
188 files changed, 7981 insertions(+), 2388 deletions(-)
create mode 100644 include/tvm/tir/block_dependence_info.h
copy {apps/uma/_template => python/tvm/dlight}/__init__.py (82%)
copy python/tvm/{_ffi/_ctypes => dlight/base}/__init__.py (88%)
create mode 100644 python/tvm/dlight/base/schedule_rule.py
create mode 100644 python/tvm/dlight/base/transform.py
copy {apps/uma/_template => python/tvm/dlight/gpu}/__init__.py (84%)
create mode 100644 python/tvm/dlight/gpu/fallback.py
create mode 100644 python/tvm/tir/block_dependence_info.py
create mode 100644 python/tvm/topi/hexagon/compute_poolarea.py
create mode 100644 src/relay/op/nn/pooling_common.h
create mode 100644 src/relay/qnn/op/avg_pool2d.cc
create mode 100644 src/runtime/contrib/cutlass/weight_preprocess.cc
create mode 100644 src/tir/ir/block_dependence_info.cc
create mode 100644 src/tir/schedule/primitive/hide_buffer_access.cc
create mode 100644 tests/python/contrib/test_hexagon/test_qnn_op_integration.py
delete mode 100644 tests/python/contrib/test_hexagon/test_wo_qnn_canonicalization.py
create mode 100644 tests/python/dlight/test_schedule_rule.py
copy tests/python/unittest/{test_tir_schedule_block_scope.py => test_tir_block_dependence_info.py} (51%)
create mode 100644 tests/python/unittest/test_tir_transform_lower_device_storage_access_info.py
create mode 100644 tests/python/unittest/test_tir_transform_lower_thread_all_reduce.py
create mode 100644 tests/python/unittest/test_tir_unsafe_hide_buffer_access.py