This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch unity-staging
in repository https://gitbox.apache.org/repos/asf/tvm.git


    from 9eafe7c022 [MERGE] Merge main into unity 2023-06-13
     add 4f903d4f30 [MERGE] Fix after merge
     add 990c768e9c [Unity][Analysis] Reshape TIR detection with iter-map-simplify (#15099)
     add f6050de6f3 [MERGE] recover cutlass in unity
     add c68a00a37d [Unity][FuseTIR] Flatten and add tuple fields to parameters / arguments only when they are used (#15113)
     add 16158e7cc3 [Unity][Relax][UX] Specify function purity in the @R.function decorator (#15109)
     add 121bca6027 [Unity][Relax] Make RewriteDataflowReshape only rewrite volume-preserving ops (#15112)
     add 2260bba731 [Unity] Allocate workspace for all functions (#15118)
     add 84ce484650 [Unity][BYOC] Integrate fp16 A - int4 B GEMM kernel from FasterTransformer into CUTLASS BYOC  (#15111)
     add 0a672548e8 [Unity] Minor fix to `RewriteDataflowReshape` condition  (#15125)
     add cbdee2ff34 [Unity] Hotfix webgpu runtime (#15135)
     add 6587faaec0 [Unity][Relax] Generalize CSE to work outside DataflowBlocks (#15047)
     add cb451e27d0 [Unity] Fix handling of vm builtins in cuda graph (#15145)
     add 6e2889f3c5 [Unity][Bugfix] Fix purity annotation in CSE test (#15143)
     add 7d47f7b92d [Unity] Scaffolding DLight (#15141)
     add 7c3369c72f [Unity][Pass] FuseOps with partially accessed Tuple param (#15152)
     add bd24133fce [TIR][Compute-at] Enable complex floordiv/floormod expressions in compute_at (#14854)
     add 081cc2ef64 [Bugfix][Relay][Keras] Fix UpSampling2D about the wrong assertion about size (#15082)
     add 7842de3aab [Docker] Migrate arm docker image to use llvm packages (#15067)
     add 1ca444ebb9 [Docker] Add build.sh environment variables (#15095)
     add 62a5e7acf5 [Relay] Improve the "clip" op optimization in simplify expr pass (#15068)
     add 317ec5209c [Docker] Update ci-cortexm docker image to contain CMSIS-NN release v… (#15092)
     add 90b5acca59 [Bugfix][Relay][Keras] Fix the wrong implementation logic about cropping2D (#15053)
     add 02136b393d [Target][rocm] Replace rocm arch parsing from int to string (#15088)
     add e6c9d69aaf [QNN] Implement quantized avg_pool2d (#15057)
     add dd6fcccc45 [TIR] Update primfunc host attachment to include host (#15102)
     add 6ef22f5631 [AOT] Avoid Var-to-Var Let binding in AOTExecutorCodegen (#15033)
     add 0c09547c76 [TIR][CodeGen] Define PackedFunc error code in MakePackedAPI (#15076)
     add d05715daed [TIR][TVMScript] Convert tir.op operands to PrimExpr (#15091)
     add 7767de9c39 [Docker] Update docker images for llvm-16 (#15105)
     add 2e441ca7ec [Bugfix][Relay][Keras] Fix a wrong variable name in keras frontend (#15107)
     add 0ed6fd6e8c [TIR] Handle DeclBuffer in RemoveNoOp (#15096)
     add e0dbc8773a [TIR] Handle DeclBuffer in InjectDoubleBuffer (#15045)
     add cd6d7e8c07 [TIR] Handle DeclBuffer in StorageAccessInfoLower (#15093)
     add 64f9b12773 [TIR] Handle DeclBuffer in MergeDynamicSharedMemoryAllocations (#15094)
     add 3f2aa6817d [TIR] Handle DeclBuffer in LowerThreadAllreduce (#15078)
     add fa8a9f7aaa [TIR][USMP] Preserve DeclBuffer in PoolAllocationToOffsetConverter (#15044)
     add 7c4c913599 [Ethos-U][TIR] Handle DeclBuffer in Ethos-U inputs (#15098)
     add dcf1b1a90e [TIR] Handle DeclBuffer in CacheReadWrite schedule primitive (#15037)
     add 77c75d1ef3 [TVMScript] Avoid visiting repetition tensor in SetCommonPrefix Visitor (#15083)
     add 95c171458e [RPC] Add Missing Option "port_end" to RPC Proxy (#15116)
     add 0c37f5890e [#15043][Docs] Updated the copyright year from 2020 to 2023 (#15071)
     add fa223b077f [Bugfix][TVMC] Fix tvmc option for printing which operators are offloaded to the Ethos-U (#14994)
     add e280e01fc1 [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments (#15063)
     add 31be72635e [microNPU][ETHOSU] Fix minimum buffer size (#15104)
     add 54b9741934 [Bugfix][Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute (#15122)
     add b37ad17ce6 [Arith][TIR] Recognize empty extents (#15129)
     add c1c6d93b09 [ARITH] NormalizeToIterSum (#15120)
     add 7247ddcf37 [ARITH] Hotfix flaky test in padded matmul (#15131)
     add 3b26ce21c0 [TIR] Avoid duplicate GlobalVar names in SplitHostDevice (#15119)
     add 6b20caee2d [Bugfix] [Relay] Insertion of "device_copy" CallNode to Resolve Device Conflict on Unconstrained Nodes (#15090)
     add 45c654c573 [NDArray] Allow creating a view from a strided array (#15132)
     add 2c48d7bf21 [Test] Improve check for TVMError exception in test_cast (#15138)
     add bee073b0c8 [LLVM] Minor refactor to LLVMModuleNode::SaveToFile (#15139)
     add 7392432c08 [LLVM] Remove the "ret_void" argument of AddFunction (#15127)
     add 478b26c246 [microNPU] Upgrade Vela to v3.8.0 (#15114)
     add 1257f4300d [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers (#15115)
     add d26dc445bb [TIR] Block dependence analysis without schedules (#15146)
     add 0a5f5f030f [TensorIR][Schedule] New schedule primitive `unsafe_hide_buffer_access` (#15144)
     add 8c297a63c5 [Docker] Fix build.sh environment variables (#15149)
     add 512d35ab11 [TIR] Fix typo in code example (#15150)
     add 5a3523d383 [TIR][Schedule] Enhance `compute-inline` for fusion (#15142)
     new 01ea438e6e [MERGE] Merge main to unity 2023-06-24

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 3rdparty/cutlass_fpA_intB_gemm                     |   2 +-
 NOTICE                                             |   2 +-
 apps/microtvm/cmsisnn/requirements.txt             |   4 +-
 apps/microtvm/ethosu/requirements.txt              |   4 +-
 ci/jenkins/docker-images.ini                       |   6 +-
 ci/jenkins/unity_jenkinsfile.groovy                |   2 +-
 cmake/modules/contrib/CUTLASS.cmake                |   1 +
 docker/Dockerfile.ci_arm                           |   4 +-
 docker/build.sh                                    |  12 +
 .../install/ubuntu_install_ethosu_driver_stack.sh  |   2 +-
 docker/install/ubuntu_install_vela.sh              |   2 +-
 docs/conf.py                                       |   6 +-
 gallery/how_to/work_with_microtvm/micro_ethosu.py  |   2 +-
 include/tvm/arith/iter_affine_map.h                |  19 +
 include/tvm/relax/transform.h                      |   5 +-
 include/tvm/script/ir_builder/relax/ir.h           |   9 +-
 include/tvm/tir/block_dependence_info.h            | 102 ++++
 include/tvm/tir/block_scope.h                      |  50 ++
 include/tvm/tir/schedule/schedule.h                |   9 +
 include/tvm/tir/utils.h                            |  45 ++
 python/gen_requirements.py                         |   2 +-
 python/tvm/arith/__init__.py                       |   1 +
 python/tvm/arith/iter_affine_map.py                |  31 ++
 python/tvm/contrib/cutlass/build.py                |  53 +-
 python/tvm/contrib/cutlass/gemm_operation.py       |  59 +++
 python/tvm/contrib/cutlass/gen_tensor_op.py        |  29 +-
 .../_template => python/tvm/dlight}/__init__.py    |   9 +-
 .../tvm/{_ffi/_ctypes => dlight/base}/__init__.py  |   4 +-
 python/tvm/dlight/base/schedule_rule.py            | 105 ++++
 python/tvm/dlight/base/transform.py                |  78 +++
 .../tvm/dlight/gpu}/__init__.py                    |   7 +-
 python/tvm/dlight/gpu/fallback.py                  |  91 ++++
 python/tvm/driver/tvmc/compiler.py                 |  44 +-
 python/tvm/exec/rpc_proxy.py                       |  10 +-
 .../tvm/meta_schedule/testing/space_generation.py  |   2 +-
 python/tvm/relax/backend/contrib/cutlass.py        | 118 ++++-
 python/tvm/relax/transform/transform.py            |   7 +-
 .../tvm/relay/analysis/operations_distribution.py  |  38 +-
 .../relay/backend/contrib/ethosu/tir/compiler.py   |   7 +-
 .../tvm/relay/backend/contrib/ethosu/tir/passes.py |  12 +-
 .../backend/contrib/ethosu/tir_to_cs_translator.py |   8 +-
 .../tvm/relay/backend/contrib/ethosu/vela_api.py   |  15 +-
 python/tvm/relay/frontend/keras.py                 |  22 +-
 python/tvm/relay/qnn/op/_qnn.py                    |   3 +
 python/tvm/relay/qnn/op/layout_conversions.py      |  35 ++
 python/tvm/relay/qnn/op/qnn.py                     |  66 +++
 python/tvm/relay/qnn/strategy/generic.py           |  39 ++
 python/tvm/relay/qnn/strategy/hexagon.py           |  24 +
 .../transform/fake_quantization_to_integer.py      |  57 +-
 python/tvm/relay/transform/suffixes.py             |   4 +-
 python/tvm/script/ir_builder/relax/ir.py           |  28 +-
 python/tvm/script/parser/relax/entry.py            |  33 +-
 python/tvm/script/parser/relax/parser.py           |  24 +-
 python/tvm/te/operation.py                         |   2 +-
 python/tvm/tir/__init__.py                         |   1 +
 python/tvm/tir/block_dependence_info.py            |  88 ++++
 python/tvm/tir/op.py                               |  38 ++
 python/tvm/tir/schedule/schedule.py                |  31 ++
 python/tvm/topi/hexagon/compute_poolarea.py        | 143 +++++
 python/tvm/topi/hexagon/qnn/__init__.py            |   2 +-
 python/tvm/topi/hexagon/qnn/avg_pool2d.py          | 408 +++++++++++----
 python/tvm/topi/hexagon/qnn/nn.py                  |  26 +-
 python/tvm/topi/hexagon/slice_ops/__init__.py      |   2 +-
 python/tvm/topi/hexagon/slice_ops/avg_pool2d.py    | 224 +++++---
 python/tvm/topi/hexagon/utils.py                   |  65 ++-
 src/arith/int_set.cc                               |   6 +
 src/arith/iter_affine_map.cc                       | 125 ++++-
 src/arith/product_normal_form.h                    |   9 +-
 src/relax/analysis/tir_op_pattern_kind.cc          |  69 ++-
 src/relax/transform/allocate_workspace.cc          |  16 +-
 src/relax/transform/eliminate_common_subexpr.cc    | 118 +++--
 src/relax/transform/fuse_ops.cc                    |  74 ++-
 src/relax/transform/fuse_tir.cc                    |  86 ++-
 src/relax/transform/rewrite_cuda_graph.cc          |  56 +-
 src/relax/transform/rewrite_dataflow_reshape.cc    |  56 +-
 src/relay/backend/aot_executor_codegen.cc          |  29 +-
 src/relay/op/nn/pooling.cc                         |  38 +-
 src/relay/op/nn/pooling.h                          |   4 +-
 src/relay/op/nn/pooling_common.h                   |  78 +++
 src/relay/qnn/op/avg_pool2d.cc                     | 223 ++++++++
 src/relay/transforms/device_planner.cc             | 227 +++++++-
 src/relay/transforms/simplify_expr.cc              |  54 +-
 src/runtime/contrib/cutlass/weight_preprocess.cc   |  55 ++
 src/runtime/ndarray.cc                             |  25 +-
 src/script/ir_builder/relax/ir.cc                  |  13 +-
 src/script/printer/ir_docsifier.cc                 |   8 +
 src/script/printer/relax/function.cc               |  10 +-
 src/script/printer/utils.h                         |   2 +-
 src/target/llvm/codegen_amdgpu.cc                  |   2 +-
 src/target/llvm/codegen_llvm.cc                    |  22 +-
 src/target/llvm/codegen_llvm.h                     |   4 +-
 src/target/llvm/codegen_nvptx.cc                   |   4 +-
 src/target/llvm/llvm_module.cc                     |  78 +--
 src/target/source/codegen_c.cc                     |   6 +-
 src/target/source/codegen_c.h                      |   4 -
 src/target/source/codegen_c_host.cc                |   5 -
 src/target/source/codegen_c_host.h                 |   1 -
 src/target/spirv/codegen_spirv.cc                  |   2 -
 src/target/spirv/codegen_spirv.h                   |   1 -
 src/target/stackvm/codegen_stackvm.cc              |   6 +
 src/target/target_kind.cc                          |  30 +-
 src/tir/analysis/var_use_def_analysis.cc           |  63 ++-
 src/tir/analysis/var_use_def_analysis.h            |  13 +-
 src/tir/contrib/ethosu/passes.cc                   |  67 ++-
 src/tir/ir/block_dependence_info.cc                |  99 ++++
 src/tir/ir/block_scope.cc                          |  48 ++
 src/tir/schedule/analysis/analysis.cc              |  12 +
 src/tir/schedule/concrete_schedule.cc              |   8 +
 src/tir/schedule/concrete_schedule.h               |   2 +
 src/tir/schedule/primitive.h                       |  10 +
 src/tir/schedule/primitive/cache_read_write.cc     |  59 ++-
 src/tir/schedule/primitive/compute_at.cc           |  46 +-
 src/tir/schedule/primitive/compute_inline.cc       | 262 ++++++----
 src/tir/schedule/primitive/hide_buffer_access.cc   | 171 ++++++
 src/tir/schedule/schedule.cc                       |   2 +
 src/tir/schedule/state.cc                          | 158 +-----
 src/tir/schedule/traced_schedule.cc                |  11 +
 src/tir/schedule/traced_schedule.h                 |   2 +
 src/tir/transforms/flatten_buffer.cc               | 119 +++--
 src/tir/transforms/inject_double_buffer.cc         |  24 +-
 .../transforms/lower_device_storage_access_info.cc |  10 +
 src/tir/transforms/lower_thread_allreduce.cc       | 108 ++--
 src/tir/transforms/make_packed_api.cc              |   9 +-
 src/tir/transforms/make_unpacked_api.cc            |   4 +-
 .../merge_dynamic_shared_memory_allocations.cc     |   9 +
 src/tir/transforms/primfunc_utils.cc               |   3 +-
 src/tir/transforms/remove_no_op.cc                 |  14 +
 src/tir/transforms/split_host_device.cc            |  36 +-
 src/tir/transforms/update_pointer_storage_scope.cc |   5 +
 src/tir/transforms/update_pointer_storage_scope.h  |   1 +
 .../convert_pool_allocations_to_offsets.cc         |  11 +
 .../contrib/test_ethosu/test_encode_constants.py   | 102 ++--
 tests/python/contrib/test_ethosu/test_networks.py  |   6 +-
 .../test_pass_operations_distribution.py           |  48 +-
 .../test_ethosu/test_tir_to_cs_translator.py       |  35 ++
 .../test_hexagon/test_qnn_op_integration.py        | 576 +++++++++++++++++++++
 .../test_hexagon/test_wo_qnn_canonicalization.py   | 545 -------------------
 .../topi/slice_op/test_avg_pool2d_slice.py         | 412 +++++++--------
 tests/python/dlight/test_schedule_rule.py          |  71 +++
 tests/python/driver/tvmc/test_compiler.py          | 123 +++--
 tests/python/frontend/keras/test_forward.py        |  17 +-
 tests/python/relax/test_analysis.py                | 122 +++++
 .../relax/test_analysis_contains_impure_call.py    |   9 +-
 tests/python/relax/test_ast_printer.py             |   3 +-
 tests/python/relax/test_codegen_cutlass.py         | 238 +++++++++
 tests/python/relax/test_pipeline.py                |   3 +-
 tests/python/relax/test_relax_operators.py         |  21 +-
 tests/python/relax/test_transform.py               |  20 +-
 .../relax/test_transform_allocate_workspace.py     |  36 +-
 tests/python/relax/test_transform_cse.py           |  61 ++-
 tests/python/relax/test_transform_fuse_ops.py      |  62 ++-
 tests/python/relax/test_transform_fuse_tir.py      | 119 +++++
 tests/python/relax/test_transform_lambda_lift.py   |  13 +-
 .../relax/test_transform_rewrite_cuda_graph.py     |  98 ++++
 .../test_transform_rewrite_dataflow_reshape.py     |  81 ++-
 .../test_transform_static_plan_block_memory.py     |   6 +-
 tests/python/relax/test_tvmscript_parser.py        |  80 ++-
 tests/python/relax/test_tvmscript_printer_relax.py |  12 +-
 tests/python/relay/test_pass_plan_devices.py       |  47 ++
 tests/python/relay/test_pass_simplify_expr.py      |  82 ++-
 .../python/unittest/test_arith_iter_affine_map.py  |  68 +++
 .../test_meta_schedule_schedule_rule_mlt_intrin.py |   3 +-
 .../test_meta_schedule_schedule_rule_mlt_tc.py     |  25 +-
 .../test_meta_schedule_space_cuda_winograd.py      |  57 +-
 tests/python/unittest/test_runtime_dlpack.py       |  13 +
 ..._scope.py => test_tir_block_dependence_info.py} | 173 +++----
 tests/python/unittest/test_tir_host_func.py        |   3 +-
 tests/python/unittest/test_tir_nodes.py            |   9 +-
 .../python/unittest/test_tir_schedule_analysis.py  |  28 +
 .../unittest/test_tir_schedule_cache_read_write.py | 134 ++---
 .../unittest/test_tir_schedule_compute_at.py       |  52 ++
 .../unittest/test_tir_schedule_compute_inline.py   | 320 +++++++++++-
 .../test_tir_schedule_tensorize_ldmatrix_mma.py    |  24 +-
 .../test_tir_transform_inject_double_buffer.py     |  88 +++-
 ...r_transform_lower_device_storage_access_info.py | 118 +++++
 .../test_tir_transform_lower_thread_all_reduce.py  | 239 +++++++++
 .../test_tir_transform_lower_tvm_builtin.py        |   9 +-
 .../unittest/test_tir_transform_make_packed_api.py |  15 +-
 .../test_tir_transform_make_unpacked_api.py        |   6 +
 ...form_merge_dynamic_shared_memory_allocations.py | 133 ++++-
 .../unittest/test_tir_transform_remove_no_op.py    |  16 +
 .../test_tir_transform_split_host_device.py        |  51 ++
 .../unittest/test_tir_unsafe_hide_buffer_access.py | 102 ++++
 ...ransform_convert_pool_allocations_to_offsets.py | 255 +++++----
 tests/python/unittest/test_tvmscript_roundtrip.py  |  58 +++
 tests/scripts/task_config_build_arm.sh             |   2 +-
 tests/scripts/unity/task_python_relax.sh           |   1 +
 web/emcc/webgpu_runtime.cc                         |   4 -
 188 files changed, 7981 insertions(+), 2388 deletions(-)
 create mode 100644 include/tvm/tir/block_dependence_info.h
 copy {apps/uma/_template => python/tvm/dlight}/__init__.py (82%)
 copy python/tvm/{_ffi/_ctypes => dlight/base}/__init__.py (88%)
 create mode 100644 python/tvm/dlight/base/schedule_rule.py
 create mode 100644 python/tvm/dlight/base/transform.py
 copy {apps/uma/_template => python/tvm/dlight/gpu}/__init__.py (84%)
 create mode 100644 python/tvm/dlight/gpu/fallback.py
 create mode 100644 python/tvm/tir/block_dependence_info.py
 create mode 100644 python/tvm/topi/hexagon/compute_poolarea.py
 create mode 100644 src/relay/op/nn/pooling_common.h
 create mode 100644 src/relay/qnn/op/avg_pool2d.cc
 create mode 100644 src/runtime/contrib/cutlass/weight_preprocess.cc
 create mode 100644 src/tir/ir/block_dependence_info.cc
 create mode 100644 src/tir/schedule/primitive/hide_buffer_access.cc
 create mode 100644 tests/python/contrib/test_hexagon/test_qnn_op_integration.py
 delete mode 100644 tests/python/contrib/test_hexagon/test_wo_qnn_canonicalization.py
 create mode 100644 tests/python/dlight/test_schedule_rule.py
 copy tests/python/unittest/{test_tir_schedule_block_scope.py => test_tir_block_dependence_info.py} (51%)
 create mode 100644 tests/python/unittest/test_tir_transform_lower_device_storage_access_info.py
 create mode 100644 tests/python/unittest/test_tir_transform_lower_thread_all_reduce.py
 create mode 100644 tests/python/unittest/test_tir_unsafe_hide_buffer_access.py
