This is an automated email from the ASF dual-hosted git repository.
tqchen pushed a change to branch unity
in repository https://gitbox.apache.org/repos/asf/tvm.git
omit 1e1ff66fb3 [Unity][Dlight] Fix DecodeGeMV rule for spatial-inner with
grouping (#15340)
omit 63b170d80e [Unity] fp16 A x int B GEMM update - support int8, more
bias shape (#15318)
omit 66d3957d2c [Unity][Dlight] Rule matmul avoiding blockIdx.z (#15333)
omit cf401bc6b4 [Unity][Dlight] Fix decode-GeMV rule when spatial-inner
without broadcasting (#15330)
omit e2a6e8f793 [Unity][Training] Enhance gradient system (#15230)
omit 783b467845 [Unity] CUDA Graph update (#15320)
omit dc113955dc [Unity][OP] Add `rms_norm` (#15314)
add b47b2695fb [Unity] CUDA Graph update (#15320)
add 98ef29a852 [Unity][Training] Enhance gradient system (#15230)
add aa28859340 [Unity][Dlight] Fix decode-GeMV rule when spatial-inner
without broadcasting (#15330)
add 6294aada46 [Unity][Dlight] Rule matmul avoiding blockIdx.z (#15333)
add 231653cca5 [Unity] fp16 A x int B GEMM update - support int8, more
bias shape (#15318)
add 959b7e5e09 [Unity][Dlight] Fix DecodeGeMV rule for spatial-inner with
grouping (#15340)
add 0413ce3138 [BugFix] Fix function to read all file (#15225)
add 30d684216c [TIR] Call TVMBackendFreeWorkspace inside LetStmt (#15253)
add 3c23865559 [Testing] Return BenchmarkResult in local_run and rpc_run
(#15277)
add dc7125b31e [Hexagon] Propagate QNN Concat Quantization Params to
Inputs (#15258)
add 9f8fe3c503 [topi] Add `arm_cpu` specific pooling schedules (#14855)
add e4a120955b [RELAY] Fix bug in MergeCompilerRegions pass (#15211)
add 81d7f79f03 Revert "[topi] Add `arm_cpu` specific pooling schedules"
(#15286)
add 592b3583dc [Exec] Add a script to test GPU memory bandwidth (#15287)
add fddbec7079 [TIR] Implement TIR macros (#15260)
add 33232deefb [FRONTEND][TFLITE][BugFix] Fix variable typo in batchmatmul
converting func (#15259)
add 1234f88b60 [BugFix][Relay][GraphExecutor] Fix set_input_zero_copy()
precision bug (#15291)
add 02ffc91396 [RPC] Fix socket bind errno on corner case (#15292)
add fba10d7021 [Docker] tensorflow_aarch64 package upgrade (#15293)
add a0e7d3e0ae [COMMUNITY] Qingchao Shen -> Reviewer (#15307)
add b6502f4e27 Fix keras version problem (#15265)
add 7890cca929 [JVM] Fix the Maven pom.xml for OS X arm64 tvm4j build
(#15321)
add 9af8efcd2d [Fix][TIR] LowerThreadAllreduce with correct thread mask
(#15323)
add 38bc953516 [Package] Remove cutlass media/docs inside
cutlass_fpA_intB_gemm (#15328)
add e25b1ba70a [TIR] ThreadAllreduce warp-level primitive support with
multi-warp (#15327)
add a4b863a2ff [Misc][Release] Extend PR tags and Format PR hyper-links in
release report (#15298)
add 7ad71e622a [Docker] Update ci-cortexm docker image (#15310)
add 6c63e0db53 [ETHOSU][MicroNPU][Pass] Add a pass to replicate pads
(#14909)
add c4f10cd5e9 [Runtime] Device API to query L2 cache size (#15332)
add e2d6511161 [Bugfix][Frontend][Keras]Fix a corner case bug in softmax
converter of keras frontend (#15337)
add c0946e19cd [Runtime] Flush L2 cache in time eval (#15305)
add 4b183daa97 [skipci] Fix typo in docs/arch/index.rst (#15312)
add a13b56a945 [OP] Add `rms_norm` into TOPI (#15326)
add d81e8809b8 [AOT] Avoid call_extern() with incorrect argument count
(#15301)
add 2eca9f0270 [TIR] Return error code from kernels in SplitHostDevice
(#15241)
add d8f1ac4e87 Merge remote-tracking branch 'apache-upstream/main' into
unity
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (1e1ff66fb3)
            \
             N -- N -- N   refs/heads/unity (d8f1ac4e87)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
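
The O/N distinction in the diagram above can be sketched as a set difference over commit ancestry. This is a minimal illustration on a toy history, not the script that generates these emails: the `PARENTS` graph, the commit names (`A`, `B`, `O1`..`O3`, `N1`..`N3`), and the helpers `ancestors` and `classify` are all hypothetical and exist only to mirror the diagram.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy history: commit -> list of parent commits.
# B is the common base, O* sit on the old branch tip, N* on the
# force-pushed branch tip, as in the diagram.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "O1": ["B"], "O2": ["O1"], "O3": ["O2"],
    "N1": ["B"], "N2": ["N1"], "N3": ["N2"],
}


def ancestors(tip: str) -> Set[str]:
    """Return tip plus every commit reachable from it via parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def classify(old_tip: str, new_tip: str) -> Tuple[Set[str], Set[str]]:
    """Split history into revisions dropped from and added to the branch."""
    old, new = ancestors(old_tip), ancestors(new_tip)
    dropped = old - new  # the "O" revisions: only reachable from the old tip
    added = new - old    # the "N" revisions: only reachable from the new tip
    return dropped, added


dropped, added = classify("O3", "N3")
print(sorted(dropped))  # ['O1', 'O2', 'O3']
print(sorted(added))    # ['N1', 'N2', 'N3']
```

Whether a dropped revision is reported as "omit" or "discard" depends on whether some other reference still reaches it, which this sketch does not model.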
No new revisions were added by this update.
Summary of changes:
CONTRIBUTORS.md | 1 +
.../template_project/microtvm_api_server.py | 9 +-
ci/jenkins/docker-images.ini | 2 +-
.../install/ubuntu_install_tensorflow_aarch64.sh | 2 +-
docs/arch/index.rst | 4 +-
include/tvm/relax/attrs/nn.h | 11 -
include/tvm/runtime/device_api.h | 3 +-
include/tvm/runtime/profiling.h | 4 +-
include/tvm/topi/nn/rms_norm.h | 22 +-
jvm/native/osx-x86_64/pom.xml | 2 +-
jvm/pom.xml | 2 +-
python/setup.py | 8 +
python/tvm/_ffi/runtime_ctypes.py | 18 +
python/tvm/contrib/hexagon/transform.py | 105 ++++-
python/tvm/exec/gpu_memory_bandwidth.py | 192 +++++++++
python/tvm/relax/op/nn/nn.py | 40 --
python/tvm/relax/transform/legalize_ops/nn.py | 11 -
python/tvm/relay/backend/contrib/ethosu/codegen.py | 89 +++-
python/tvm/relay/frontend/keras.py | 21 +-
python/tvm/relay/frontend/tflite.py | 12 +-
python/tvm/relay/op/contrib/ethosu.py | 4 +-
python/tvm/runtime/module.py | 5 +
python/tvm/script/parser/_core.py | 2 +-
python/tvm/script/parser/core/entry.py | 35 +-
python/tvm/script/parser/tir/__init__.py | 4 +-
python/tvm/script/parser/tir/entry.py | 99 ++++-
python/tvm/script/parser/tir/parser.py | 60 ++-
python/tvm/target/target.py | 4 +
python/tvm/testing/runner.py | 12 +-
python/tvm/tir/op.py | 2 +-
python/tvm/topi/nn/rms_norm.py | 11 +-
python/tvm/topi/testing/rms_norm_python.py | 11 +-
src/relax/op/nn/nn.cc | 59 ---
src/relax/op/nn/nn.h | 3 -
src/relay/backend/aot_executor_codegen.cc | 38 +-
src/relay/transforms/merge_compiler_regions.cc | 36 +-
src/runtime/crt/common/crt_runtime_api.c | 5 +-
src/runtime/cuda/cuda_device_api.cc | 6 +
.../graph_executor/debug/graph_executor_debug.cc | 2 +-
src/runtime/graph_executor/graph_executor.cc | 25 +-
src/runtime/graph_executor/graph_executor.h | 6 +-
src/runtime/metal/metal_device_api.mm | 2 +
src/runtime/opencl/opencl_device_api.cc | 7 +
src/runtime/profiling.cc | 16 +-
src/runtime/rocm/rocm_device_api.cc | 5 +
src/runtime/rpc/rpc_module.cc | 28 +-
src/runtime/vulkan/vulkan_device_api.cc | 3 +
src/support/socket.h | 16 +-
src/te/operation/cross_thread_reduction.cc | 13 +-
src/tir/transforms/lower_device_kernel_launch.cc | 41 +-
src/tir/transforms/lower_thread_allreduce.cc | 337 +++++++++------
src/tir/transforms/lower_tvm_builtin.cc | 54 ++-
src/tir/transforms/split_host_device.cc | 33 +-
src/topi/nn.cc | 2 +-
tests/python/contrib/test_ethosu/test_codegen.py | 63 +++
tests/python/contrib/test_ethosu/test_legalize.py | 132 +++++-
.../test_hexagon/test_relay_simplify_qnn_concat.py | 101 +++++
tests/python/frontend/keras/test_forward.py | 7 +
tests/python/frontend/tflite/test_forward.py | 18 +
.../python/relax/test_transform_legalize_ops_nn.py | 260 ------------
.../relay/test_pass_merge_compiler_regions.py | 62 +++
tests/python/topi/python/test_topi_rms_norm.py | 32 +-
tests/python/unittest/test_set_input_zero_copy.py | 137 +++++++
.../test_tir_transform_lower_thread_all_reduce.py | 451 +++++++++++++++++++++
.../test_tir_transform_lower_tvm_builtin.py | 23 +-
.../test_tir_transform_split_host_device.py | 38 ++
tests/python/unittest/test_tvmscript_parser_tir.py | 107 +++++
tests/scripts/release/README.md | 13 +-
tests/scripts/release/make_notes.py | 33 +-
web/emcc/tvmjs_support.cc | 2 +-
70 files changed, 2340 insertions(+), 683 deletions(-)
create mode 100644 python/tvm/exec/gpu_memory_bandwidth.py
create mode 100644 tests/python/contrib/test_hexagon/test_relay_simplify_qnn_concat.py
create mode 100644 tests/python/unittest/test_set_input_zero_copy.py