This is an automated email from the ASF dual-hosted git repository.
tqchen pushed a change to branch unity-staging
in repository https://gitbox.apache.org/repos/asf/tvm.git
from 181e559008 [MERGE-FIX] Fix regresions after merge
add 14150f60d6 [Unity][BYOC] Add fused patterns for stacked attention
(#14608)
add 4ba82881f5 [Unity] Fix ForceNarrowI32 with pod arguments (#14605)
add 9ddd39be56 [Unity][UX] Symbolic Variables Used in Multiple Functions
(#14606)
add 3fd22484cf [Unity][Bugfix] Resolve failure on `test_e2e_op_dynamic.py`
(#14616)
add 5135461b53 [Unity][VM] Converting tuple arg to Python tuple (#14620)
add bcad5a9abe [Unity][VM] LibComparator using dtype from input (#14623)
add 89d539d746 [Unity] Update specific builtins for LM (#14617)
add 93945ce640 [Unity][CODEGEN] Fix metal codegen when with only single
working dim (#14627)
add d5725a8443 [Unity][CUTLASS] Support batched matmul + residual fusion
(#14613)
add 7b3fe3aa08 [Unity][CI] Update images to include jax deps (#14610)
add 6892b01c0b [Unity] hotfix webgpu codegen for vec load (#14630)
add 9daf9acf05 [Unity][Frontend] Some changes on the PyTorch FX Frontend
(#14625)
add b3e9adc1cb [Unity] update ci cpu/gpu images (#14631)
add 8569b9fe63 [Unity][AMP] Fix merging concrete type and "unknown" type
(#14612)
add 0f05116453 [Unity][MetaSchedule] Add the module_equality param for
tune_relax flow (#14537)
add d9d9172cf9 [Unity][TARGET] Updates vulkan codegen for DeclBuffer
(#14641)
add ff2f3c861a [Unity] `enable_warning` option for LegalizeOps and
MSApplyDatabase (#14634)
add b663c58c6b [Unity] BlockBuilder assigning unique tensor names in
call_te (#14632)
add 3138fc6abc [Unity] Improve error message in webgpu request (#14640)
add 81d6778947 [Unity][Frontend] Add `no_bind_return_tuple` for PyTorch FX
Translator (#14639)
add 9dd85dd703 Adding powerPreference argument to
navigator.gpu.requestAdapter (#14650)
add ccca0f5ecf [Unity][BYOC] Fuse attention pattern with `strided_slice`
(#14649)
add ee53619661 [Unity] Improve and reduces possible memory leak RPC debug
(#14662)
add f19e6835fe [Unity][BYOC] Add check for stacked attention patterns
(#14664)
add bb1e10c81f [Unity] Add rewriting for CUDA graph capturing (#14513)
add f7835a6f80 [Unity][CUTLASS] Require the residual input to have the
same shape as input (#14657)
add ec89242fbd [Unity] Update docs for operators (#14659)
add 814afe6b78 [Unity] Improve WebGPU codegen for large grid (#14674)
add 6c662eb631 [Unity] Use custom hash in `BlockBuilder` to avoid hashing
large constants (#14675)
add 981e0e8762 [Unity][Training] Optimizer library (#14670)
add b4fbac785b [Unity] Fix `DataflowReshapeRewrite` when input has
multiple buffers from tuple (#14669)
add c91fde574c [WebGPU] This PR fixes the webgpu runtime when there is no
pod params (#14685)
add a8f3a22cdd [Unity][TuningAPI] Temporary patch for large models
(#14691)
add 4b59ee6e9d [Unity] FuseOps skipping PrimValues (#14687)
add bc1597ce2b [Unity][CUTLASS] Fix CUTLASS codegen for occasional
variable name conflict (#14692)
add efa8282aa6 [Unity] Use split rather than slice in
`CombineParallelMatmul` (#14688)
add 51fa5763aa [Unity][WebGPU] Move NDArrayCache Support to relax runtime
(#14689)
add 167bc874e3 [Unity][Training] Loss functions and AppendLoss pass
(#14668)
add ee6e26f2cb [Unity][Op] Avoid indices in TIR matmul being 0 in
legalization (#14701)
add 19e82ecb3a [Unity] MetaScheduleApplyDatabase using workload from
records (#14702)
add 815422cfc0 [microNPU] Add support for MEAN with uint8 ifm (#14353)
add 9e5055b358 [TIR] [Docs] Fix unsafe_set_dtype docstring (#14611)
add 68ce1e871c [AOT] Fix warning on dropping const in
TVMAotExecutor_GetInputName (#14529)
add b48fcaba22 [TIR][Hexagon] Use the "target" value in T.func_attr for
VTCM limit (#14567)
add a9f572efae fix: deploy ci (#14607)
add 670d128f6d [COMMUNITY] Sunghyun Park -> Reviewer (#14622)
add 62f9b1d29a [Tensorflow] Fix conv2d_transpose for NHWC layout (#14546)
add e3638e772d [TensorIR][Doc] Docstring of `reorder_block_iter_var`
(#14504)
add a6f6f11000 [TensorIR][Bugfix] `reindex_cache_write` do not mutate init
statement (#14626)
add 62bffbb4cd [CODEGEN] Fix metal codegen when with only single working
dim (#14628)
add 0d51fbbecd Fix bug about wrong attribute name (#14636)
add e86a470ce0 [Relay] Enhance type infer for dynamic shape (#14601)
add 87146565b4 [MetaSchedule] PostProc not rewriting unroll for purely
spatial block (#14642)
add 608d35717a [test][script] Fix release gather_pr.py of script about
ghost users or blank PR nodes (#14646)
add 8e4cedc4e3 [Docker] Support rootless docker when using docker/bash.sh
(#14590)
add fc7ca1f503 [tests][scripts][release] Optimize release note script
about categories etc (#14653)
add b2a7bb9ee4 [MetaSchedule] Handle output cases for
InlineConstantScalars (#14654)
add 34342ba20f [METAL] Fix flaky memory issue due to racing (#14671)
add 7c97c4eb8d [Runtime] Fix Can't "query_imports" Bug of VM Executable
(#14656)
add a8e3d75ec6 [Community] Qiang Zhang -> Reviewer (#14677)
add 2bb9698be9 [TIR] Flatten SeqStmt on construction (#14492)
add 495ddc57f4 [TFLite][Frontend] Support for quantized squared difference
(#14667)
add 41ae745023 [CI] Downgrade ci_cpu llvm version back to 11 (#14680)
add f4b53fb6e2 [Community] Jiajun Jiang -> Reviewer (#14676)
add af17cfeb4c [CMAKE] Update search pattern of config (#14686)
add d27837ac29 unify search path approach to various libs (#14694)
add 1d145f1121 [cherry-pick][ARITH][BUGFIX] Fix a bug of iter map
floormod(x,2) simplify (#14704)
new d6af4af23e [MERGE] Merge main into unity 2023-04-23
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
CMakeLists.txt | 8 +-
CONTRIBUTORS.md | 3 +
ci/jenkins/generated/docker_jenkinsfile.groovy | 6 +-
ci/jenkins/generated/gpu_jenkinsfile.groovy | 4 +-
ci/jenkins/templates/utils/macros.j2 | 2 +-
ci/jenkins/unity_jenkinsfile.groovy | 4 +-
cmake/utils/FindCUDA.cmake | 19 +-
docker/bash.sh | 27 +-
docker/dev_common.sh | 2 +
docs/contribute/release_process.rst | 2 +-
include/tvm/relax/transform.h | 11 +-
include/tvm/runtime/crt/aot_executor.h | 2 +-
include/tvm/tir/analysis.h | 6 +-
include/tvm/tir/data_type_rewriter.h | 9 +-
include/tvm/tir/stmt.h | 159 +++--
python/tvm/contrib/cutlass/attention_operation.py | 92 ++-
python/tvm/contrib/cutlass/build.py | 56 +-
python/tvm/contrib/cutlass/gemm_operation.py | 21 +-
python/tvm/contrib/cutlass/gen_tensor_op.py | 35 +-
python/tvm/contrib/tvmjs.py | 2 +-
python/tvm/exec/rpc_proxy.py | 2 +-
python/tvm/meta_schedule/relax_integration.py | 47 +-
python/tvm/relax/__init__.py | 1 +
python/tvm/relax/backend/contrib/cutlass.py | 83 ++-
python/tvm/relax/backend/patterns.py | 64 +-
python/tvm/relax/frontend/torch/fx_translator.py | 93 ++-
python/tvm/relax/op/_op_gradient.py | 6 +-
python/tvm/relax/op/grad/grad.py | 12 +-
python/tvm/relax/op/nn/nn.py | 19 +-
python/tvm/relax/pipeline.py | 71 +-
python/tvm/relax/testing/lib_comparator.py | 10 +-
.../{backend/contrib => training}/__init__.py | 7 +-
python/tvm/{arith => relax/training}/_ffi_api.py | 5 +-
python/tvm/relax/training/loss.py | 292 +++++++++
python/tvm/relax/training/optimizer.py | 713 +++++++++++++++++++++
python/tvm/relax/training/utils.py | 155 +++++
python/tvm/relax/transform/__init__.py | 1 +
.../tvm/relax/transform/legalize_ops/__init__.py | 2 +-
.../legalize_ops/{creation.py => create.py} | 0
.../relax/transform/legalize_ops/linear_algebra.py | 14 +-
.../relax/transform/legalize_ops/statistical.py | 4 +
python/tvm/relax/transform/transform.py | 34 +-
python/tvm/relax/utils.py | 7 +-
python/tvm/relax/vm_build.py | 4 +
.../tvm/relay/backend/contrib/ethosu/legalize.py | 62 +-
.../tvm/relay/backend/contrib/ethosu/op/pooling.py | 10 +-
.../tvm/relay/backend/contrib/ethosu/te/pooling.py | 6 +-
python/tvm/relay/frontend/onnx.py | 2 +-
python/tvm/relay/frontend/tensorflow_ops.py | 4 +-
python/tvm/relay/frontend/tflite.py | 15 +-
python/tvm/relay/op/contrib/ethosu.py | 42 +-
python/tvm/runtime/relax_vm.py | 4 +-
python/tvm/tir/schedule/schedule.py | 60 +-
src/auto_scheduler/feature.cc | 4 +-
src/driver/driver_api.cc | 11 +-
src/meta_schedule/module_equality.cc | 13 -
.../postproc/rewrite_parallel_vectorize_unroll.cc | 14 +-
src/meta_schedule/schedule_rule/auto_inline.cc | 5 +-
src/node/ndarray_hash_equal.h | 7 +
src/node/structural_hash.cc | 10 +
src/relax/analysis/well_formed.cc | 57 +-
src/relax/backend/contrib/cutlass/codegen.cc | 32 +-
src/relax/backend/task_extraction.cc | 78 ++-
src/relax/ir/block_builder.cc | 18 +-
src/relax/training/utils.cc | 225 +++++++
src/relax/training/utils.h | 60 ++
src/relax/transform/combine_parallel_matmul.cc | 19 +-
src/relax/transform/fuse_ops.cc | 46 +-
src/relax/transform/infer_amp_utils.cc | 6 +
src/relax/transform/legalize_ops.cc | 23 +-
src/relax/transform/meta_schedule.cc | 50 +-
src/relax/transform/rewrite_cuda_graph.cc | 512 +++++++++++++++
src/relax/transform/rewrite_dataflow_reshape.cc | 30 +-
src/relax/transform/utils.h | 62 ++
src/relax/utils.cc | 11 +-
src/relay/analysis/type_solver.cc | 21 +-
src/relay/analysis/type_solver.h | 1 +
src/relay/backend/aot/aot_lower_main.cc | 6 +-
src/relay/backend/aot_executor_codegen.cc | 6 +-
src/relay/op/contrib/ethosu/op_attrs.h | 5 +
src/relay/op/contrib/ethosu/pooling.cc | 19 +-
src/runtime/crt/aot_executor/aot_executor.c | 2 +-
src/runtime/metal/metal_device_api.mm | 8 +
src/runtime/relax_vm/builtin.cc | 2 +
.../{attention_kv_cache.cc => lm_support.cc} | 88 ++-
src/runtime/relax_vm/ndarray_cache_support.cc | 204 ++++++
src/runtime/vm/executable.cc | 3 +-
src/runtime/vulkan/vulkan_device.cc | 6 +-
src/script/ir_builder/tir/utils.h | 9 +-
src/support/ordered_set.h | 68 ++
src/support/ring_buffer.h | 3 +
src/target/source/codegen_metal.cc | 10 +-
src/target/source/codegen_metal.h | 1 +
src/target/source/codegen_webgpu.cc | 71 +-
src/target/spirv/codegen_spirv.cc | 2 +
src/target/spirv/codegen_spirv.h | 1 +
src/tir/analysis/calculate_allocated_memory.cc | 39 +-
src/tir/contrib/ethosu/passes.cc | 2 +-
src/tir/ir/data_type_rewriter.cc | 63 +-
src/tir/ir/stmt.cc | 19 +
src/tir/ir/stmt_functor.cc | 8 +-
src/tir/schedule/analysis/reducer.cc | 7 +-
src/tir/schedule/primitive/cache_read_write.cc | 2 -
src/tir/transforms/force_narrow_index_to_i32.cc | 10 +
src/tir/transforms/remove_no_op.cc | 25 -
tests/cpp/ir_functor_test.cc | 3 +-
.../cascader/test_ethosu_pooling_matcher.py | 1 +
tests/python/contrib/test_ethosu/infra.py | 2 +
tests/python/contrib/test_ethosu/test_codegen.py | 31 +-
.../contrib/test_ethosu/test_identity_optimizer.py | 14 +-
.../contrib/test_ethosu/test_layout_optimizer.py | 53 +-
tests/python/contrib/test_ethosu/test_legalize.py | 40 +-
.../contrib/test_ethosu/test_replace_pooling.py | 11 +-
.../contrib/test_ethosu/test_type_inference.py | 9 +-
tests/python/frontend/tensorflow/test_forward.py | 21 +-
tests/python/frontend/tflite/test_forward.py | 37 +-
tests/python/relax/test_analysis_well_formed.py | 14 +
tests/python/relax/test_blockbuilder_core.py | 18 +
tests/python/relax/test_codegen_cutlass.py | 149 ++++-
tests/python/relax/test_e2e_op_dynamic.py | 3 -
tests/python/relax/test_frontend_from_fx.py | 236 +++++++
.../relax/test_meta_schedule_relax_integration.py | 210 ++++++
tests/python/relax/test_pipeline.py | 19 +-
tests/python/relax/test_relay_translator.py | 16 +-
tests/python/relax/test_runtime_builtin.py | 24 +-
tests/python/relax/test_training_append_loss.py | 327 ++++++++++
tests/python/relax/test_training_loss.py | 212 ++++++
tests/python/relax/test_training_optimizer.py | 594 +++++++++++++++++
.../relax/test_training_optimizer_numeric.py | 176 +++++
.../test_transform_combine_parallel_matmul.py | 116 ++--
tests/python/relax/test_transform_fuse_ops.py | 24 +
.../relax/test_transform_legalize_ops_grad.py | 32 +-
..._transform_legalize_ops_index_linear_algebra.py | 35 +
...st_transform_legalize_ops_search_statistical.py | 11 +-
.../relax/test_transform_legalize_ops_unary.py | 2 +-
.../test_transform_meta_schedule_apply_database.py | 84 +++
.../relax/test_transform_meta_schedule_tuning.py | 35 +-
.../relax/test_transform_rewrite_cuda_graph.py | 228 +++++++
.../test_transform_rewrite_dataflow_reshape.py | 127 +++-
.../relax/test_transform_to_mixed_precision.py | 67 +-
tests/python/relax/test_utils.py | 15 +
tests/python/relax/test_vm_build.py | 19 +-
tests/python/relax/test_vm_instrument.py | 19 +-
tests/python/relay/aot/test_c_device_api.py | 4 +-
tests/python/relay/aot/test_crt_aot.py | 2 +-
tests/python/relay/test_type_infer.py | 8 +
...e_postproc_rewrite_parallel_vectorize_unroll.py | 69 +-
...test_meta_schedule_schedule_rule_auto_inline.py | 29 +-
.../unittest/test_tir_schedule_cache_read_write.py | 91 +++
.../unittest/test_tir_stmt_functor_ir_transform.py | 4 +-
...test_tir_transform_force_narrow_index_to_i32.py | 21 +
.../unittest/test_tir_transform_remove_no_op.py | 2 +-
.../unittest/test_tvmscript_printer_annotation.py | 6 +-
.../python/unittest/test_tvmscript_printer_tir.py | 4 +-
.../unittest/test_tvmscript_printer_underlining.py | 4 +-
tests/python/unittest/test_tvmscript_roundtrip.py | 53 +-
tests/scripts/release/README.md | 8 +-
tests/scripts/release/gather_prs.py | 45 +-
tests/scripts/release/make_notes.py | 148 ++++-
web/Makefile | 3 +-
web/emcc/wasm_runtime.cc | 82 +--
web/src/runtime.ts | 83 ++-
web/src/webgpu.ts | 172 +++--
web/tests/python/webgpu_rpc_test.py | 4 +-
164 files changed, 7192 insertions(+), 1009 deletions(-)
mode change 100644 => 100755 docker/dev_common.sh
copy python/tvm/relax/{backend/contrib => training}/__init__.py (86%)
copy python/tvm/{arith => relax/training}/_ffi_api.py (90%)
create mode 100644 python/tvm/relax/training/loss.py
create mode 100644 python/tvm/relax/training/optimizer.py
create mode 100644 python/tvm/relax/training/utils.py
rename python/tvm/relax/transform/legalize_ops/{creation.py => create.py} (100%)
create mode 100644 src/relax/training/utils.cc
create mode 100644 src/relax/training/utils.h
create mode 100644 src/relax/transform/rewrite_cuda_graph.cc
rename src/runtime/relax_vm/{attention_kv_cache.cc => lm_support.cc} (66%)
create mode 100644 src/runtime/relax_vm/ndarray_cache_support.cc
create mode 100644 src/support/ordered_set.h
create mode 100644 tests/python/relax/test_meta_schedule_relax_integration.py
create mode 100644 tests/python/relax/test_training_append_loss.py
create mode 100644 tests/python/relax/test_training_loss.py
create mode 100644 tests/python/relax/test_training_optimizer.py
create mode 100644 tests/python/relax/test_training_optimizer_numeric.py
create mode 100644 tests/python/relax/test_transform_meta_schedule_apply_database.py
create mode 100644 tests/python/relax/test_transform_rewrite_cuda_graph.py