This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch unity
in repository https://gitbox.apache.org/repos/asf/tvm.git


    omit 8caeab93f8 [Unity] Pattern-based rewriting for dataflow block (#14446)
    omit a70768ea2d [Unity][Graph matching] Clean up undo stack for parent and 
child nodes properly (#14440)
    omit 43e0f66234 [Unity][Op][Docs] Update comment for `call_tir_dyn` (#14441)
    omit adf256df79 [Unity][Graph matching] Automatically add `used-by` 
constraints for `is_op` pattern (#14439)
    omit 0695cfe4e2 [Unity] Remove non-deterministic behavior from graph 
pattern matching  (#14417)
    omit 5d145fdbe2 [Unity] Minor updates to DataFlowBlockRewrite (#14431)
    omit aa49fc3c2b [Unity][Fix] Annotate TIR op pattern could have no stores. 
(#14420)
    omit 48d972a8b5 [Unity] Include constant shapes in the profiler result 
(#14428)
    omit 465994eb05 [Unity] Handle extern func calls in static memory planning 
(#14419)
    omit 51b29ef494 [Unity][Fix] Copy over module attrs in FuseTIR (#14418)
    omit 854f2e9bc1 [Unity][Hexagon] Enable Relax VM for Hexagon (#14415)
    omit 23146d651a [Unity][Op] Expose scale in `R.nn.attention` and add its 
legalize op (#14412)
    omit cd3e10786d [Unity] Fix getting shapes for cutlass BYOC kernels (#14411)
    omit 9a3ec23b66 [Unity][Op] Conv1d (#14388)
    omit d97c43b724 [Unity][QNN][Hexagon]Support Relax Constants in the QNN 
TOPI operations (#14386)
    omit f4d5964653 [Unity][Transform] Common Subexpression Elimination (#14361)
    omit d377f69656 [Unity][TVMScript] Fix Shape Var occurrence in Tensor 
annotation (#14404)
    omit eff9a0a214 [Unity][Op] Add stop_lift_params (#14368)
    omit 23803b7c1f [Unity] Support simple dynamic-shape-aware fusion (#14396)
    omit 517d0457d3 [Unity][Transform] SplitCallTIRByPattern and CUTLASS 
backend (#14274)
    omit cc03017fb6 [Unity] Add missing #include <array> (#14383)
    omit 559fac73d4 [Unity][VM] Add CUDA graph vm builtins (#14371)
    omit 4e46ad4e9c [Unity] Also include output dtype in simt MathInstruction 
(#14372)
    omit b8eb779ff3 [Unity][Fix] Allow scalar layout initialization (#14370)
    omit 043e4f1bd6 [Unity][TVMScript] Update GlobalVar `checked_type_` when 
`emit_te` (#14367)
    omit db96ee80e7 [Unity] Add More Ops For FX Translator (#14348)
    omit 04c6d7a45f [Unity][Fix] Infer Layout must support negative axes 
(#14365)
    omit ecdc68cc5e [Unity][Pass] Fix FuseOps error if there is no output of a 
given group (#14354)
    omit ee5a2c204d [Unity][WEB] Support async pipeline creation (#14362)
    omit 20730633a4 [Unity] Add support to append relay op attrs in translator 
(#14356)
    omit b78ead8d05 [Unity][Transform] Fix AMP tests (#14360)
    omit 675a22e080 [Unity][Transform] Introduce data-dependent operation of 
reshape and its constant folding (#14282)
    omit 922e18c9ee [Unity][Fix] Fix block memory plan to handle bool (#14357)
    omit c6cf403590 [Unity][Transform] AMP out_dtype=float16 testcases (#14358)
    omit 8b5d64af71 [Unity][BYOC] Check leaked intermediate variables in 
cutlass patterns (#14350)
    omit 4d57272e7f [Unity] Support model kwargs in dynamo_capture_subgraph 
(#14349)
    omit 18c19fb830 [Unity][Frontend] FX exp and strided_slice fix (#14338)
    omit 6c6985940c [Unity][BYOC] Update testcases to follow recent changes 
(#14339)
    omit 06fe80be71 [Unity] Remove Python interface of RemoveUnusedFunction 
(#14336)
    omit a0bd29917c [Unity][Pass] Reuse prior infra to implement more complete 
DCE (#14334)
    omit 9aae685bf6 [Unity][Op] Fix Strided Slice Shape Inference (#14324)
    omit 0b47f0bfe3 [Unity][Transform] DefaultSchedule pass (#14266)
    omit 1d35ef2135 [Unity][Lint] Fix cpplint casting (#14333)
    omit 795568e148 [Unity][Transform] Automatic Mixed Precision (#14242)
    omit 3ca23b3378 [Unity][Transform] Simple Dead Code Elimination (#14262)
    omit fdf86e4c3e [Unity][Transform] Automatic Layout Conversion (#14257)
    omit 5529830bf5 [Unity][TOPI] fp16 LayerNorm & GroupNorm (#14264)
    omit e2b1d93591 [Unity][Contrib] Introduce several features of cutlass 
profiler (#14275)
    omit 850e549b32 [Unity][Transform] Enhance RewriteDataflowReshape transform 
(#14265)
    omit 981a822bd3 [Unity][BYOC] Improve expressiveness of the pattern check 
function in FuseOpsByPattern (#14310)
    omit 10b834f887 [Unity][BYOC] Support matmul + residual block fusion in 
CUTLASS BYOC (#14317)
    omit f5ee09795f [Unity] Support pattern-based rewriting (#14312)
    omit 001f17814c [Unity][Web] WebGPU explicit max buffer size (#14321)
    omit b10498b51c [Unity][Op] Enable special dimension value 0 in reshape 
(#14311)
    omit 765375f187 [Unity][Pass] Add a pass to alter the TIR implementation of 
an operator (#14215)
    omit 15de2c2df7 [Unity][DEBUG] Add Instrument (#14302)
    omit c1783b83a7 [Unity][Op] Cumsum (#14297)
    omit c7f40bd1d9 [Unity] Fix StructInfo Infer for `vm.alloc_tensor` (#14283)
    omit 8c34de2d7f [Unity] Mark tests that need python3.8 compact.
    omit 581889aa6c [TVMScript][Unity] Improve PyLint Compatibility (#14276)
    omit e6f3db185a [Unity][ci] Use CPU-SMALL instances (#14256)
    omit 7f89e22406 [Unity] Introduce call_dps_packed (#14183)
    omit 72c9510ae4 [Unity] Consider target context for Relay to Relax 
conversion (#14269)
    omit 1456f99a26 [Unity][Frontend] Import `tanh` and fix `layer_norm` 
(#14247)
    omit 6ae1c52610 [Unity][BYOC] Add conv2d and residual block patterns for 
Relax cutlass BYOC (#14252)
    omit 24b8e7bef5 [Unity] Allow user defined func attrs in emit_te (#14255)
    omit f5b6ac8fb4 [Unity][Op] Add repeat, tile, conv2d_transpose, avg_pool2d 
(#14238)
    omit 9b757c9f39 [Unity][Op][Tweak] Improve `StructInfo` inference for 
`shape_of` (#14243)
    omit 4c90f052f6 [Unity][WEB] Improve ndarray cache (#14236)
    omit 81c38c5e1b [Unity][WEB] Update text prompts for syntactical 
correctness (#14237)
    omit b270be88fe [Unity][TVMScript] Fix prim_func lost issue in 
relax.emit_te (#14189)
    omit 198caa55d1 [Unity][TVMScript] Enable Context-Aware Parsing (#14234)
    omit f2804d15f7 [Unity][Bugfix] Do not include `PrimFunc`s in the 
dependency graph when checking for recursion (#14228)
    omit 13c8c673ba [Unity][Transform] SimplifyNormInference (#14221)
    omit 6b75a40036 [Unity] Improve implementation of FuseOps (#14229)
    omit afdf218125 [Unity] ensure memory.alloc_tensor/storage roundtrippable 
(#14226)
    omit 2531c7eaf5 [Unity][WEB] Simplify WebGPU Codegen per spec (#14225)
    omit 84c20b3abe [Unity][Transform] Memory plan across the IRModule (#14220)
    omit 6cb1fe7a94 [Unity][BYOC] Add dynamic shape support to CUTLASS matmul 
(#14216)
    omit 841f8a0c03 [Unity][Frontend] from_fx keeps parameters in order (#14214)
    omit 1ffc31777e [Unity][WEB] Improve webgpu codegen options to skip 
readonly (#14213)
    omit c4225052e9 [Unity][Frontend] FX translator supports unwrapping unit 
return tuple (#14212)
    omit 399d9daf71 [Unity][Frontend] Attach imported model weights, deprecate 
ImporterOutput (#14211)
    omit b7193cf056 [Unity] Introduce Default GPU Schedule Pass (#14182)
    omit 044080ff93 [Unity][Frontend] FX translator support torch.baddbmm 
(#14202)
    omit 3b7db40860 [Unity][TIR][Pass] ForceNarrowIndexToInt32 (#14203)
    omit fb6b1ea299 [Unity][Fix] FX translating dtype (#14201)
    omit ec6e26827b [Unity][Frontend] FX translator returning weights with 
`keep_params_as_input` (#14197)
    omit 439ec78118 [Unity][Frontend] FX translator supporting more ops (#14196)
    omit 6731783749 [Unity][Op] Legalize `round`, `floor`, `ceil`, `sign` 
(#14198)
    omit 50c1e7a147 [Unity][Op] Argmax and argmin (#14195)
    omit 1927d7d4aa [Unity][Op] Group normalization (#14194)
    omit cd88b0ab49 [Unity][Transform] LiftTransformParams handling multiple 
functions (#14192)
    omit 281cc206cc [Unity][WEBGPU] Codegen improvements and WebRuntime (#14187)
    omit 8a46c21e33 [Unity][OP] Add an operator for fused multi head attention 
(#14150)
    omit 1cc9bb014e [Unity][Analysis] Restore Python bindings for var analyses 
(#14180)
    omit be532f28f2 [Unity][Op] Full support of Relax op `power` (#14171)
    omit 1bbe881241 [Unity][BYOC] Add batch matmul support to Relax CUTLASS 
BYOC (#14166)
    omit c56b17f4f6 [Unity][Analysis] Analysis for detecting recursion in Relax 
(#14149)
    omit 7cad6ef8d8 [Unity] Add bind_constants option to FuseOpsByPattern 
(#14151)
    omit ef0f4481cf [Unity][BYOC] Use Relax legalize + CPU build for reference 
in tests (#14162)
    omit 1cdc3d336b [Unity][Analysis] Checking function return struct info in 
well-formed check (#14155)
    omit 7e96a3aeed [Unity][Pass] Support Symbolic Shape Deduction during 
BindParam (#14154)
    omit 82bfc57772 [Unity][Debugging] AST printer (#14152)
    omit 43c5f29813 [Unity][Pass] Enhance constant folding to fold relax ops by 
evaluating them. (#14146)
    omit 019ef59f2e [Unity][Legalize] Fix Scalar Constant Legalization (#14127)
    omit 3bdd8013c1 [Unity] Add callback to FuseOpsByPattern to check match 
result is accepted (#14109)
    omit a16021ace5 [Unity][BYOC] Assign group to unused bindings and ignroe 
PrimFunc (#14139)
    omit 1004bdf02a [Unity][TVMScript] emit_te sugar (#14123)
    omit 5939a6e8c8 [Unity][BYOC] Add transposed matmul support to Relax 
CUTLASS BYOC (#14128)
    omit fa47ee995f [Unity] Add Global info (#14132)
    omit 133b4acaeb [Unity][WEB] Relax vm on web runtime (#14131)
    omit cf9beab753 [Unity][BlockBuilder] Add `name_hint` argument for `emit` 
and `emit_output` (#14126)
    omit 4e5e81a1b8 [Unity][Fix] Fix bug in MergeCompositeFunctions (#14117)
    omit fbf56475d2 [Unity] Update tests again to adapt to latest TVMScript 
syntax (#14115)
    omit b168949441 [Unity][BYOC]Add relax backend pattern registry (#14106)
    omit a31c856de7 [Unity] Remove attributes of relax.print, assert and unique 
(#14101)
    omit 9df23f7c67 [Unity][Layout] Add layout transformation analysis for 
PrimFunc (#14066)
    omit 70c8debc7a [Unity] Relax Recursive function (#14092)
    omit 03799a50cb [Unity] Lower `shape_of` to a builtin (#14093)
    omit 0eff29a505 [Unity] Fix typo in the comment (#14096)
    omit f15b80a561 [Unity][Relax] Set Shape Function to Be Host Function 
(#14090)
    omit 99f6d67dd0 [Unity] Refactor Relax Build JIT UX (#14088)
    omit 22c7b75834 [Unity][Fix][Pass] FoldConstant with DCE in dataflow block 
(#14087)
    omit 5aecfe4121 [Unity][Analysis] TIR pattern kind analysis for 
multi-buffer write block (#14075)
    omit 3a0f4c5eca [Unity][Op] `log_softmax` and `cross_entropy_with_logits` 
(#14083)
    omit f3ee944a58 [Unity][BYOC] Add DNNL backend (#14082)
    omit fe7e0651ec [Unity][BYOC] Add CUTLASS backend (#14081)
    omit d5fa61fd46 [Unity] Add testcases for `expr_args_converter` (#14080)
    omit 4b3794c24a [Unity][Pass] Canonicalize Bindings (#14079)
    omit 6ba5cac678 [Unity][BYOC][Pass] RunCodegen and TensorRT  (#14078)
    omit 51b1ce1ec7 [Unity][Transform] Add LiftTransformParams pass (#14069)
    omit 466a004d6c [Unity][Frontend] Annotate number of non-static input of FX 
function (#14067)
    omit 0bd303c7c1 [Unity][BYOC] Add pass to merge composite functions to 
offload large subgraphs (#14062)
    omit 7be4441569 [Unity][Pass] Remove Unused Function (#14061)
    omit 828edeb5ea [Unity][Fix][Pass] Fix FuseOps for lack graph edges (#14058)
    omit ff7d4950e0 [Unity] Relax op: collapse sum (#14059)
    omit 591b800bfa [Unity][BYOC] Add pattern-based partitioning pass (#14054)
    omit 2bd1581596 [Unity][VM] Add per-op profiling support  (#14053)
    omit 81169f6576 [Unity][TVMScript] Overload `__neg__` for relax expr 
(#14045)
    omit ac5bf3a76a [Unity][Pass] FuseOps FuseTIR fixes (#14044)
    omit f428a4ae23 [Unity] Statement rewriter for DataflowBlock (#14043)
    omit 02cefd91a5 [Unity] Relax dataflow pattern language (matching) (#14041)
    omit d3494933fe [Unity] Update tests to adapt to latest TVMScript syntax 
(#14039)
    omit 837a557210 [Unity] Disallow inline prim_func in relax IR (#14040)
    omit 96a9b6e4d8 [Unity][Pass] Block-level static memory planning (#14038)
    omit 85477ac489 [Unity] Initial PyTorch Frontend (#14037)
    omit 130e362430 [Unity][Op] Add ShapeExpr Tests for Reshape Op (#14035)
    omit 814eb921c2 [Unity][Pass] Operator legalization (#14029)
    omit 5e2f2b9d43 [Unity][TVMScript] Move tir/relax import in script out of 
__init__.py (#14033)
    omit b5d2304029 [Unity][Pass] Wellformed Analysis (#14032)
    omit 251a062bf1 [Unity][BlockBuilder] CallTE convert PrimValue args  
(#14028)
    omit 0d91c33103 [Unity][Pass] Normalize Pass (#14031)
    omit 388941ad6b [Unity] Relay -> Relax translator  (#14026)
    omit 87659ea3ea [Unity][Pass][TuningAPI] Introduce TuningAPI and 
MetaSchedule pass (#14014)
    omit 63e2402358 [Unity][Pass] BindParams pass, FoldConstant pass (#14016)
    omit 0e2bb802bb [Unity][VM] Supporting "compiled" exec mode. (#14015)
    omit e78d523e74 [Unity][Pass] LambdaLift pass (#14012)
    omit 20adb37493 [Unity][Pass] Operator Fusion Passes (#14001)
    omit 62daae4457 [Unity] NestedMsg Support utility (#13995)
    omit 4ab73eabc3 [Unity] Relax op: manipulation (#13989)
    omit 7a8765d819 [Unity] Relax op: search (#13992)
    omit 9694c673bb [Unity] Relax op: linear algebra (#13988)
    omit 4dd591b800 [Unity] Relax op: creation (#13984)
    omit a96f2006a6 [Unity] Relax op: neural networks (#13993)
    omit a6a2e84ca9 [Unity] Relax op: statistical (#13991)
    omit 5385d6d635 [Unity] Relax op: arithmetic, comparison (#13983)
    omit 42409202db [Unity] Relax op: image (#13994)
    omit c7a57aecd6 [Unity] Relax op: set (#13990)
    omit c8a153314d [Unity] Relax op: datatype (#13986)
    omit de164d2524 [Unity] Relax op: index (#13987)
    omit 75eecf7dd9 [Unity][TVMScript] Use explicit `R.shape` in TVMScript 
(#13979)
    omit fadbb3f256 [Unity] e2e Relax minimum build flow (#13961)
    omit 2c158714cf [Unity] Relax VM shape lowering pass (#13956)
    omit ea6cc94c8d [Unity] Relax VM codegen (#13954)
    omit e38a3360f5 [Unity] Relax TVMScript Printer (#13944)
    omit 2001903486 [Unity] Relax TVMScript Parser. (#13932)
    omit d1ad4e6543 [Unity] Relax BlockBuilder and ExprMutator (#13926)
    omit 6915444b2d [Unity] Basic StructInfo Analysis and Expr construction 
(#13916)
    omit fb90fd1a46 [Unity][CI] Unity specific jenkins setup (do not upstream 
to main) (#13910)
    omit 4e659d1f26 [Unity][IR] First-class StructInfo (#13907)
    omit 2c7f480f4f [Unity] Relax expressions and types (#13901)
    omit b8e4110467 [Unity] Relax VM (#13878)
     add 36b30974a9 [MetaSchedule] Introducing MemHammer (#14164)
     add 7f6da09052 [TIR] Fix Datatype in Lower TVM Builtin (#14347)
     add 4819300803 [CI][Lint] Update black (#14346)
     add 50b3ae4877 [TIR] [Analysis] Expose IsOutputBlock to python (#14352)
     add d4ca123afc [BugFix] Support rewrite_once when the number of callbacks 
> 1 (#14344)
     add 5abcf72147 [COMMUNITY] janetsc -> Reviewer (#14359)
     add 46fb2ff35f Hexagon compilation on MacOS system (#14308)
     add 0c2dd47286 [CI] Update GPU image for CUDA 11.7  (#14363)
     add c7970ddd79 [TensorIR] New schedule primitive `set_dtype` (#14316)
     add 91428158f2 [microTVM]Add MLPerfTiny test harness  (#14309)
     add 10a12bacb8 [CI][EZ] Upgrade CI Lint Image (#14373)
     add b56d7f56ab [TIR][Utility] More flexible tir::Substitute arguments 
(#14251)
     add 3b274aa6c7 [Hexagon] Allow scalar tensors to have null shape during 
allocation (#14376)
     add 3f56a95b87 [TVMScript] Use new variable frame in If/Then/Else (#14250)
     add e5ae4347dd [CUDA][Schedule] Better Layout Transform Schedules (#14167)
     add b987556375 [TIR] Remove LoadNode and StoreNode (#14381)
     add 67597025e7 [TVMScript][Fix] Fix `bool` printing for roundtrip (#14390)
     add ad6fbec066 [TIR] Improved error message in InjectSoftwarePipeline 
(#14391)
     add b09e72b54b [TIR] Legalize dtype of constants in IndexMap (#14385)
     add 4a2a3b5669 [TIR] Improved MakePackedAPI error message (#14387)
     add c5075dc30f [TIR] not estimating the flops when there is a default 
estimated flops as attr (#14379)
     add 0d0d2f0bd3 [CI][microTVM] Enable USE_MICRO for mac and windows CI 
builds (#14393)
     add 6c34361369 [Hexagon] Adapt some intrinsics for high vector lanes 
(#14345)
     add 6e70e79162 [microNPU] Upgrade Vela to v3.7.0 (#14374)
     add 30bf013e78 [TIR][Schedule] Add unittest for read_write_at (#14395)
     add da8335378a [TVMC][microNPU] tvmc option for printing which operators 
are offloaded to Ethos-U (#13212)
     add a0edf24c60 [TIR] Refactor BF16Legalize (#14405)
     add 14ddb37d14 [MetaSchedule][Hexagon] Improve vectorization for 
standalone elementwise op (#14408)
     add b3a5e18f6f [TVMScript] Improved error message for unexpected top frame 
(#14399)
     add 0ded2132e6 [skip ci] Replace magic_wand model with micro_speech 
(#14414)
     add 0e28541149 [microTVM] Update poetry to fix security issues (#14429)
     add 9f6ce7cbf9 [relay][frontend][pytorch]Fix a bug in the 
_get_pytorch_value_type function (#14421)
     add 5cca18bb07 [Frontend] Add ONNX importer for QLinearSoftmax (#14425)
     add 4011280b16 [OpenCL][Textures] Always use SSA for texture loading  
(#14397)
     add 79027f92ac [TIR] Remove special-casing of T.address_of in the storage 
rewrite pass (#14430)
     add fafe39ddab [Analysis] Improve error message in VerifyWellFormed 
(#14389)
     add 1d1dbebc73 [microTVM]Fix more security issues with pyproject (#14434)
     add cbe068cfac [TIR] Update LowerTVMBuiltin to use Optional<T> (#14400)
     add 8e2382eea5 [Bugfix] Conv3Dtranspose default kernel layout should be 
IODHW (#14340)
     add ffc1fc0116 [TVMC] Allow selecting a subset of tasks to be used in 
`tvmc tune` (#12525)
     add 7b34a6e0c6 [Runtime] Introduce runtime module property (#14406)
     add 776cf5b3b1 [Typo] Fix name of iter var type 4 (#14436)
     add 683e7a4555 [TOPI] Add instance_norm operator (#14410)
     add 221215bf60 [ETHOSN] Remove requantize dependency on resize (#14422)
     add 41fb9f41d4 [CL] Update Compute Library from v22.11 to v23.02.1 (#14426)
     add 70399da0a2 [TFLite] Support for BATCH_MATMUL tflite operator (#14423)
     add 7831a79f7f [Hexagon] Fix deprecated call for data layout size in bits 
(#14438)
     add b724c87f76 [MetaSchedule][ARM] Enable ARM CPU intrinsic for 
MetaSchedule (#14209)
     add 98007f90d8 [Relay] Move pad value extraction past null pointer check 
(#14445)
     add 49e6695586 [CI] Add llvm-15 and mlir-15 to Docker setup (#14303)
     add a27451755f [Unity] Relax VM (#13878)
     add 0117a28d22 [Unity] Relax expressions and types (#13901)
     add 2bb2e4bf75 [Unity][IR] First-class StructInfo (#13907)
     add f6b68ab7fd [Unity][CI] Unity specific jenkins setup (do not upstream 
to main) (#13910)
     add a7086616d7 [Unity] Basic StructInfo Analysis and Expr construction 
(#13916)
     add 23a7cd1a21 [Unity] Relax BlockBuilder and ExprMutator (#13926)
     add 63de0dacbd [Unity] Relax TVMScript Parser. (#13932)
     add a2d032494f [Unity] Relax TVMScript Printer (#13944)
     add 7f1e1f5528 [Unity] Relax VM codegen (#13954)
     add afe71010ef [Unity] Relax VM shape lowering pass (#13956)
     add dbedbb25ba [Unity] e2e Relax minimum build flow (#13961)
     add 4051a69cec [Unity][TVMScript] Use explicit `R.shape` in TVMScript 
(#13979)
     add caddedb418 [Unity] Relax op: index (#13987)
     add 4dfa36202b [Unity] Relax op: datatype (#13986)
     add 9a9e4a7823 [Unity] Relax op: set (#13990)
     add a9a561b472 [Unity] Relax op: image (#13994)
     add c534c9c7b3 [Unity] Relax op: arithmetic, comparison (#13983)
     add ec110c6023 [Unity] Relax op: statistical (#13991)
     add 5b3239ad4d [Unity] Relax op: neural networks (#13993)
     add 444d420450 [Unity] Relax op: creation (#13984)
     add bf6e2a9ef6 [Unity] Relax op: linear algebra (#13988)
     add 044f3bbc41 [Unity] Relax op: search (#13992)
     add f64e91c6da [Unity] Relax op: manipulation (#13989)
     add 26b4439cf1 [Unity] NestedMsg Support utility (#13995)
     add 18ade5f8ba [Unity][Pass] Operator Fusion Passes (#14001)
     add 7de9c82626 [Unity][Pass] LambdaLift pass (#14012)
     add 5a6579e1b0 [Unity][VM] Supporting "compiled" exec mode. (#14015)
     add f81e198ed4 [Unity][Pass] BindParams pass, FoldConstant pass (#14016)
     add 792d7c5eda [Unity][Pass][TuningAPI] Introduce TuningAPI and 
MetaSchedule pass (#14014)
     add 44b636f9be [Unity] Relay -> Relax translator  (#14026)
     add d8a6d1d826 [Unity][Pass] Normalize Pass (#14031)
     add 2cc122cd24 [Unity][BlockBuilder] CallTE convert PrimValue args  
(#14028)
     add a50cdd06e3 [Unity][Pass] Wellformed Analysis (#14032)
     add bd8fb78ac4 [Unity][TVMScript] Move tir/relax import in script out of 
__init__.py (#14033)
     add db588383bf [Unity][Pass] Operator legalization (#14029)
     add 317634bc19 [Unity][Op] Add ShapeExpr Tests for Reshape Op (#14035)
     add 8d575f2a73 [Unity] Initial PyTorch Frontend (#14037)
     add 9879fbbd0b [Unity][Pass] Block-level static memory planning (#14038)
     add db1bf6b039 [Unity] Disallow inline prim_func in relax IR (#14040)
     add c45b1a6990 [Unity] Update tests to adapt to latest TVMScript syntax 
(#14039)
     add 0525e05aaf [Unity] Relax dataflow pattern language (matching) (#14041)
     add 969047780a [Unity] Statement rewriter for DataflowBlock (#14043)
     add 80c474fbf1 [Unity][Pass] FuseOps FuseTIR fixes (#14044)
     add 8bad813c99 [Unity][TVMScript] Overload `__neg__` for relax expr 
(#14045)
     add b23e18c228 [Unity][VM] Add per-op profiling support  (#14053)
     add 9b1948d0ba [Unity][BYOC] Add pattern-based partitioning pass (#14054)
     add 3097f6648f [Unity] Relax op: collapse sum (#14059)
     add daa3184b29 [Unity][Fix][Pass] Fix FuseOps for lack graph edges (#14058)
     add 5f15d3a5fb [Unity][Pass] Remove Unused Function (#14061)
     add e6fdfc6075 [Unity][BYOC] Add pass to merge composite functions to 
offload large subgraphs (#14062)
     add 575fee9bb3 [Unity][Frontend] Annotate number of non-static input of FX 
function (#14067)
     add ac49e71881 [Unity][Transform] Add LiftTransformParams pass (#14069)
     add 183e4e1d84 [Unity][BYOC][Pass] RunCodegen and TensorRT  (#14078)
     add abdfe98d85 [Unity][Pass] Canonicalize Bindings (#14079)
     add 418eaf0b6b [Unity] Add testcases for `expr_args_converter` (#14080)
     add 1774d2229c [Unity][BYOC] Add CUTLASS backend (#14081)
     add 394f1261a5 [Unity][BYOC] Add DNNL backend (#14082)
     add cb7e29f7de [Unity][Op] `log_softmax` and `cross_entropy_with_logits` 
(#14083)
     add b5e6048361 [Unity][Analysis] TIR pattern kind analysis for 
multi-buffer write block (#14075)
     add fa0f49a6a7 [Unity][Fix][Pass] FoldConstant with DCE in dataflow block 
(#14087)
     add c728978f51 [Unity] Refactor Relax Build JIT UX (#14088)
     add 111dd1f6f5 [Unity][Relax] Set Shape Function to Be Host Function 
(#14090)
     add 3e139b0a93 [Unity] Fix typo in the comment (#14096)
     add 1ea40509c9 [Unity] Lower `shape_of` to a builtin (#14093)
     add 35331cdea2 [Unity] Relax Recursive function (#14092)
     add dd00671ae3 [Unity][Layout] Add layout transformation analysis for 
PrimFunc (#14066)
     add 7ac87251d0 [Unity] Remove attributes of relax.print, assert and unique 
(#14101)
     add c973eae56c [Unity][BYOC]Add relax backend pattern registry (#14106)
     add de2e70778e [Unity] Update tests again to adapt to latest TVMScript 
syntax (#14115)
     add 81a6438bc7 [Unity][Fix] Fix bug in MergeCompositeFunctions (#14117)
     add 631e483330 [Unity][BlockBuilder] Add `name_hint` argument for `emit` 
and `emit_output` (#14126)
     add 17d8625a73 [Unity][WEB] Relax vm on web runtime (#14131)
     add e85a1909db [Unity] Add Global info (#14132)
     add 8c1d87a46c [Unity][BYOC] Add transposed matmul support to Relax 
CUTLASS BYOC (#14128)
     add a5fbbd573f [Unity][TVMScript] emit_te sugar (#14123)
     add 5f5638c05a [Unity][BYOC] Assign group to unused bindings and ignroe 
PrimFunc (#14139)
     add 4892b763b9 [Unity] Add callback to FuseOpsByPattern to check match 
result is accepted (#14109)
     add 993c37d3c2 [Unity][Legalize] Fix Scalar Constant Legalization (#14127)
     add 016b2800a1 [Unity][Pass] Enhance constant folding to fold relax ops by 
evaluating them. (#14146)
     add 832c1ba04c [Unity][Debugging] AST printer (#14152)
     add 78af3acde3 [Unity][Pass] Support Symbolic Shape Deduction during 
BindParam (#14154)
     add 1d60a6a337 [Unity][Analysis] Checking function return struct info in 
well-formed check (#14155)
     add 96d85b2da5 [Unity][BYOC] Use Relax legalize + CPU build for reference 
in tests (#14162)
     add 70e925c8de [Unity] Add bind_constants option to FuseOpsByPattern 
(#14151)
     add 5f4a11a284 [Unity][Analysis] Analysis for detecting recursion in Relax 
(#14149)
     add d50be1cdf6 [Unity][BYOC] Add batch matmul support to Relax CUTLASS 
BYOC (#14166)
     add fb3e269c71 [Unity][Op] Full support of Relax op `power` (#14171)
     add 031e380c47 [Unity][Analysis] Restore Python bindings for var analyses 
(#14180)
     add 6c3a97c71c [Unity][OP] Add an operator for fused multi head attention 
(#14150)
     add 9ade1be9f7 [Unity][WEBGPU] Codegen improvements and WebRuntime (#14187)
     add d68bfb97ee [Unity][Transform] LiftTransformParams handling multiple 
functions (#14192)
     add 32049d825b [Unity][Op] Group normalization (#14194)
     add 694da73413 [Unity][Op] Argmax and argmin (#14195)
     add 012dacec71 [Unity][Op] Legalize `round`, `floor`, `ceil`, `sign` 
(#14198)
     add 5bafde482d [Unity][Frontend] FX translator supporting more ops (#14196)
     add 1896823417 [Unity][Frontend] FX translator returning weights with 
`keep_params_as_input` (#14197)
     add 2c75602cb4 [Unity][Fix] FX translating dtype (#14201)
     add 1978e44971 [Unity][TIR][Pass] ForceNarrowIndexToInt32 (#14203)
     add 03e413ae43 [Unity][Frontend] FX translator support torch.baddbmm 
(#14202)
     add f7ccc3bc59 [Unity] Introduce Default GPU Schedule Pass (#14182)
     add 4920cd26df [Unity][Frontend] Attach imported model weights, deprecate 
ImporterOutput (#14211)
     add 58e224f8b1 [Unity][Frontend] FX translator supports unwrapping unit 
return tuple (#14212)
     add 45a54f3a38 [Unity][WEB] Improve webgpu codegen options to skip 
readonly (#14213)
     add 7a4bdcde3c [Unity][Frontend] from_fx keeps parameters in order (#14214)
     add 6de29c50a2 [Unity][BYOC] Add dynamic shape support to CUTLASS matmul 
(#14216)
     add 4c39c31767 [Unity][Transform] Memory plan across the IRModule (#14220)
     add a3f40a7635 [Unity][WEB] Simplify WebGPU Codegen per spec (#14225)
     add a6d9601595 [Unity] ensure memory.alloc_tensor/storage roundtrippable 
(#14226)
     add 544b0821ae [Unity] Improve implementation of FuseOps (#14229)
     add 80fce8db81 [Unity][Transform] SimplifyNormInference (#14221)
     add 6ca3325a73 [Unity][Bugfix] Do not include `PrimFunc`s in the 
dependency graph when checking for recursion (#14228)
     add 2a32d64ef1 [Unity][TVMScript] Enable Context-Aware Parsing (#14234)
     add fba4b6bc50 [Unity][TVMScript] Fix prim_func lost issue in 
relax.emit_te (#14189)
     add 556b542611 [Unity][WEB] Update text prompts for syntactical 
correctness (#14237)
     add 3bddee1524 [Unity][WEB] Improve ndarray cache (#14236)
     add ac82cf8b0c [Unity][Op][Tweak] Improve `StructInfo` inference for 
`shape_of` (#14243)
     add 3cb9e263b9 [Unity][Op] Add repeat, tile, conv2d_transpose, avg_pool2d 
(#14238)
     add 77695deec6 [Unity] Allow user defined func attrs in emit_te (#14255)
     add 71899e5529 [Unity][BYOC] Add conv2d and residual block patterns for 
Relax cutlass BYOC (#14252)
     add 08e2a69efc [Unity][Frontend] Import `tanh` and fix `layer_norm` 
(#14247)
     add 96cd5b5b4e [Unity] Consider target context for Relay to Relax 
conversion (#14269)
     add d268b13cac [Unity] Introduce call_dps_packed (#14183)
     add 97b429a256 [Unity][ci] Use CPU-SMALL instances (#14256)
     add 2ce4af3e0c [TVMScript][Unity] Improve PyLint Compatibility (#14276)
     add df7f510da8 [Unity] Mark tests that need python3.8 compact.
     add d394b6a89f [Unity] Fix StructInfo Infer for `vm.alloc_tensor` (#14283)
     add 0f49776de3 [Unity][Op] Cumsum (#14297)
     add 1a582b9d79 [Unity][DEBUG] Add Instrument (#14302)
     add a9ca0cf0ab [Unity][Pass] Add a pass to alter the TIR implementation of 
an operator (#14215)
     add 0f6463fccb [Unity][Op] Enable special dimension value 0 in reshape 
(#14311)
     add e61576ba4b [Unity][Web] WebGPU explicit max buffer size (#14321)
     add 1a7244135f [Unity] Support pattern-based rewriting (#14312)
     add db7fdfd5fa [Unity][BYOC] Support matmul + residual block fusion in 
CUTLASS BYOC (#14317)
     add 0145fe97a4 [Unity][BYOC] Improve expressiveness of the pattern check 
function in FuseOpsByPattern (#14310)
     add 9cba9bfd7a [Unity][Transform] Enhance RewriteDataflowReshape transform 
(#14265)
     add 3e66b205d2 [Unity][Contrib] Introduce several features of cutlass 
profiler (#14275)
     add 30817d1aef [Unity][TOPI] fp16 LayerNorm & GroupNorm (#14264)
     add cdb435ccff [Unity][Transform] Automatic Layout Conversion (#14257)
     add 3b731b2eee [Unity][Transform] Simple Dead Code Elimination (#14262)
     add 3497cca0b5 [Unity][Transform] Automatic Mixed Precision (#14242)
     add 24e0fc7c69 [Unity][Lint] Fix cpplint casting (#14333)
     add aa1932492b [Unity][Transform] DefaultSchedule pass (#14266)
     add dd742a826a [Unity][Op] Fix Strided Slice Shape Inference (#14324)
     add ccb9074907 [Unity][Pass] Reuse prior infra to implement more complete 
DCE (#14334)
     add de8c12ab3c [Unity] Remove Python interface of RemoveUnusedFunction 
(#14336)
     add 029a5e8793 [Unity][BYOC] Update testcases to follow recent changes 
(#14339)
     add 602fd10694 [Unity][Frontend] FX exp and strided_slice fix (#14338)
     add d623140045 [Unity] Support model kwargs in dynamo_capture_subgraph 
(#14349)
     add a5d659099d [Unity][BYOC] Check leaked intermediate variables in 
cutlass patterns (#14350)
     add d108639bce [Unity][Transform] AMP out_dtype=float16 testcases (#14358)
     add 9b8e003d50 [Unity][Fix] Fix block memory plan to handle bool (#14357)
     add 3d7af30df7 [Unity][Transform] Introduce data-dependent operation of 
reshape and its constant folding (#14282)
     add fc8bbbd6b4 [Unity][Transform] Fix AMP tests (#14360)
     add 84dc90d76b [Unity] Add support to append relay op attrs in translator 
(#14356)
     add f38171b0cd [Unity][WEB] Support async pipeline creation (#14362)
     add 77496c33f3 [Unity][Pass] Fix FuseOps error if there is no output of a 
given group (#14354)
     add bc391d3429 [Unity][Fix] Infer Layout must support negative axes 
(#14365)
     add 5afb3ea5c5 [Unity] Add More Ops For FX Translator (#14348)
     add b5b8e206d6 [Unity][TVMScript] Update GlobalVar `checked_type_` when 
`emit_te` (#14367)
     add e32164a805 [Unity][Fix] Allow scalar layout initialization (#14370)
     add 0908a43466 [Unity] Also include output dtype in simt MathInstruction 
(#14372)
     add 5a2f1ba2c6 [Unity][VM] Add CUDA graph vm builtins (#14371)
     add 634cfad0dc [Unity] Add missing #include <array> (#14383)
     add 95b6f680b7 [Unity][Transform] SplitCallTIRByPattern and CUTLASS 
backend (#14274)
     add f6919620c1 [Unity] Support simple dynamic-shape-aware fusion (#14396)
     add 414514c1bf [Unity][Op] Add stop_lift_params (#14368)
     add 25608f40c6 [Unity][TVMScript] Fix Shape Var occurrence in Tensor 
annotation (#14404)
     add 219ed08e12 [Unity][Transform] Common Subexpression Elimination (#14361)
     add f45f11a9e5 [Unity][QNN][Hexagon]Support Relax Constants in the QNN 
TOPI operations (#14386)
     add c69c75407f [Unity][Op] Conv1d (#14388)
     add ef4057a433 [Unity] Fix getting shapes for cutlass BYOC kernels (#14411)
     add 30db3de0e7 [Unity][Op] Expose scale in `R.nn.attention` and add its legalize op (#14412)
     add d93eb5c091 [Unity][Hexagon] Enable Relax VM for Hexagon (#14415)
     add c5335d96f9 [Unity][Fix] Copy over module attrs in FuseTIR (#14418)
     add ab3299c054 [Unity] Handle extern func calls in static memory planning (#14419)
     add ac90c7af01 [Unity] Include constant shapes in the profiler result (#14428)
     add 784733a425 [Unity][Fix] Annotate TIR op pattern could have no stores. (#14420)
     add 9efc5b83a7 [Unity] Minor updates to DataFlowBlockRewrite (#14431)
     add dc7ba6c46c [Unity] Remove non-deterministic behavior from graph pattern matching  (#14417)
     add 41230a981f [Unity][Graph matching] Automatically add `used-by` constraints for `is_op` pattern (#14439)
     add cecc5c3ade [Unity][Op][Docs] Update comment for `call_tir_dyn` (#14441)
     add 646d50dc27 [Unity][Graph matching] Clean up undo stack for parent and child nodes properly (#14440)
     add a425bc7a39 [Unity] Pattern-based rewriting for dataflow block (#14446)

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (8caeab93f8)
            \
             N -- N -- N   refs/heads/unity (a425bc7a39)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.
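
The O/N split in the diagram above can be modeled as a set difference over
reachable commits. The following is a minimal sketch on a toy commit graph,
not the notification tooling itself; the commit names (A, B, O1..O3, N1..N3)
and both helper functions are hypothetical and chosen only to mirror the
diagram.

```python
# Toy model of the force-push diagram. All commit names are hypothetical.

def ancestors(tip, parents):
    """Return every commit reachable from `tip`, including `tip` itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def classify_force_push(old_tip, new_tip, parents):
    """Split commits into those dropped from the branch and those newly on it."""
    old_reach = ancestors(old_tip, parents)
    new_reach = ancestors(new_tip, parents)
    omitted = old_reach - new_reach  # the "O" revisions
    added = new_reach - old_reach    # the "N" revisions
    return omitted, added


# Maps each commit to its parents: B is the common base of both lines.
parents = {
    "A": [],
    "B": ["A"],
    "O1": ["B"], "O2": ["O1"], "O3": ["O2"],
    "N1": ["B"], "N2": ["N1"], "N3": ["N2"],
}

omitted, added = classify_force_push("O3", "N3", parents)
print(sorted(omitted))  # ['O1', 'O2', 'O3']
print(sorted(added))    # ['N1', 'N2', 'N3']
```

In a real clone that still has both tips, the same two sets should be
obtainable with `git rev-list a425bc7a39..8caeab93f8` (the O revisions) and
`git rev-list 8caeab93f8..a425bc7a39` (the N revisions).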

No new revisions were added by this update.

Summary of changes:
 3rdparty/mlperftiny/api/submitter_implemented.h    |    2 +-
 CONTRIBUTORS.md                                    |    1 +
 apps/microtvm/cmsisnn/requirements.txt             |   10 +-
 apps/microtvm/ethosu/requirements.txt              |   10 +-
 apps/microtvm/poetry.lock                          | 3537 ++++++++++----------
 apps/microtvm/pyproject.toml                       |   56 +-
 ci/jenkins/docker-images.ini                       |    4 +-
 conda/recipe/bld.bat                               |    1 +
 conda/recipe/build.sh                              |    1 +
 docker/Dockerfile.ci_cpu                           |    3 +
 docker/Dockerfile.ci_lint                          |    2 +-
 .../ubuntu_download_arm_compute_lib_binaries.sh    |    2 +-
 docker/install/ubuntu_install_llvm_from_source.sh  |    1 +
 docker/install/ubuntu_install_vela.sh              |    2 +-
 gallery/how_to/work_with_microtvm/micro_ethosu.py  |    4 +-
 gallery/how_to/work_with_microtvm/micro_tvmc.sh    |   14 +-
 gallery/tutorial/tvmc_command_line_driver.py       |    5 +
 include/tvm/meta_schedule/schedule_rule.h          |    2 +
 include/tvm/relay/attrs/nn.h                       |   12 +-
 include/tvm/runtime/container/array.h              |   49 +
 include/tvm/runtime/module.h                       |   51 +-
 include/tvm/runtime/vm/executable.h                |    3 +
 include/tvm/runtime/vm/vm.h                        |    3 +
 include/tvm/tir/expr.h                             |   60 -
 include/tvm/tir/expr_functor.h                     |    4 -
 include/tvm/tir/schedule/schedule.h                |   17 +-
 include/tvm/tir/stmt.h                             |   97 +-
 include/tvm/tir/stmt_functor.h                     |  148 +-
 include/tvm/tir/transform.h                        |   16 +-
 include/tvm/topi/elemwise.h                        |    6 +-
 include/tvm/topi/nn/instance_norm.h                |   63 +
 include/tvm/topi/transform.h                       |   10 +-
 python/gen_requirements.py                         |    2 +-
 python/tvm/contrib/hexagon/build.py                |   65 +-
 python/tvm/contrib/hexagon/session.py              |    1 +
 python/tvm/contrib/hexagon/tools.py                |  198 ++
 python/tvm/driver/tvmc/autotuner.py                |  132 +-
 python/tvm/driver/tvmc/compiler.py                 |  173 +
 python/tvm/ir/json_compact.py                      |    2 -
 python/tvm/meta_schedule/schedule/cuda/__init__.py |    2 +
 .../schedule/cuda/layout_transform.py              |  583 ++++
 .../tvm/relay/analysis/operations_distribution.py  |  102 +
 .../tvm/relay/backend/contrib/ethosu/tir/passes.py |    2 +-
 .../backend/contrib/ethosu/tir_to_cs_translator.py |    1 -
 python/tvm/relay/frontend/keras.py                 |   11 +-
 python/tvm/relay/frontend/onnx.py                  |   21 +
 python/tvm/relay/frontend/pytorch.py               |    5 +-
 python/tvm/relay/frontend/tensorflow_ops.py        |    5 +-
 python/tvm/relay/frontend/tflite.py                |  147 +
 python/tvm/relay/op/_transform.py                  |    2 +-
 python/tvm/relay/op/contrib/ethosn.py              |    6 +-
 python/tvm/relay/op/nn/_nn.py                      |   45 +
 python/tvm/relay/op/nn/nn.py                       |    2 +-
 python/tvm/relay/op/strategy/cuda.py               |   11 +
 python/tvm/relay/op/strategy/generic.py            |   36 +-
 python/tvm/relay/transform/suffixes.py             |  105 +
 python/tvm/runtime/module.py                       |   48 +-
 python/tvm/script/ir_builder/tir/ir.py             |    2 -
 python/tvm/script/parser/tir/parser.py             |    6 +-
 python/tvm/tir/__init__.py                         |    3 +-
 python/tvm/tir/analysis/analysis.py                |    3 +-
 python/tvm/tir/expr.py                             |   32 +-
 python/tvm/tir/schedule/analysis.py                |   19 +
 python/tvm/tir/schedule/schedule.py                |  103 +-
 python/tvm/tir/stmt.py                             |   38 +-
 python/tvm/tir/tensor_intrin/arm_cpu.py            |   99 +-
 python/tvm/tir/transform/transform.py              |   56 +-
 python/tvm/topi/hexagon/tensor_intrin.py           |  309 +-
 python/tvm/topi/nn/__init__.py                     |    1 +
 python/tvm/topi/nn/conv3d_transpose.py             |   11 +-
 .../topi/nn/{layer_norm.py => instance_norm.py}    |   13 +-
 python/tvm/topi/testing/__init__.py                |    1 +
 ...ayer_norm_python.py => instance_norm_python.py} |    9 +-
 python/tvm/topi/transform.py                       |   17 +-
 src/contrib/hybrid/codegen_hybrid.cc               |    6 -
 src/contrib/hybrid/codegen_hybrid.h                |    2 -
 src/driver/driver_api.cc                           |    4 +-
 .../feature_extractor/per_store_feature.cc         |    1 +
 .../postproc/disallow_async_strided_mem_copy.cc    |    2 +-
 .../postproc/rewrite_parallel_vectorize_unroll.cc  |   81 +-
 src/meta_schedule/postproc/verify_gpu_code.cc      |    3 +-
 src/meta_schedule/schedule_rule/schedule_rule.cc   |   90 +
 .../space_generator/space_generator.cc             |   19 +
 src/relay/backend/aot_executor_codegen.cc          |    3 +
 src/relay/backend/build_module.cc                  |    3 +
 .../backend/contrib/cmsisnn/extract_constants.cc   |    1 +
 src/relay/backend/contrib/cmsisnn/fuse_pads.cc     |    3 +-
 .../backend/contrib/cmsisnn/generate_constants.cc  |   12 +-
 .../contrib/cmsisnn/scalar_to_tensor_constant.cc   |    5 +-
 src/relay/backend/contrib/ethosn/ethosn_api.cc     |   45 +-
 src/relay/backend/contrib/ethosu/source_module.cc  |    3 +-
 src/relay/backend/graph_executor_codegen.cc        |    3 +
 src/relay/backend/vm/compiler.h                    |    3 +
 src/relay/ir/dataflow_matcher.cc                   |   37 +-
 src/relay/op/nn/convolution.cc                     |   18 +-
 src/relay/printer/model_library_format_printer.cc  |    3 +
 src/relay/printer/text_printer.h                   |    2 -
 src/relay/printer/tir_text_printer.cc              |   19 -
 src/relay/printer/tvmscript_printer.cc             |   26 -
 src/relay/transforms/annotate_target.cc            |    1 +
 src/relay/transforms/fold_explicit_padding.cc      |    2 +-
 src/runtime/aot_executor/aot_executor.h            |    3 +
 src/runtime/aot_executor/aot_executor_factory.h    |    3 +
 src/runtime/const_loader_module.cc                 |    3 +
 src/runtime/contrib/coreml/coreml_runtime.h        |    5 +
 src/runtime/contrib/dnnl/dnnl_json_runtime.cc      |    7 +-
 src/runtime/contrib/ethosn/ethosn_runtime.h        |    5 +
 src/runtime/contrib/json/json_runtime.h            |    5 +
 src/runtime/contrib/libtorch/libtorch_runtime.cc   |    4 +
 src/runtime/contrib/onnx/onnx_module.cc            |    3 +
 src/runtime/contrib/tensorrt/tensorrt_runtime.cc   |    5 +
 src/runtime/contrib/tflite/tflite_runtime.h        |    3 +
 src/runtime/contrib/vitis_ai/vitis_ai_runtime.h    |    5 +
 src/runtime/cuda/cuda_module.cc                    |    5 +
 src/runtime/graph_executor/graph_executor.h        |    3 +
 .../graph_executor/graph_executor_factory.h        |    3 +
 src/runtime/hexagon/hexagon_device_api.cc          |    2 +-
 src/runtime/hexagon/hexagon_module.h               |    4 +
 src/runtime/library_module.cc                      |    5 +
 src/runtime/metadata.cc                            |    3 +
 src/runtime/module.cc                              |    6 +-
 src/runtime/opencl/opencl_common.h                 |    5 +
 src/runtime/rpc/rpc_module.cc                      |    2 +
 src/runtime/static_library.cc                      |    4 +-
 src/script/ir_builder/tir/utils.h                  |   26 +-
 src/script/printer/legacy_repr.cc                  |   27 -
 src/script/printer/tir/expr.cc                     |    6 -
 src/script/printer/tir/ir.cc                       |    3 +-
 src/script/printer/tir/stmt.cc                     |    7 -
 src/target/codegen.cc                              |    1 -
 src/target/llvm/codegen_hexagon.cc                 |    4 +-
 src/target/llvm/codegen_llvm.cc                    |   12 +-
 src/target/llvm/codegen_llvm.h                     |    2 -
 src/target/llvm/llvm_module.cc                     |    8 +-
 src/target/source/codegen_c.cc                     |    8 -
 src/target/source/codegen_c.h                      |    2 -
 src/target/source/codegen_opencl.cc                |   52 +-
 src/target/source/codegen_opencl.h                 |    6 -
 src/target/source/codegen_webgpu.cc                |    2 +
 src/target/source/source_module.cc                 |    6 +-
 src/target/stackvm/codegen_stackvm.cc              |    8 -
 src/target/stackvm/codegen_stackvm.h               |    2 -
 src/target/target_kind.cc                          |    8 +-
 src/te/autodiff/jacobian.cc                        |    1 -
 src/te/operation/create_primfunc.cc                |    2 +-
 src/te/operation/cross_thread_reduction.cc         |    1 +
 src/te/operation/hybrid_op.cc                      |    4 +-
 src/te/operation/op_utils.cc                       |   16 -
 src/te/operation/op_utils.h                        |   16 -
 src/tir/analysis/block_access_region_detector.cc   |   10 -
 src/tir/analysis/buffer_access_lca_detector.cc     |    9 -
 src/tir/analysis/device_constraint_utils.cc        |   18 -
 src/tir/analysis/estimate_flops.cc                 |   11 +-
 src/tir/analysis/side_effect.cc                    |    5 -
 src/tir/analysis/var_touch.cc                      |    8 -
 src/tir/analysis/var_use_def_analysis.cc           |    8 -
 src/tir/analysis/var_use_def_analysis.h            |    4 -
 src/tir/analysis/verify_gpu_code.cc                |    8 -
 src/tir/analysis/verify_memory.cc                  |    8 -
 src/tir/analysis/verify_well_formed.cc             |   35 +-
 src/tir/ir/expr.cc                                 |   70 +-
 src/tir/ir/expr_functor.cc                         |    8 -
 src/tir/ir/index_map.cc                            |    2 +-
 src/tir/ir/stmt.cc                                 |   53 -
 src/tir/ir/stmt_functor.cc                         |   27 -
 src/tir/op/op.cc                                   |    2 +
 src/tir/schedule/analysis/analysis.cc              |    5 +
 src/tir/schedule/analysis/reducer.cc               |   18 -
 src/tir/schedule/concrete_schedule.cc              |   32 +
 src/tir/schedule/concrete_schedule.h               |    6 +
 src/tir/schedule/primitive.h                       |   21 +
 src/tir/schedule/primitive/block_annotate.cc       |  117 +
 src/tir/schedule/primitive/blockize_tensorize.cc   |    2 +-
 src/tir/schedule/primitive/cache_index.cc          |    8 +-
 src/tir/schedule/primitive/cache_read_write.cc     |   16 +-
 src/tir/schedule/primitive/compute_inline.cc       |    8 -
 .../schedule/primitive/layout_transformation.cc    |   36 +-
 src/tir/schedule/primitive/read_write_at.cc        |  421 +++
 src/tir/schedule/primitive/reduction.cc            |    8 +-
 src/tir/schedule/schedule.cc                       |    6 +
 src/tir/schedule/traced_schedule.cc                |   39 +
 src/tir/schedule/traced_schedule.h                 |    6 +
 src/tir/schedule/transform.cc                      |   10 +
 src/tir/schedule/transform.h                       |   12 +-
 src/tir/transforms/arg_binder.cc                   |    2 +-
 src/tir/transforms/bf16_legalize.cc                |  696 ++--
 src/tir/transforms/bound_checker.cc                |    8 -
 src/tir/transforms/common_subexpr_elim.cc          |    5 +-
 src/tir/transforms/compact_buffer_region.cc        |    8 -
 src/tir/transforms/coproc_sync.cc                  |    6 -
 src/tir/transforms/inject_copy_intrin.cc           |    2 +-
 src/tir/transforms/inject_double_buffer.cc         |    8 -
 src/tir/transforms/inject_software_pipeline.cc     |   16 +-
 src/tir/transforms/inject_virtual_thread.cc        |   18 +-
 src/tir/transforms/install_debug_spans.h           |    1 -
 src/tir/transforms/ir_utils.cc                     |    8 -
 src/tir/transforms/lower_cross_thread_reduction.cc |    2 +-
 src/tir/transforms/lower_custom_datatypes.cc       |    8 -
 src/tir/transforms/lower_match_buffer.cc           |   14 -
 src/tir/transforms/lower_thread_allreduce.cc       |    8 -
 src/tir/transforms/lower_tvm_builtin.cc            |  113 +-
 src/tir/transforms/lower_warp_memory.cc            |   12 -
 src/tir/transforms/make_packed_api.cc              |   10 +-
 .../manifest_shared_memory_local_stage.cc          |    2 +-
 src/tir/transforms/memhammer_coalesce.cc           |  234 ++
 src/tir/transforms/memhammer_intermediate_stage.cc |  444 +++
 src/tir/transforms/memhammer_lower_auto_copy.cc    |  779 +++++
 src/tir/transforms/memhammer_rewrite_rule.h        |  242 ++
 src/tir/transforms/memhammer_tensorcore_rewrite.cc |  350 ++
 .../merge_dynamic_shared_memory_allocations.cc     |   16 -
 src/tir/transforms/narrow_datatype.cc              |   10 +-
 src/tir/transforms/renew_defs.cc                   |    8 -
 src/tir/transforms/rewrite_unsafe_select.cc        |    3 -
 src/tir/transforms/simplify.cc                     |    4 -
 src/tir/transforms/split_host_device.cc            |    2 +-
 src/tir/transforms/storage_access.cc               |    8 -
 src/tir/transforms/storage_access.h                |    3 -
 src/tir/transforms/storage_flatten.cc              |   16 -
 src/tir/transforms/storage_rewrite.cc              |   55 +-
 src/tir/transforms/thread_storage_sync.cc          |    7 -
 src/tir/transforms/unroll_loop.cc                  |    4 -
 src/tir/transforms/update_pointer_storage_scope.cc |    8 -
 src/tir/transforms/update_pointer_storage_scope.h  |    2 -
 src/tir/transforms/vectorize_loop.cc               |   19 +-
 src/tir/usmp/analysis/extract_buffer_info.cc       |    7 +-
 src/tir/usmp/transform/create_io_allocates.cc      |    6 -
 src/topi/nn.cc                                     |    6 +
 src/topi/transform.cc                              |    2 +-
 .../hexagon/hexagon_device_api_tests.cc            |    3 +
 tests/lint/check_file_type.py                      |    1 +
 tests/lint/rat-excludes                            |    1 +
 tests/micro/arduino/test_utils.py                  |    2 +-
 tests/micro/common/test_autotune.py                |    2 +-
 tests/micro/common/test_mlperftiny.py              |  130 +-
 tests/micro/common/test_tvmc.py                    |    2 +-
 tests/python/contrib/test_dnnl.py                  |    2 +-
 tests/python/contrib/test_ethosn/test_resize.py    |   24 +-
 tests/python/contrib/test_ethosu/infra.py          |   11 +-
 .../test_pass_operations_distribution.py           |  173 +
 .../test_hexagon/test_fixed_point_multiply.py      |  138 +-
 tests/python/driver/tvmc/test_autotuner.py         |  102 +-
 tests/python/driver/tvmc/test_compiler.py          |  351 ++
 tests/python/frontend/onnx/test_forward.py         |   33 +
 tests/python/frontend/tensorflow/test_forward.py   |   12 +-
 tests/python/frontend/tflite/test_forward.py       |   74 +-
 tests/python/integration/test_reduce.py            |    4 +-
 .../relay/opencl_texture/test_injection_texture.py |   85 +
 tests/python/relay/test_dataflow_pattern.py        |   79 +-
 ...pi_layer_norm.py => test_topi_instance_norm.py} |   15 +-
 .../test_meta_schedule_post_order_apply.py         |   73 +
 ...e_postproc_rewrite_parallel_vectorize_unroll.py |   44 +
 .../test_meta_schedule_relay_integration.py        |    3 +
 ...meta_schedule_schedule_cuda_layout_transform.py |  466 +++
 .../unittest/test_micro_model_library_format.py    |    2 +-
 .../unittest/test_runtime_module_property.py       |   62 +
 tests/python/unittest/test_target_codegen_llvm.py  |    6 +-
 .../python/unittest/test_target_codegen_opencl.py  |    6 +-
 .../unittest/test_target_texture_codegen_opencl.py |  375 +++
 .../test_tir_analysis_estimate_tir_flops.py        |   30 +
 tests/python/unittest/test_tir_nodes.py            |    2 +-
 .../python/unittest/test_tir_schedule_analysis.py  |   21 +
 .../unittest/test_tir_schedule_compute_inline.py   |    2 +-
 .../unittest/test_tir_schedule_read_write_at.py    |  221 ++
 ...set_scope.py => test_tir_schedule_set_dtype.py} |   80 +-
 .../unittest/test_tir_schedule_transform_layout.py |   36 +
 .../unittest/test_tir_transform_bf16_legalize.py   |  257 +-
 .../test_tir_transform_lower_tvm_builtin.py        |   19 +-
 ...test_tir_transform_memhammer_lower_auto_copy.py | 1062 ++++++
 .../unittest/test_tir_transform_storage_rewrite.py |   61 +-
 .../unittest/test_tvmscript_ir_builder_tir.py      |    2 +-
 .../python/unittest/test_tvmscript_printer_tir.py  |    2 +-
 tests/python/unittest/test_tvmscript_roundtrip.py  |   16 +
 tests/scripts/request_hook/request_hook.py         |    3 +-
 273 files changed, 11989 insertions(+), 3690 deletions(-)
 create mode 100644 include/tvm/topi/nn/instance_norm.h
 create mode 100644 python/tvm/meta_schedule/schedule/cuda/layout_transform.py
 create mode 100644 python/tvm/relay/analysis/operations_distribution.py
 create mode 100644 python/tvm/relay/transform/suffixes.py
 copy python/tvm/topi/nn/{layer_norm.py => instance_norm.py} (77%)
 copy python/tvm/topi/testing/{layer_norm_python.py => instance_norm_python.py} (87%)
 create mode 100644 src/tir/schedule/primitive/read_write_at.cc
 create mode 100644 src/tir/transforms/memhammer_coalesce.cc
 create mode 100644 src/tir/transforms/memhammer_intermediate_stage.cc
 create mode 100644 src/tir/transforms/memhammer_lower_auto_copy.cc
 create mode 100644 src/tir/transforms/memhammer_rewrite_rule.h
 create mode 100644 src/tir/transforms/memhammer_tensorcore_rewrite.cc
 create mode 100644 tests/python/contrib/test_ethosu/test_pass_operations_distribution.py
 create mode 100644 tests/python/relay/opencl_texture/test_injection_texture.py
 copy tests/python/topi/python/{test_topi_layer_norm.py => test_topi_instance_norm.py} (82%)
 create mode 100644 tests/python/unittest/test_meta_schedule_schedule_cuda_layout_transform.py
 create mode 100644 tests/python/unittest/test_runtime_module_property.py
 create mode 100644 tests/python/unittest/test_tir_schedule_read_write_at.py
 copy tests/python/unittest/{test_tir_schedule_set_scope.py => test_tir_schedule_set_dtype.py} (59%)
 create mode 100644 tests/python/unittest/test_tir_transform_memhammer_lower_auto_copy.py
