manupa-arm commented on a change in pull request #9065:
URL: https://github.com/apache/tvm/pull/9065#discussion_r714157805



##########
File path: src/tir/transforms/lower_tvm_builtin.cc
##########
@@ -115,7 +115,8 @@ class BuiltinLower : public StmtExprMutator {
     int64_t nbytes = GetVectorBytes(op->dtype);
     if (device_type_.defined()) {

Review comment:
       Done

##########
File path: src/tir/transforms/storage_rewrite.cc
##########
@@ -478,6 +478,10 @@ class StoragePlanRewriter : public StmtExprMutator {
     uint64_t bits_offset{0};
   };
 
+  bool IsSpecialTaggedMemory(const StorageScope& scope) {

Review comment:
       Done

##########
File path: tests/python/relay/aot/test_crt_aot.py
##########
@@ -589,5 +590,41 @@ def test_memory_planning(workspace_byte_alignment, main_workspace_size, sum_work
     )
 
 
+def test_aot_codegen_backend_alloc_workspace_calls():
+    dtype = "float32"
+
+    # These shapes should create small tensors that would
+    # get lowered to stack allocations in the CPU PrimFuncs.
+    # However, the AoT executor codegen should retain them
+    # as TVMBAW calls.
+    ishape = (1, 4, 4, 4)
+    wshape = (4, 4, 3, 3)
+
+    data0 = relay.var("data", shape=ishape, dtype=dtype)
+    weight0 = relay.var("weight", shape=wshape, dtype=dtype)
+    out = relay.nn.conv2d(data0, weight0, kernel_size=(3, 3), padding=(1, 1), groups=1)
+    main_f = relay.Function([data0, weight0], out)
+    mod = tvm.IRModule()
+    mod["main"] = main_f
+    mod = transform.InferType()(mod)
+
+    i_data = np.random.uniform(0, 1, ishape).astype(dtype)
+    w1_data = np.random.uniform(0, 1, wshape).astype(dtype)
+
+    inputs = OrderedDict([("data", i_data), ("weight", w1_data)])
+    output_list = generate_ref_data(mod, inputs)
+
+    compiled_runtime_modules = compile_models(

Review comment:
       Ah, it is a bit cumbersome to do that :). Instead, I used Relay in primitive form, so it's clear that the main function should only have three allocates.

##########
File path: tests/python/relay/aot/test_crt_aot.py
##########
@@ -589,5 +590,41 @@ def test_memory_planning(workspace_byte_alignment, main_workspace_size, sum_work
     )
 
 
+def test_aot_codegen_backend_alloc_workspace_calls():
+    dtype = "float32"
+
+    # These shapes should create small tensors that would
+    # get lowered to stack allocations in the CPU PrimFuncs.
+    # However, the AoT executor codegen should retain them
+    # as TVMBAW calls.
+    ishape = (1, 4, 4, 4)
+    wshape = (4, 4, 3, 3)
+
+    data0 = relay.var("data", shape=ishape, dtype=dtype)
+    weight0 = relay.var("weight", shape=wshape, dtype=dtype)
+    out = relay.nn.conv2d(data0, weight0, kernel_size=(3, 3), padding=(1, 1), groups=1)
+    main_f = relay.Function([data0, weight0], out)
+    mod = tvm.IRModule()
+    mod["main"] = main_f
+    mod = transform.InferType()(mod)
+
+    i_data = np.random.uniform(0, 1, ishape).astype(dtype)
+    w1_data = np.random.uniform(0, 1, wshape).astype(dtype)
+
+    inputs = OrderedDict([("data", i_data), ("weight", w1_data)])
+    output_list = generate_ref_data(mod, inputs)
+
+    compiled_runtime_modules = compile_models(
+        AOTTestModel(module=mod, inputs=inputs, outputs=output_list),
+        "c",
+        True,
+    )
+
+    source = compiled_runtime_modules[0].lib.imported_modules[0].get_source()

Review comment:
       I can understand the reasoning, but the current flow creates per-target IRModules just before the runtime.Modules are created. Therefore, all host_target (i.e., CPU) PrimFuncs end up in a single runtime.Module.
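
       Since all host-target PrimFuncs land in one runtime.Module, a single `get_source()` call covers every CPU PrimFunc. Below is a minimal, self-contained sketch of the kind of check the test can then apply; the C source string is a hypothetical stand-in (producing the real one requires a full TVM AoT build, as in `compiled_runtime_modules[0].lib.imported_modules[0].get_source()` above):

```python
# Hypothetical stand-in for the generated C source of the single
# host runtime.Module; the real string comes from get_source().
example_source = """
void* sid_1 = TVMBackendAllocWorkspace(1, 0, 256, 2, 32);
void* sid_2 = TVMBackendAllocWorkspace(1, 0, 256, 2, 32);
void* sid_3 = TVMBackendAllocWorkspace(1, 0, 256, 2, 32);
"""


def count_tvmbaw_calls(source: str) -> int:
    """Count TVMBackendAllocWorkspace (TVMBAW) call sites in C source."""
    return source.count("TVMBackendAllocWorkspace(")


# Per the review discussion, main should retain exactly three
# allocates as TVMBAW calls rather than stack allocations.
assert count_tvmbaw_calls(example_source) == 3
```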



