manupa-arm commented on a change in pull request #65:
URL: https://github.com/apache/tvm-rfcs/pull/65#discussion_r839354482



##########
File path: rfcs/0009_Unified_Static_Memory_Planning.md
##########
@@ -349,6 +349,108 @@ tvmc compile my_model.tflite --executor=aot 
--output-format=mlf --target=c
          TVMExecute(&my_model, &inputs, &outputs, &context);
     }
 ```
+
+## U4 : User wants to write/read directly to the workspace buffer
+
+This use case allows the space used by the I/O tensors to be re-used by the inference.
+
+### TVMC
+```
+    tvmc compile my_model.tflite \
+    --executor=aot \
+    --target=c \
+    --workspace-pools=sram \
+    --pass-config tir.usmp.enable=1 \
+    --pass-config tir.usmp.use_workspace_io=1
+```
+### Codegen'd Artifacts
+```
+    //Codegen'd artifacts in metadata.c (lib0.c)
+    
+    int32_t tvmgen_my_model_run(
+        tvmgen_my_model_workspace_pools* workspace_pools
+    ){
+         return my_model_main(workspace_pools->sram);
+    }
+
+    // Returns a handle pointing to space inside the
+    // workspace pool where input should be stored
+ 
+    tvmgen_my_model_inputs tvmgen_my_model_map_inputs(
+        tvmgen_my_model_workspace_pools* workspace_pools
+    ) {
+        tvmgen_my_model_inputs inputs = {
+            .input0 = &workspace_pools->sram[<INPUT0_OFFSET>],
+        };
+        return inputs;
+    }
+ 
+    // Returns a handle pointing to space inside the
+    // workspace pool where output is stored
+    
+    tvmgen_my_model_outputs tvmgen_my_model_map_outputs(
+        tvmgen_my_model_workspace_pools* workspace_pools
+    ) {
+        tvmgen_my_model_outputs outputs = {
+            .output0 = &workspace_pools->sram[<OUTPUT0_OFFSET>],
+        };
+        return outputs;
+    }
+```
+```
+// metadata.h
+
+    #define TVM_MY_MODEL_SRAM_WORKSPACE_BUFFER_SIZE xxxx
+
+    typedef struct {
+       uint8_t* sram;
+    }  tvmgen_my_model_workspace_pools;
+
+    typedef struct {
+       uint8_t* input0;
+    }  tvmgen_my_model_inputs;
+
+    typedef struct {
+       uint8_t* output0;
+    }  tvmgen_my_model_outputs;
+
+    tvmgen_my_model_inputs tvmgen_my_model_map_inputs(
+        tvmgen_my_model_workspace_pools* workspace_pools
+    );
+
+    tvmgen_my_model_outputs tvmgen_my_model_map_outputs(
+        tvmgen_my_model_workspace_pools* workspace_pools
+    );
+```
+### User Application
+```
+    // The User Application
+    __attribute__((section("SRAM"), aligned(16))) static uint8_t workspace_buffer_sram[TVM_MY_MODEL_SRAM_WORKSPACE_BUFFER_SIZE];
+
+    int main(...) {
+         ...
+         tvmgen_my_model_workspace_pools workspaces = {
+             .sram = workspace_buffer_sram,
+         };
+         tvmgen_my_model_inputs inputs = 
+         tvmgen_my_model_map_inputs(&workspaces);
+         tvmgen_my_model_outputs outputs = 
+         tvmgen_my_model_map_outputs(&workspaces);
+
+         // Generate input tensor by passing the handle
+         // E.g. this could be a driver writing directly to
+         // the workspace buffer
+         GenerateInput(inputs.input0);
+
+         tvmgen_my_model_run(&workspaces);
+         ...
+    }
+```
Review comment:
       Good points @areusch!
   
   One thing I would like to make clear: the proposal here is to extend TVM's current status. Therefore, this change does not consider the following two scenarios:
   
   A1. using tvm.relay.build(List[IRModule])
   A2. Selective re-use of space of some I/O tensors.
   
   Thus, I would consider them out of scope for the extension proposed here.
   
   However, I'd like to highlight how this work is unrelated to, and does not block, such extensions in the future.
   
   For A1,
   
   If and when we want to do this, we should discuss (hopefully in that RFC) why we would ever want to do it. As of now I feel this is over-stretching that API. Even if we wanted to do it, there is a larger issue to be solved: PassConfig would need to be set separately for the different programs coupled in the build call, e.g. we might want to vectorize one program but not the other.
   
   For A2, 
   
   This is a perfectly valid extension upon this work.
   
   Right now, we have kept the change self-contained within the USMP pass -- hence the use of PassConfig.
   I would still consider this route valuable because otherwise the user would need to manually annotate all the I/O tensors, and re-using all of them should account for most usage of this feature.
   
   One thing I find challenging is using these annotations before relay.build(...), following the agreement reached here: https://github.com/apache/tvm-rfcs/blob/main/rfcs/0029-migrating-to-irmodule-attributes.md, i.e. currently the annotations are done inside relay.build(...).
   
   Let's say we could overcome that hurdle; then all that needs to be done is to use the annotations in the CreateAllocateIO pass to filter which I/O tensors should be selected. Thus, one could simply extend the feature proposed here to support that derivative, more advanced feature.
   
   I can add some text under future considerations, if you like. Let me know.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

