manupa-arm commented on a change in pull request #65:
URL: https://github.com/apache/tvm-rfcs/pull/65#discussion_r839354482



##########
File path: rfcs/0009_Unified_Static_Memory_Planning.md
##########
@@ -349,6 +349,108 @@ tvmc compile my_model.tflite --executor=aot --output-format=mlf --target=c
          TVMExecute(&my_model, &inputs, &outputs, &context);
     }
 ```
+
+## U4 : User wants to write/read directly to the workspace buffer
+
+This use case allows the space used by I/O tensors to be re-used by the inference.
+
+### TVMC
+```
+    tvmc compile my_model.tflite \
+    --executor=aot \
+    --target=c \
+    --workspace-pools=sram \
+    --pass-config tir.usmp.enable=1 \
+    --pass-config tir.usmp.use_workspace_io=1
+```
+### Codegen'd Artifacts
+```
+    //Codegen'd artifacts in metadata.c (lib0.c)
+    
+    int32_t tvmgen_my_model_run(
+        tvmgen_my_model_workspace_pools* workspace_pools
+    ){
+         return my_model_main(workspace_pools->sram);
+    }
+
+    // Returns a handle pointing to space inside the
+    // workspace pool where input should be stored
+ 
+    tvmgen_my_model_inputs tvmgen_my_model_map_inputs(
+        tvmgen_my_model_workspace_pools* workspace_pools
+    ) {
+        tvmgen_my_model_inputs inputs = {
+            .input0 = &workspace_pools->sram[<INPUT0_OFFSET>],
+        };
+        return inputs;
+    }
+ 
+    // Returns a handle pointing to space inside the
+    // workspace pool where output is stored
+    
+    tvmgen_my_model_outputs tvmgen_my_model_map_outputs(
+        tvmgen_my_model_workspace_pools* workspace_pools
+    ) {
+        tvmgen_my_model_outputs outputs = {
+            .output0 = &workspace_pools->sram[<OUTPUT0_OFFSET>],
+        };
+        return outputs;
+    }
+```
+```
+// metadata.h
+
+    #define TVM_MY_MODEL_SRAM_WORKSPACE_BUFFER_SIZE xxxx
+
+    typedef struct {
+       uint8_t* sram;
+    }  tvmgen_my_model_workspace_pools;
+
+    typedef struct {
+       uint8_t* input0;
+    }  tvmgen_my_model_inputs;
+
+    typedef struct {
+       uint8_t* output0;
+    }  tvmgen_my_model_outputs;
+
+    tvmgen_my_model_inputs tvmgen_my_model_map_inputs(
+        tvmgen_my_model_workspace_pools* workspace_pools
+    );
+
+    tvmgen_my_model_outputs tvmgen_my_model_map_outputs(
+        tvmgen_my_model_workspace_pools* workspace_pools
+    );
+```
+### User Application
+```
+    // The User Application
+    __attribute__((section("SRAM"), aligned(16))) static uint8_t workspace_buffer_sram[TVM_MY_MODEL_SRAM_WORKSPACE_BUFFER_SIZE];
+
+    int main(...) {
+         ...
+         tvmgen_my_model_workspace_pools workspaces = {
+             .sram = workspace_buffer_sram,
+         };
+         tvmgen_my_model_inputs inputs = tvmgen_my_model_map_inputs(&workspaces);
+         tvmgen_my_model_outputs outputs = tvmgen_my_model_map_outputs(&workspaces);
+
+         // Generate the input tensor by passing the handle.
+         // E.g. this could be a driver writing directly to
+         // the workspace buffer.
+         GenerateInput(inputs.input0);
+
+         tvmgen_my_model_run(&workspaces);

Review comment:
       Good points @areusch!
   
   One thing I would like to make clear: the proposal here extends TVM's current status. Therefore, this change does not consider the following two scenarios:
   
   A1. Using tvm.relay.build(List[IRModule])
   A2. Selective re-use of the space of some I/O tensors
   
   Thus, I would consider them out of scope for the extension proposed here.
   
   However, I'd like to highlight why this work is unrelated to, and does not block, such extensions in the future.
   
   For A1,
   
   If and when we want to do this, the RFC proposing it should discuss why we would ever want to. As of now, I feel this would over-stretch that API. Even if we wanted to do it, there is a larger issue to be solved: the PassConfig may need to be set separately for the different programs coupled in the build call. E.g., we might want to vectorize one program but not the other.
   
   For A2, 
   
   This is a perfectly valid extension upon this work.
   
   Right now, we have kept the change self-contained within the USMP pass -- hence the use of PassConfig.
   I would still consider this route valuable because, without it, the user would need to manually annotate all the I/O tensors, and enabling all of them at once should account for most usage of this feature.
   
   One thing I find challenging is using such annotations before relay.build(...), following the agreement reached here: https://github.com/apache/tvm-rfcs/blob/main/rfcs/0029-migrating-to-irmodule-attributes.md, i.e. the annotations are done inside relay.build(...).
   
   Let's say we could overcome that hurdle; then all that needs to be done is to use the annotations in the CreateAllocateIO pass to filter which I/O tensors are selected. Thus, one could simply extend the feature proposed here to support that derivative, more advanced feature.
   
   I can add some text as future considerations, if you like. lmk.
   
   
   
   
   

##########
File path: rfcs/0009_Unified_Static_Memory_Planning.md
##########
@@ -515,4 +663,6 @@ NOTE : to support tir.constants generally, we'll be enhancing the bound relay.co
 
 # Drawbacks
 
-* The relay "main" function that describes the call order to operator PrimFuncs has to be described in TIR to be able to integrate the USMP into the respective executor codegen. However, we dont view this as a major problem as the relay "main" function could easily be lowered to TIR.
\ No newline at end of file
+* The relay "main" function that describes the call order to operator PrimFuncs has to be described in TIR to be able to integrate the USMP into the respective executor codegen. However, we don't view this as a major problem, as the relay "main" function could easily be lowered to TIR.
+
+* The U4 use case will only be supported with the [Embedded C Runtime Interface](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951/14). This is mainly because the nature of the requirement is associated with embedded use cases. However, the USMP changes here should be complementary to supporting other runtime interfaces, such as the Module-based Model Runtime Interface's set_input and set_output, in the future.

Review comment:
       Good point again!
   
   This is generally applicable to the Embedded C Runtime Interface as well.
   
   Another way to frame the suggestion is to ask: why can't we treat the buffer space used by I/O tensors as separate workspace buffers?
   
   I have considered this too :). The short answer is that it is sub-optimal to use disjoint workspace buffers rather than a single workspace buffer, because it does not allow partial overlaps of tensors across such buffers.
   
   The workspace buffer is viewed as a spatio-temporal buffer, and tensors are considered spatio-temporal allocations within it. Having disjoint workspace buffers creates a spatial barrier that could potentially lead to higher fragmentation. Therefore, out of the two options, we are prioritizing the better one here. Again, this could become another use case if there is value in it.
   
   Again, I can add a few lines on this as a future consideration.
   
   



