areusch commented on a change in pull request #65:
URL: https://github.com/apache/tvm-rfcs/pull/65#discussion_r839120089
##########
File path: rfcs/0009_Unified_Static_Memory_Planning.md
##########
@@ -515,4 +663,6 @@ NOTE : to support tir.constants generally, we'll be enhancing the bound relay.co
 
 # Drawbacks
 
-* The relay "main" function that describes the call order to operator PrimFuncs has to be described in TIR to be able to integrate the USMP into the respective executor codegen. However, we dont view this as a major problem as the relay "main" function could easily be lowered to TIR.
\ No newline at end of file
+* The relay "main" function that describes the call order to operator PrimFuncs has to be described in TIR to be able to integrate the USMP into the respective executor codegen. However, we dont view this as a major problem as the relay "main" function could easily be lowered to TIR.
+
+* The U4 usecase will only be supported with [Embedded C Runtime Interface](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951/14). This is mainly because the nature of the requirement is associated with embedded usecases. However, the USMP changes here should be complimentary to support other runtime interfaces such as Module-based Model Runtime Interface's set_input and set_output in future.

Review comment:
       If we did implement this for e.g. the C++ runtime, or for a scenario where a `DLTensor` were passed (perhaps we have dynamic shapes), would we allocate the `DLTensor` instances in whatever section of the workspace we reserve for them? Conceptually, we could just treat those `DLTensor`s as companion buffers which need to be live at the same time as the `data`.

##########
File path: rfcs/0009_Unified_Static_Memory_Planning.md
##########
@@ -349,6 +349,108 @@ tvmc compile my_model.tflite --executor=aot --output-format=mlf --target=c
     TVMExecute(&my_model, &inputs, &outputs, &context);
   }
 ```
+
+## U4 : User wants to write/read directly to the workspace buffer
+
+This usecase allows the space used by I/O tensors to be re-used by the inference.
+
+### TVMC
+```
+ tvmc compile my_model.tflite
+     --executor=aot
+     --target=c
+     --workspace-pools=sram
+     --pass-config tir.usmp.enable=1
+     --pass-config tir.usmp.use_workspace_io=1
+```
+### Codegen'd Artifacts
+```
+ //Codegen'd artifacts in metadata.c (lib0.c)
+
+ int32_t tvmgen_my_model_run(
+     tvmgen_my_model_workspace_pools* workspace_pools,
+ ){
+     return my_model_main(workspace_pools.sram);
+ }
+
+ // Returns a handle pointing to space inside the
+ // workspace pool where input should be stored
+
+ tvmgen_my_model_inputs tvmgen_my_model_map_inputs(
+     tvmgen_my_model_workspace_pools* workspace_pools
+ ) {
+     tvmgen_my_model_inputs = {
+         .input0 = &workspace_pools->sram[<INPUT0_OFFSET>],
+     };
+     return tvmgen_my_model_inputs;
+ }
+
+ // Returns a handle pointing to space inside the
+ // workspace pool where output is stored
+
+ tvmgen_my_model_outputs tvmgen_my_model_map_outputs(
+     tvmgen_my_model_workspace_pools* workspace_pools
+ ) {
+     tvmgen_my_model_outputs = {
+         .output0 = &workspace_pools->sram[<OUTPUT0_OFFSET>],
+     };
+     return tvmgen_my_model_outputs;
+ }
+```
+```
+// metadata.h
+
+ #define TVM_MY_MODEL_SRAM_WORKSPACE_BUFFER_SIZE xxxx
+
+ typedef struct {
+     uint8_t* sram;
+ } tvmgen_my_model_workspace_pools;
+
+ typedef struct {
+     uint8_t* input0;
+ } tvmgen_my_model_inputs;
+
+ typedef struct {
+     uint8_t* output0;
+ } tvmgen_my_model_outputs;
+
+ tvmgen_my_model_inputs tvmgen_my_model_map_inputs(
+     tvmgen_my_model_workspace_pools* workspace_pools
+ );
+
+ tvmgen_my_model_outputs tvmgen_my_model_map_outputs(
+     tvmgen_my_model_workspace_pools* workspace_pools
+ );
+```
+### User Application
+```
+ // The User Application model;
+ __attribute__((section( "SRAM" ), aligned( 16 ))) static uint8_t workspace_buffer_sram[TVM_MY_MODEL_SRAM_WORKSPACE_BUFFER_SIZE];
+
+ int main(...) {
+     ...
+     tvmgen_my_model_workspace_pools workspaces = {
+         .sram = &workspace_buffer_sram,
+     };
+     tvmgen_my_model_inputs inputs =
+         tvmgen_my_model_map_inputs(&workspaces);
+     tvmgen_my_model_outputs outputs =
+         tvmgen_my_model_map_outputs(&workspaces);
+
+     // Generate input tensor by passing the handle
+     // E.g. this could be a driver writing directly to
+     // the workspace buffer
+     GenerateInput(inputs.input0)
+
+     tvmgen_my_model_run(&workspaces);

Review comment:
       Right now we mark input nodes as "precious" (figuratively — this isn't a literal thing), and I don't think we re-use their memory; in other words, this line should be idempotent. I think this RFC seeks to change that, which is a perfectly reasonable thing to do, but I like that there is a PassOption to support it explicitly.

       Should this in fact be a PassOption? Another way to do this is to annotate the Relay program. The benefit of that is that if we ever started compiling multiple programs in a single tvm.relay.build call, we wouldn't have a singleton global PassOption which could apply differently to different programs or parameters. Also, if a user didn't particularly care about one input but did care about another, it might be more helpful to mark this at the I/O tensor level. What are your thoughts?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
