[GitHub] [tvm] tqchen edited a comment on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

GitBox Thu, 16 Sep 2021 13:33:07 -0700


tqchen edited a comment on issue #9022:
URL: https://github.com/apache/tvm/issues/9022#issuecomment-921220669



   Thanks @manupa-arm. I understand that proposal R4 can also work by having a 
pass convert "global" to something more specialized (essentially, R1 and R4 
differ only in their choice of scope names).
   
   The main question is what the semantics of the scope "global" are. Each 
memory scope represents a "constraint" on what kind of memory it is. 
   
   Right now, when the device type is CPU, "global" means any memory that can 
be accessed by the host CPU. This means the actual implementation can come 
from TVMBAW, memory from the stack, or memory allocated by other means. While 
memory allocated by TVMBAW can have other benefits (e.g. being accessible by 
other devices because it is pinned), that is not the constraint specified by 
the "global" scope.
   
   We can of course constrain the setting further, to say "global.workspace", 
which reduces the possible ways to allocate the memory but still does not 
preclude choosing between multiple workspace buffers.
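
   To make the "scope as constraint" reading concrete, here is a minimal 
sketch in plain Python. It is not TVM code: the `ALLOWED` table, the strategy 
names ("stack", "workspace_a", "workspace_b", "other") and both helper 
functions are hypothetical, chosen only to show that a narrower scope admits 
a subset of the allocation strategies of a wider one.

```python
# Hypothetical model, not TVM's API: each scope maps to the set of
# allocation strategies that would satisfy it on a CPU target.
ALLOWED = {
    "global": {"stack", "workspace_a", "workspace_b", "other"},
    "global.workspace": {"workspace_a", "workspace_b"},
    "global.stack": {"stack"},
}


def is_refinement(narrow: str, wide: str) -> bool:
    """True if every strategy allowed by `narrow` is also allowed by `wide`."""
    return ALLOWED[narrow] <= ALLOWED[wide]


def candidates(scope: str) -> list:
    """Strategies a code generator may still choose from for `scope`."""
    return sorted(ALLOWED[scope])


if __name__ == "__main__":
    # "global.workspace" narrows "global" but still leaves a choice.
    print(is_refinement("global.workspace", "global"))
    print(candidates("global.workspace"))
```

   In this model, "global.workspace" is a refinement of "global" that still 
leaves two candidates, while "global.stack" leaves exactly one.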
   
   So from a semantics point of view, the pass can indeed choose to return 
"global" or rewrite it to "global.stack" to ensure a stack allocation. But if 
the scope remains "global", we should not preclude downstream stages from 
allocating from the stack (the code generator should be able to choose any 
kind of memory that satisfies the constraint). To put it another way, we 
cannot say that "global" definitely means no stack allocation.
   
   If the code needs to impose the additional constraint that the memory must 
be accessible from a separate device (e.g. an NPU), it would certainly 
require a more specialized constraint that is better spelled out explicitly. 
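
   A rough sketch of such a scope-specializing pass, again in plain Python 
rather than against TVM's real pass infrastructure. Everything here is an 
assumption made for illustration: the `Buffer` record, the 
`needs_device_access` flag, the `STACK_LIMIT_BYTES` threshold, and the scope 
name "global.device_shared", which does not exist in TVM.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Buffer:
    """Hypothetical stand-in for an allocation the pass inspects."""
    name: str
    nbytes: int
    scope: str = "global"
    needs_device_access: bool = False  # assumed flag, not a TVM attribute


STACK_LIMIT_BYTES = 1024  # assumed threshold for stack placement


def specialize_scope(buf: Buffer) -> Buffer:
    """Rewrite "global" to a narrower scope only when a constraint requires it."""
    if buf.scope != "global":
        return buf  # already specialized; leave untouched
    if buf.needs_device_access:
        # Spell the extra constraint out explicitly (hypothetical scope name).
        return replace(buf, scope="global.device_shared")
    if buf.nbytes <= STACK_LIMIT_BYTES:
        return replace(buf, scope="global.stack")
    # Stays "global": downstream may still pick stack or workspace.
    return buf


if __name__ == "__main__":
    for b in (Buffer("tmp", 64), Buffer("big", 1 << 20),
              Buffer("npu_in", 64, needs_device_access=True)):
        print(b.name, "->", specialize_scope(b).scope)
```

   The point of the sketch is that a buffer left as "global" carries no 
promise about where it is allocated; only an explicit, narrower scope does.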
   
   As we can see, this is another kind of flexibility we want to enable 
here: the flexibility to pick among possible backend allocation 
implementations without over-constraining the code generator to 
backend-specific behavior that is platform dependent (like the case of 
pinned memory).


