tqchen edited a comment on issue #9022:
URL: https://github.com/apache/tvm/issues/9022#issuecomment-920463412


   @mbs-octoml To give a bit of context:
   
   On CPU, we want to preserve small allocations until the code generation 
point, where the code generator emits the stack alloca explicitly. Only when 
the requested memory is large enough (bigger than a constant) do we use an 
opaque allocation instead.
   
   Stack allocation is important for the performance of CPU code. TVM does not 
have an explicit concept of registers in most cases. Instead, we rely on 
LLVM's mem2reg pass to take a stack allocation that is only accessed with 
constant indices and turn it into registers, so the code can run efficiently. 
Removing this code path would therefore complicate code-generator-side 
optimization quite a bit and slow down the CPU code.
   
   Of course, this can be target specific. LowerTVMBuiltin currently assumes 
that it only runs on host (CPU) code.
   
   - Allocate always prefers (native) stack allocation when possible, but also 
allows other means of opaque allocation (as long as the allocation is 
fulfilled).
   - There are, however, cases where stack allocation is not possible:
       - When the requested memory is too big, a stack alloca will blow up the 
stack space. (That is why there is a size check in the CPU case; the global 
opaque allocation is meant as a fallback to avoid stack overflow in models 
with large intermediate temp space.)
       - LowerTVMBuiltin was originally designed to run on the host side, which 
means that as soon as an allocation concerns device-side memory, it needs to 
call into a (host-side) device API to allocate the memory instead.
   
   
   So the rationale for the specific CPU-side logic is:
   - We want a stack alloca on the host when possible (to gain the mem2reg 
optimization).
   - When the requested size is too large, we fall back to opaque workspace 
allocation on the heap, so that code with big temp memory requests, as well as 
dynamic-size allocation requests, is handled safely.
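
The decision described above can be sketched roughly as follows. This is only an illustration of the rule as stated in this comment, not the actual LowerTVMBuiltin code: the names `AllocKind`, `choose_allocation`, and `MAX_STACK_BYTES` are hypothetical, and the threshold value is an assumption (the comment only says "a constant").

```python
from enum import Enum
from typing import Optional


class AllocKind(Enum):
    """Hypothetical labels for the two allocation strategies."""
    STACK = "stack alloca"
    OPAQUE_WORKSPACE = "opaque workspace allocation"


# Assumed threshold for illustration only; the real constant is not given here.
MAX_STACK_BYTES = 1024


def choose_allocation(nbytes: Optional[int], on_host_cpu: bool,
                      max_stack_bytes: int = MAX_STACK_BYTES) -> AllocKind:
    """Pick an allocation strategy.

    nbytes is the constant size of the request in bytes, or None when the
    size is only known at run time (a dynamic-size request).
    """
    if not on_host_cpu:
        # Device-side memory has to go through a host-side device API call.
        return AllocKind.OPAQUE_WORKSPACE
    if nbytes is None:
        # Dynamic sizes cannot be given a fixed-size stack slot.
        return AllocKind.OPAQUE_WORKSPACE
    if nbytes > max_stack_bytes:
        # Too large: fall back to the heap to avoid overflowing the stack.
        return AllocKind.OPAQUE_WORKSPACE
    # Small constant-size host allocation: keep it on the stack.
    return AllocKind.STACK
```

For example, `choose_allocation(64, on_host_cpu=True)` picks the stack, while a dynamic-size request (`nbytes=None`) or any device-side request falls back to the opaque workspace path.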
   
   My guess is that we need to look into why the VM cannot work with code that 
allocates on the stack in the multiple-target case.



