junrushao commented on PR #16111:
URL: https://github.com/apache/tvm/pull/16111#issuecomment-1807395587

   I would love to emphasize that, as a generic compiler infrastructure, TVM 
usually should not assume a single use case or depend on specific runtime 
behavior. For example, GCC cannot assume AVX-512 instructions always exist 
unless it is explicitly told so.
   
   > I feel the “over-allocation” is not going to be a severe issue.
   
   There's definitely a difference between personal feelings and objective 
factors in design choices, and I'd love to focus the discussion specifically 
on the objective factors. Let me expand a bit on the example I drew 
previously in the thread: consider calling a `main` method 1024 times, where 
each call returns a dynamic-shape buffer whose actual length is 1k but which 
is stored in a pre-allocated 128k buffer:
   
   ```C++
   std::vector<NDArray> outputs;
   for (int i = 0; i < 1024; ++i) {
     NDArray result = mod["main"](...);  // size = 1k, but storage = 128k
     outputs.push_back(result);
   }
   this->outputs = outputs;
   ```
   
   It effectively means the outputs take `1024 * 128k` of RAM instead of `1024 
* 1k`.
   
   > in the worst case, will cause the total memory in the pool allocator (when 
enabled) to be `O(max_batch_size^2)` times of a single allocated storage
   
   As an alternative, I'd love to point out that upper-bound allocation is not 
the only solution to fragmentation; there are indeed well-practiced solutions, 
for example, bucketing, which serves an allocation of size within 
`(2^(i-1), 2^i]` from a bucket `i` of memory slots, as well as the buddy 
allocator as its straightforward generalization.
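   
   The bucket-index rule above can be sketched in a few lines (the 
`BucketIndex` helper is hypothetical, shown only to illustrate the scheme, 
not an existing allocator API):
   
   ```C++
   #include <cstdio>
   
   // Sizes in (2^(i-1), 2^i] map to bucket i, so a freed slot from bucket i
   // can serve any later request that falls in the same power-of-two range.
   int BucketIndex(unsigned long long size) {
     int i = 0;
     unsigned long long cap = 1;  // capacity of bucket i is 2^i bytes
     while (cap < size) {
       cap <<= 1;
       ++i;
     }
     return i;
   }
   
   int main() {
     // 1000 and 1024 bytes share bucket 10 (capacity 2^10); 1025 needs bucket 11.
     std::printf("%d %d %d\n", BucketIndex(1000), BucketIndex(1024),
                 BucketIndex(1025));
     return 0;
   }
   ```
   
   Per-slot waste is then bounded by 2x rather than by the global upper 
bound, which is what makes bucketing (and the buddy allocator) attractive 
for dynamic shapes.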

