junrushao commented on PR #16111:
URL: https://github.com/apache/tvm/pull/16111#issuecomment-1807395587
I would like to emphasize that a generic compiler infrastructure usually does not assume a single use case or depend on particular runtime behavior. For example, GCC cannot assume AVX-512 instructions exist unless it is explicitly told so.
> I feel the “over-allocation” is not going to be a severe issue.
There is definitely a difference between personal feelings and objective factors in design discussions, and I'd love to focus specifically on the objective factors. Let me expand on the example I drew earlier in the thread: consider calling a `main` method that returns a dynamic-shape buffer with an actual length of 1k but stored in a pre-allocated 128k buffer, iterating 1024 times:
```C++
std::vector<NDArray> outputs;
for (int i = 0; i < 1024; ++i) {
  NDArray result = mod["main"](...);  // size = 1k, but storage = 128k
  outputs.push_back(result);
}
this->outputs = outputs;
```
It effectively means the outputs take `1024 * 128k` of RAM instead of `1024 * 1k`.
> in the worst case, will cause the total memory in the pool allocator (when
enabled) to be `O(max_batch_size^2)` times of a single allocated storage
As an alternative, I'd like to suggest that upper-bound allocation is not the only solution to fragmentation; there are well-practiced alternatives. One example is bucketing, which serves an allocation of size within `(2^(i-1), 2^i]` from a bucket `i` of fixed-size memory slots; the buddy allocator is its straightforward generalization.
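A minimal sketch of the bucketing idea (the function name is illustrative, not an existing TVM API): a request of size `n` is rounded up to the smallest power-of-two capacity `2^i` that holds it, i.e. bucket `i` serves sizes in `(2^(i-1), 2^i]`:

```C++
#include <cassert>
#include <cstddef>

// Map a requested size to its bucket index i, where bucket i holds
// fixed-size slots of capacity 2^i and serves requests in (2^(i-1), 2^i].
int BucketIndex(std::size_t size) {
  assert(size > 0);
  int i = 0;
  std::size_t capacity = 1;
  while (capacity < size) {
    capacity <<= 1;
    ++i;
  }
  return i;
}
```

With this scheme, per-slot waste is bounded by 2x rather than by the worst-case upper bound, and a freed slot can be reused by any later request falling in the same bucket, which bounds fragmentation. A buddy allocator generalizes this by splitting larger power-of-two blocks on demand and coalescing adjacent free "buddies" back together.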