ganler commented on a change in pull request #8285:
URL: https://github.com/apache/tvm/pull/8285#discussion_r654871849
##########
File path: src/runtime/vm/pooled_allocator.h
##########
@@ -57,14 +57,22 @@ class PooledAllocator final : public Allocator {
Buffer buf;
buf.device = device_;
buf.size = size;
-  buf.data = DeviceAPI::Get(device_)->AllocDataSpace(device_, size, alignment, type_hint);
+  try {
+    buf.data = DeviceAPI::Get(device_)->AllocDataSpace(device_, size, alignment, type_hint);
+  } catch (InternalError& err) {
+    LOG(WARNING) << "PooledAllocator got InternalError during allocation: " << err.message();
+    LOG(WARNING) << "Trying to release all unused memory and reallocate...";
+    ReleaseAll();
+    buf.data = DeviceAPI::Get(device_)->AllocDataSpace(device_, size, alignment, type_hint);
Review comment:
Thanks for the suggestion, but IMHO it is not robust enough. Say we have 8 GB
of GPU memory, the pool has cached 4 GB of it, and we want to allocate 6 GB:
- With your idea, `ReleaseAll()` returns "4 GB", which is less than the
requested "6 GB", so the allocation is reported as failed.
- Instead, if we release the unused memory and then retry the allocation, the
"6 GB" request is very likely to succeed.
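The scenario above can be sketched with a toy model (all names here are
illustrative, not TVM's actual allocator API; units are GB for simplicity):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical device with a fixed capacity; throws when over-committed.
struct ToyDevice {
  size_t capacity;
  size_t used = 0;
  void* Alloc(size_t n) {
    if (used + n > capacity) throw std::runtime_error("out of device memory");
    used += n;
    return reinterpret_cast<void*>(used);  // dummy non-null handle
  }
  void Free(size_t n) { used -= n; }
};

// Hypothetical pool holding cached (unused) blocks on the device.
struct ToyPool {
  ToyDevice* dev;
  size_t cached;  // bytes held by the pool but currently unused

  // Release-and-retry strategy from the patch: on allocation failure,
  // return the cached blocks to the device and try once more.
  void* Alloc(size_t n) {
    try {
      return dev->Alloc(n);
    } catch (const std::runtime_error&) {
      ReleaseAll();
      return dev->Alloc(n);  // retry; may still throw if truly exhausted
    }
  }
  void ReleaseAll() {
    dev->Free(cached);
    cached = 0;
  }
};
```

With an 8 GB device whose pool caches 4 GB, a 6 GB request first fails,
`ReleaseAll()` frees the cached 4 GB, and the retry then succeeds, whereas a
"released < requested" check would have given up.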
That said, the big picture behind your idea would be practical if we had APIs
like "total_system_memory" and "available_system_memory". Exposing those may
require introducing a series of system driver libraries, e.g.,
`cudaMemGetInfo` from the CUDA runtime (user space) or `NVML` (which requires
more system privileges).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]