For the past two days I've been researching ways to avoid expensive 
allocations/deallocations in tight loops on the GPU by reusing memory.

Apparently this is also a recurring issue in game programming, so I'd like to 
share my approach and request comments.

My use-case:

  * I allocate and free memory through a custom allocator (cudaMalloc/cudaFree 
in my case).
  * Allocation/deallocation is very expensive and often occurs in tight loops.
  * Memory is at a premium, we cannot hoard unused memory.
  * Memory per object is huge. For example, VGG (one of the earliest deep 
convnets) on a typical 224x224x3 RGB image (~150K input values) has [138 
million float32 parameters and needs about 93MB of activation memory per 
image](https://cs231n.github.io/convolutional-networks/#case). A typical 
batch size is 32~128 images, and a new batch is required every 200ms~1s 
(check "OxfordNet", the other name of VGG, in this 
[benchmark](https://github.com/soumith/convnet-benchmarks)).
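My current approach is a size-bucketed caching allocator: freed blocks go into 
a per-size free list instead of back to the system, and later allocations of 
the same size reuse them. A byte cap on the cache keeps it from hoarding 
memory. Here is a minimal host-side sketch; the `CachingAllocator` name and 
the cap policy are my own, and the backing calls are `malloc`/`free` so the 
example is self-contained — in the GPU use-case they would be 
`cudaMalloc`/`cudaFree`:

```cpp
#include <cstdlib>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Sketch of a size-bucketed caching allocator. Freed blocks are kept in a
// per-size free list and handed back on the next allocation of that size,
// skipping the expensive underlying malloc/free (cudaMalloc/cudaFree on GPU).
class CachingAllocator {
public:
    // max_cached_bytes bounds how much freed memory we hold on to,
    // since hoarding unused memory is not acceptable in this use-case.
    explicit CachingAllocator(std::size_t max_cached_bytes)
        : max_cached_(max_cached_bytes), cached_(0) {}

    void* allocate(std::size_t size) {
        auto it = pool_.find(size);
        if (it != pool_.end() && !it->second.empty()) {
            void* p = it->second.back();   // cache hit: reuse a freed block
            it->second.pop_back();
            cached_ -= size;
            return p;
        }
        return std::malloc(size);          // cache miss: real allocation
    }

    void deallocate(void* p, std::size_t size) {
        if (cached_ + size <= max_cached_) {
            pool_[size].push_back(p);      // keep the block for reuse
            cached_ += size;
        } else {
            std::free(p);                  // cache full: return to the system
        }
    }

    ~CachingAllocator() {
        // Release everything still held in the cache.
        for (auto& kv : pool_)
            for (void* p : kv.second) std::free(p);
    }

private:
    std::size_t max_cached_;               // cap on cached bytes
    std::size_t cached_;                   // bytes currently held in the cache
    std::unordered_map<std::size_t, std::vector<void*>> pool_;
};
```

In a tight loop that repeatedly allocates and frees same-sized buffers (as 
with fixed-size batches), every iteration after the first becomes a cheap 
free-list pop instead of a driver call. Exact-size bucketing is the simplest 
policy; it fits this workload because tensor sizes repeat across batches.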
