AndrewZhaoLuo opened a new pull request, #13877:
URL: https://github.com/apache/tvm/pull/13877

   Right now there is a bad pattern in the VM executable when loading 
weights: we read the entire serialized representation into memory and then 
deserialize from that in-memory copy without progressively freeing it.
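To make the problem concrete, here is a minimal C++ sketch of the load-then-deserialize pattern. It is not TVM's code. The record layout (a native-endian `uint64_t` byte count followed by raw bytes), the `Tensor` alias, and `LoadAllThenDeserialize` are hypothetical stand-ins for the real weight format and loader.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout: each record is a native-endian uint64_t byte count
// followed by that many raw bytes.
using Tensor = std::vector<char>;

std::vector<Tensor> LoadAllThenDeserialize(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  // Step 1: the whole serialized file is now held in memory (~W bytes).
  std::string blob((std::istreambuf_iterator<char>(in)),
                   std::istreambuf_iterator<char>());
  std::vector<Tensor> tensors;
  std::size_t offset = 0;
  while (offset < blob.size()) {
    std::uint64_t nbytes = 0;
    if (blob.size() - offset < sizeof(nbytes))
      throw std::runtime_error("truncated header");
    std::memcpy(&nbytes, blob.data() + offset, sizeof(nbytes));
    offset += sizeof(nbytes);
    if (blob.size() - offset < nbytes)
      throw std::runtime_error("truncated record");
    // Step 2: each tensor is copied out of the blob, but the blob is only
    // released when the function returns, so peak usage approaches ~2W.
    tensors.emplace_back(blob.data() + offset, blob.data() + offset + nbytes);
    offset += nbytes;
  }
  return tensors;
}
```

Every tensor is copied out of `blob` while `blob` itself stays alive until the function returns, which is where the doubled peak comes from.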
   
   This is bad because if our weights take up ~5 GB, the serialized 
representation in memory takes up ~5 GB and the deserialized representation 
takes another ~5 GB. This means peak memory use when executing with the VM is 
roughly twice the size of the model weights.
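As a rough back-of-the-envelope check, writing $W$ for the size of the weights and ignoring allocator and runtime overhead (an approximation, not a measurement):

$$
M_{\text{peak}} \approx \underbrace{W}_{\text{serialized}} + \underbrace{W}_{\text{deserialized}} = 2W \approx 2 \times 5\ \text{GB} = 10\ \text{GB}
$$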
   
   This is a serious problem, especially for some of the larger models out 
there today.
   
   This PR fixes that by streaming the weights from disk, relying on the 
standard C file interface (`FILE*`) to buffer reads for good performance.
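A minimal sketch of the streaming idea follows. It assumes a hypothetical record layout (a native-endian `uint64_t` byte count followed by raw bytes); `StreamDeserialize` and the 1 MiB buffer size are illustrative choices, not what this PR uses. Each record is `fread` directly into its destination buffer, so no full copy of the file ever exists in memory.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout: each record is a native-endian uint64_t byte count
// followed by that many raw bytes.
using Tensor = std::vector<char>;

// Closes the FILE* when the owning unique_ptr goes out of scope.
struct FileCloser {
  void operator()(std::FILE* f) const { std::fclose(f); }
};

std::vector<Tensor> StreamDeserialize(const std::string& path) {
  // Declared before the FILE so it is destroyed after fclose() runs.
  std::vector<char> io_buffer(1 << 20);  // 1 MiB stdio buffer (arbitrary)
  std::unique_ptr<std::FILE, FileCloser> f(std::fopen(path.c_str(), "rb"));
  if (!f) throw std::runtime_error("cannot open " + path);
  // setvbuf must be called before the first read on the stream.
  std::setvbuf(f.get(), io_buffer.data(), _IOFBF, io_buffer.size());

  std::vector<Tensor> tensors;
  std::uint64_t nbytes = 0;
  while (std::fread(&nbytes, sizeof(nbytes), 1, f.get()) == 1) {
    // Allocate only the destination; bytes go from the stdio buffer straight
    // into it, so peak usage stays near W plus the small I/O buffer.
    Tensor t(nbytes);
    if (nbytes > 0 && std::fread(t.data(), 1, t.size(), f.get()) != t.size())
      throw std::runtime_error("truncated record");
    tensors.push_back(std::move(t));
  }
  if (std::ferror(f.get())) throw std::runtime_error("read error");
  return tensors;
}
```

The stdio buffer turns many small `fread` calls into a few large reads from the OS, which is what keeps this approach fast without holding the whole file in memory.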
   
   Some before-and-after graphs from loading and benchmarking a model with 
~5 GB of weights:
   
   Before:
   
   
![image](https://user-images.githubusercontent.com/13855451/215629180-4d07e0b4-cb6e-4535-88ce-f8b4346f8698.png)
   
   After:
   
   
![image](https://user-images.githubusercontent.com/13855451/215629115-a6ac9f3a-98e4-4d37-a7a3-fb9a6d26a3c3.png)
   
   This is a draft because:
   - I've only tested loading weights, but we should see similar savings for 
other streams that follow the same pattern.
   - We need to make a decision on the DMLC stream interface. The main issue 
is that a lot of existing code depends on the DMLC stream interface, but DMLC 
itself is a header-only library, so in the current state we only have access 
to in-memory streams. I have worked around this by implementing a simple 
file-backed stream class.
   - We need to decide the best way forward. The approach in this PR is 
simple, though it technically duplicates some code from the DMLC core library.
   - Alternatives are including DMLC as a dependency, adding the functionality 
to DMLC and pulling in those changes, or getting rid of the DMLC stream 
interface entirely.
   - This approach is the simplest, which is why I went with it for the draft.
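For illustration only, a simple file-backed stream class of the kind described above might look like the following. `ByteStream` is a hypothetical minimal interface modeled loosely on the `Read`/`Write` virtuals of `dmlc::Stream` (the real dmlc signatures may differ), and `SimpleFileStream` is not the class in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical minimal stream interface, loosely modeled on dmlc::Stream.
class ByteStream {
 public:
  virtual ~ByteStream() = default;
  virtual std::size_t Read(void* ptr, std::size_t size) = 0;
  virtual std::size_t Write(const void* ptr, std::size_t size) = 0;
};

// File-backed stream: data flows through the stdio buffer instead of being
// staged in an in-memory copy of the whole file.
class SimpleFileStream : public ByteStream {
 public:
  SimpleFileStream(const std::string& path, const char* mode)
      : fp_(std::fopen(path.c_str(), mode)) {
    if (fp_ == nullptr) throw std::runtime_error("cannot open " + path);
  }
  ~SimpleFileStream() override { std::fclose(fp_); }

  // Owns a FILE*, so copying is disabled to avoid a double fclose().
  SimpleFileStream(const SimpleFileStream&) = delete;
  SimpleFileStream& operator=(const SimpleFileStream&) = delete;

  // Returns the number of bytes actually read (may be short at EOF).
  std::size_t Read(void* ptr, std::size_t size) override {
    return std::fread(ptr, 1, size, fp_);
  }
  // Returns the number of bytes actually written.
  std::size_t Write(const void* ptr, std::size_t size) override {
    return std::fwrite(ptr, 1, size, fp_);
  }

 private:
  std::FILE* fp_;
};
```

Because it only wraps `FILE*`, it needs nothing beyond the C standard library; the trade-off is the code duplication mentioned above.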

