[GitHub] [tvm] AndrewZhaoLuo opened a new pull request, #13877: [VM] Lower memory usage when loading and dumping weights

via GitHub Mon, 30 Jan 2023 16:46:11 -0800


AndrewZhaoLuo opened a new pull request, #13877:
URL: https://github.com/apache/tvm/pull/13877

Right now the VM executable has a bad pattern when loading weights: we load
the entire serialized representation into memory, and then deserialize from
that in-memory store without progressively freeing it.

This is a problem because if our weights take up ~5 GB, then the serialized
representation in memory takes up ~5 GB and the deserialized representation
takes ~5 GB too. Peak memory use when running the VM is therefore about 2x
the size of the model weights.

This is bad, especially with some of the larger models out there today.

This PR fixes things by streaming from disk and relying on the standard
C file interface to buffer reads for good performance.
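
As a rough illustration of the idea (not the code in this PR), the sketch below reads length-prefixed blobs one at a time from a buffered `FILE*`, so only the deserialized copy is ever held in memory. The file layout and the names `WriteBlobs` and `ReadBlobsStreaming` are assumptions made up for this example; TVM's real weight format is different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical on-disk layout for this sketch (not TVM's real format):
//   uint64 count, then for each blob: uint64 num_bytes followed by the raw bytes.
// Values are stored in host byte order.

// Writes all blobs to `path` in the layout described above.
void WriteBlobs(const std::string& path,
                const std::vector<std::vector<char>>& blobs) {
  std::FILE* fp = std::fopen(path.c_str(), "wb");
  if (fp == nullptr) throw std::runtime_error("cannot open " + path);
  std::uint64_t count = blobs.size();
  bool ok = std::fwrite(&count, sizeof(count), 1, fp) == 1;
  for (const auto& blob : blobs) {
    std::uint64_t n = blob.size();
    ok = ok && std::fwrite(&n, sizeof(n), 1, fp) == 1;
    ok = ok && (n == 0 || std::fwrite(blob.data(), 1, blob.size(), fp) == blob.size());
  }
  ok = (std::fclose(fp) == 0) && ok;
  if (!ok) throw std::runtime_error("write failed: " + path);
}

// Reads the blobs back one at a time. Each blob is read straight from the
// stdio-buffered file into its final buffer, so the whole serialized file is
// never resident in memory next to the deserialized data.
std::vector<std::vector<char>> ReadBlobsStreaming(const std::string& path) {
  std::FILE* fp = std::fopen(path.c_str(), "rb");
  if (fp == nullptr) throw std::runtime_error("cannot open " + path);
  std::vector<std::vector<char>> blobs;
  try {
    std::uint64_t count = 0;
    if (std::fread(&count, sizeof(count), 1, fp) != 1) {
      throw std::runtime_error("truncated header");
    }
    for (std::uint64_t i = 0; i < count; ++i) {
      std::uint64_t n = 0;
      if (std::fread(&n, sizeof(n), 1, fp) != 1) {
        throw std::runtime_error("truncated blob size");
      }
      std::vector<char> blob(static_cast<std::size_t>(n));
      if (n != 0 && std::fread(blob.data(), 1, blob.size(), fp) != blob.size()) {
        throw std::runtime_error("truncated blob data");
      }
      blobs.push_back(std::move(blob));
    }
  } catch (...) {
    std::fclose(fp);
    throw;
  }
  std::fclose(fp);
  return blobs;
}
```

With this shape, peak memory is roughly the deserialized size plus the stdio buffer, instead of the serialized and deserialized copies together.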

Some before and after graphs from loading and benchmarking a model with
~5 GB of weights:

Before:

![image](https://user-images.githubusercontent.com/13855451/215629180-4d07e0b4-cb6e-4535-88ce-f8b4346f8698.png)

After:

![image](https://user-images.githubusercontent.com/13855451/215629115-a6ac9f3a-98e4-4d37-a7a3-fb9a6d26a3c3.png)

This is a draft since:
- I've only tested loading weights, but we could see similar savings in other
similar streams.
- We need to make a decision on the DMLC stream interface. The main issue is
that a lot of existing code depends on the DMLC stream interface, but DMLC
itself is a header-only library, so we only have access to in-memory streams
in the current state. I have worked around this by implementing a simple
class.
- We need to decide the best way forward. The approach in this PR is simple,
though it technically duplicates some code from the DMLC core lib.
- Alternatives are including DMLC as a dependency, adding functionality to
DMLC and pulling those changes in, or getting rid of the DMLC stream
interface entirely.
- The approach in this PR is the simplest, which is why I am using it for the
draft.
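
For readers unfamiliar with the trade-off, here is a minimal sketch of what a "simple class" wrapping the C file interface behind a Read/Write stream interface could look like. `ByteStream` and `SimpleFileStream` are hypothetical names for this example, only loosely modeled on a generic byte-stream interface; this is neither DMLC's actual class nor the one added in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical abstract byte stream. Existing code would be written against
// an interface of this general shape.
class ByteStream {
 public:
  virtual ~ByteStream() = default;
  // Reads up to `size` bytes into `ptr`; returns the number of bytes read.
  virtual std::size_t Read(void* ptr, std::size_t size) = 0;
  // Writes exactly `size` bytes from `ptr`; throws on a short write.
  virtual void Write(const void* ptr, std::size_t size) = 0;
};

// Hypothetical file-backed implementation built on the standard C file API,
// which supplies the buffering.
class SimpleFileStream : public ByteStream {
 public:
  SimpleFileStream(const std::string& path, const char* mode)
      : fp_(std::fopen(path.c_str(), mode)) {
    if (fp_ == nullptr) throw std::runtime_error("cannot open " + path);
  }
  ~SimpleFileStream() override {
    if (fp_ != nullptr) std::fclose(fp_);
  }
  // The stream owns the FILE*, so copying is disabled.
  SimpleFileStream(const SimpleFileStream&) = delete;
  SimpleFileStream& operator=(const SimpleFileStream&) = delete;

  std::size_t Read(void* ptr, std::size_t size) override {
    return std::fread(ptr, 1, size, fp_);
  }
  void Write(const void* ptr, std::size_t size) override {
    if (std::fwrite(ptr, 1, size, fp_) != size) {
      throw std::runtime_error("short write");
    }
  }

 private:
  std::FILE* fp_;
};
```

Code that only depends on the abstract interface can then be handed a file-backed stream instead of an in-memory one without changing its serialization logic, at the cost of duplicating a small amount of file-handling code.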

