Lunderberg opened a new pull request, #15957:
URL: https://github.com/apache/tvm/pull/15957

   Prior to this commit, model weights were always sharded when initializing 
the model.  This could make initialization slow, especially with larger 
numbers of GPUs, because all model weights were first transferred to GPU-0 
and only then scattered to all workers.
   
   This commit updates `tvm::runtime::ShardLoaderObj` to also support loading 
pre-sharded model weights.  With pre-sharded model weights, the tensors are 
sharded while the model is being built, and each worker independently loads 
only the model weights that it requires.
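
   As a rough illustration of the difference between the two loading paths 
described above, here is a minimal pure-Python sketch.  It is not the TVM 
implementation and does not use `ShardLoaderObj`; every function name, the 
dict-based "store", and the row-wise split are assumptions made only for 
this example.

```python
# Hypothetical illustration only; none of these names are TVM APIs.

def shard_rows(weight, num_workers):
    """Split a 2-D weight (a list of rows) into equal row blocks, one per worker."""
    rows_per_worker, remainder = divmod(len(weight), num_workers)
    if remainder:
        raise ValueError("row count must be divisible by num_workers")
    return [
        weight[i * rows_per_worker:(i + 1) * rows_per_worker]
        for i in range(num_workers)
    ]


def load_shard_at_init(full_weights, num_workers):
    """Shard-at-init path: one place holds every full tensor, then scatters pieces."""
    per_worker = [{} for _ in range(num_workers)]
    for name, weight in full_weights.items():
        for worker_id, shard in enumerate(shard_rows(weight, num_workers)):
            per_worker[worker_id][name] = shard
    return per_worker


def build_presharded_store(full_weights, num_workers):
    """Build-time step: shard once and store each piece under a per-worker key."""
    store = {}
    for name, weight in full_weights.items():
        for worker_id, shard in enumerate(shard_rows(weight, num_workers)):
            store[(name, worker_id)] = shard
    return store


def load_presharded(store, worker_id):
    """Pre-sharded path: a worker reads only the entries tagged with its own id."""
    return {
        name: shard
        for (name, shard_owner), shard in store.items()
        if shard_owner == worker_id
    }
```

   In this sketch both paths give every worker the same shards; the only 
difference is that the pre-sharded path never needs the full tensors in one 
place at load time, which is the property the description above relies on.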

