Lunderberg opened a new pull request, #15957: URL: https://github.com/apache/tvm/pull/15957
Prior to this commit, sharding of model weights was always performed when initializing the model. This could make initialization slow, especially with larger numbers of GPUs, because all model weights were first transferred to GPU-0 and only then scattered to the workers. This commit updates `tvm::runtime::ShardLoaderObj` to also allow loading of pre-sharded model weights. With pre-sharded model weights, the tensors are sharded while the model is being built, and each worker independently loads only the model weights that it requires.
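
The difference between the two loading strategies can be illustrated with a small, self-contained simulation. This is only a sketch and not the `ShardLoaderObj` implementation or any TVM API: the helper names (`split_rows`, `load_then_shard`, `write_presharded`, `load_presharded`) are hypothetical, weights are plain nested lists, sharding is assumed to be an even split along rows, and a dictionary stands in for the files on disk.

```python
# Illustrative simulation only; this is NOT TVM's ShardLoaderObj API.
# Every function name below is hypothetical and exists only for this sketch.


def split_rows(weight, num_workers):
    """Split a 2-D weight (a list of rows) into equal contiguous row blocks."""
    assert len(weight) % num_workers == 0, "rows must divide evenly"
    step = len(weight) // num_workers
    return [weight[i * step:(i + 1) * step] for i in range(num_workers)]


def load_then_shard(weights, num_workers):
    """Shard at initialization: worker 0 receives each full tensor, then scatters it."""
    per_worker = [{} for _ in range(num_workers)]
    rows_through_worker0 = 0
    for name, weight in weights.items():
        rows_through_worker0 += len(weight)  # the whole tensor lands on worker 0 first
        for rank, shard in enumerate(split_rows(weight, num_workers)):
            per_worker[rank][name] = shard
    return per_worker, rows_through_worker0


def write_presharded(weights, num_workers):
    """Build time: store one entry per (worker rank, tensor name)."""
    files = {}
    for name, weight in weights.items():
        for rank, shard in enumerate(split_rows(weight, num_workers)):
            files[(rank, name)] = shard
    return files


def load_presharded(files, names, rank):
    """Initialization: a worker reads only the shards stored for its own rank."""
    return {name: files[(rank, name)] for name in names}


# A toy 4x2 weight split across 2 workers.
weights = {"w": [[r * 10 + c for c in range(2)] for r in range(4)]}
num_workers = 2

sharded, rows_through_worker0 = load_then_shard(weights, num_workers)
files = write_presharded(weights, num_workers)
presharded = [load_presharded(files, list(weights), rank) for rank in range(num_workers)]

# Both strategies leave every worker holding the same shards.
assert presharded == sharded
```

In this toy setup, the shard-at-initialization path moves all four rows through worker 0, while in the pre-sharded path worker 0 touches only its own two rows. The splitting cost is paid once at build time rather than on every model load.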
