Dear all,
I am facing extreme delays in MXNet initialization when running distributed GPU training with Horovod. Once I launch my code (just 2 GPUs, for debugging), it populates the GPUs up to some memory level, but then does not start training for about 30 minutes (yes, minutes, and I am using 12 CPU cores per rank). Admittedly, the computation graph of these latest models is very complicated, but I cannot believe that alone is the issue.

I hybridize the Gluon models (net and loss function) prior to training with ```net.hybridize(static_alloc=True, static_shape=True)```. The problem is not resolved by defining the cache as described in [issue 3239](https://github.com/apache/incubator-mxnet/issues/3239#issuecomment-265103568).

Any pointers/help most appreciated.

---

[Visit Topic](https://discuss.mxnet.io/t/very-slow-initialisation-of-gpu-distributed-training/6357/1)
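One thing I am considering ruling out is cuDNN convolution autotuning, which benchmarks kernel candidates for every convolution shape on first use and could plausibly add a lot of startup time on a complicated graph. A sketch of how I would disable it at launch (this is only a hypothesis, not a confirmed fix, and `train.py` is a placeholder for my actual training script):

```shell
# Skip cuDNN's per-shape convolution kernel benchmarking at startup.
# MXNET_CUDNN_AUTOTUNE_DEFAULT is a standard MXNet environment variable;
# 0 disables autotuning entirely.
export MXNET_CUDNN_AUTOTUNE_DEFAULT=0

# Launch the same 2-GPU debugging run via Horovod.
# train.py is a placeholder for the real script.
horovodrun -np 2 python train.py
```

If startup time drops noticeably with autotuning off, that would at least isolate where the 30 minutes is going, even if the resulting per-iteration conv kernels are slightly slower.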
