## Description ##
Remove outdated references to model parallelism from the FAQ.
Model parallelism should be avoided because it is less performant than data
parallelism. It is better to accumulate gradients over multiple mini-batches of
small batch size than to split the model across GPUs in order to fit bigger
batches.
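
For reviewers unfamiliar with gradient accumulation, here is a minimal, framework-independent sketch of the idea in plain Python (it is not MXNet code and not part of this PR). It assumes a toy model `y_hat = w * x` with a mean-squared-error loss; the function names, data, and `micro_batch_size` are all hypothetical. It illustrates that summing gradients over small mini-batches and normalizing once gives the same gradient as one large batch, so the large batch never has to fit in memory at once.

```python
# Illustrative sketch only: toy model y_hat = w * x with squared-error loss.
# All names and values here are hypothetical, chosen for the example.

def grad_sum(w, xs, ys):
    """Sum (not mean) of d/dw (w*x - y)^2 over the given samples."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))

def full_batch_grad(w, xs, ys):
    """Mean gradient computed over the whole batch in one pass."""
    return grad_sum(w, xs, ys) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Mean gradient built by accumulating over small mini-batches."""
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        stop = start + micro_batch_size
        # Only one small mini-batch is processed at a time.
        total += grad_sum(w, xs[start:stop], ys[start:stop])
    # Normalize once by the effective (large) batch size.
    return total / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5

g_full = full_batch_grad(w, xs, ys)
g_acc = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(g_full, g_acc)
```

In a real training loop the same pattern applies: run several small forward/backward passes, let the gradients add up, and apply a single optimizer step scaled by the effective batch size.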

- [ ] Changes are complete (i.e. I finished coding on this PR)



[ Full content available at: 
https://github.com/apache/incubator-mxnet/pull/12298 ]