## Description ##
Remove outdated references to model parallelism from the FAQ. Model parallelism should generally be avoided because it is less performant than data parallelism. To train with a larger effective batch size, it is better to accumulate gradients over multiple small mini-batches than to split the model across GPUs to fit bigger batches.
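
To illustrate the gradient accumulation recommended above, here is a minimal sketch in plain Python. It does not use MXNet APIs; the toy model `y = w * x`, the data, and the helper names (`summed_grad`, `full_batch_gradient`, `accumulated_gradient`) are assumptions made up for this example. It only shows that summing gradients over small micro-batches and normalizing once gives the same gradient as one large batch.

```python
# Toy model y = w * x with a mean-squared-error loss.
# All names and data here are hypothetical, for illustration only.

def summed_grad(w, xs, ys):
    """Sum over samples of d/dw (w*x - y)^2, not yet averaged."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))


def full_batch_gradient(w, xs, ys):
    """Gradient of the mean loss computed on the whole batch at once."""
    return summed_grad(w, xs, ys) / len(xs)


def accumulated_gradient(w, xs, ys, micro_batch_size):
    """Same gradient, accumulated over consecutive small micro-batches."""
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        stop = start + micro_batch_size
        # Only one micro-batch needs to be processed at a time.
        total += summed_grad(w, xs[start:stop], ys[start:stop])
    # Normalize once by the effective (large) batch size.
    return total / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w = 0.5

full = full_batch_gradient(w, xs, ys)
accumulated = accumulated_gradient(w, xs, ys, micro_batch_size=2)
print(full, accumulated)
```

In a real training loop the optimizer step would be applied once per accumulated gradient, so the model sees the effective batch size of 8 while only 2 samples are processed at a time.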
- [ ] Changes are complete (i.e. I finished coding on this PR)

[ Full content available at: https://github.com/apache/incubator-mxnet/pull/12298 ]
This message was relayed via gitbox.apache.org for [email protected]
