[SYSTEMML-1772] Fix matrix mult exec type selection w/ sparse mmchain

This patch fixes OOM issues encountered on perftest MultiLogReg, 100M x 1K, sparse, which were due to incorrect execution type selection for matrix multiplications. The details are covered in the JIRA, but at a high level, the issue occurs when, in an mmchain pattern t(X) %*% (w*(X%*%v)), the input t(X) of the final matrix multiplication fits into CP memory but X does not. This can happen for tall and skinny sparse matrices, because the sparse representation carries a memory overhead per row and X has far more rows than t(X). We now select SPARK/MR for the entire pattern. This is useful because the first matrix multiplication will end up in SPARK/MR anyway, so we can still compile the fused mmchain operator and avoid unnecessary transfers between the CP and SPARK/MR backends.
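The selection rule described above can be illustrated in isolation. The following is a minimal, self-contained Java sketch and not SystemML code: the class name, the per-row and per-nonzero byte constants, the 1% sparsity, and the 15 GB local budget are all assumptions, chosen only to show how t(X) can fit into a local memory budget while X does not.

```java
// Hypothetical, simplified model of the check added by this patch.
// None of these names or constants come from SystemML itself.
class MMChainExecTypeSketch {
    enum ExecType { CP, REMOTE }

    // Assumed costs in bytes; real sparse-row overheads differ.
    static final double ROW_OVERHEAD_BYTES = 100; // per-row bookkeeping (assumption)
    static final double NNZ_BYTES = 12;           // 8-byte value + 4-byte column index

    // Rough sparse size estimate: per-row overhead plus per-nonzero cost.
    static double sparseMemEstimate(long rows, long cols, double sparsity) {
        double nnz = (double) rows * cols * sparsity;
        return rows * ROW_OVERHEAD_BYTES + nnz * NNZ_BYTES;
    }

    // Mirrors the shape of the patched condition: a CP mmchain is sent to
    // the remote backend if X itself exceeds the local memory budget.
    static ExecType select(ExecType current, boolean isMMChain,
                           double localBudget, double xMemEstimate) {
        if (current == ExecType.CP && isMMChain && localBudget < xMemEstimate)
            return ExecType.REMOTE;
        return current;
    }

    public static void main(String[] args) {
        long rows = 100_000_000L, cols = 1_000L; // 100M x 1K, as in the perftest
        double sparsity = 0.01;                  // assumed for illustration
        double budget = 15e9;                    // assumed local budget in bytes

        double xMem = sparseMemEstimate(rows, cols, sparsity);
        double txMem = sparseMemEstimate(cols, rows, sparsity);

        System.out.println("t(X) fits locally: " + (txMem <= budget));
        System.out.println("X fits locally:    " + (xMem <= budget));
        System.out.println("selected: " + select(ExecType.CP, true, budget, xMem));
    }
}
```

Under these assumed numbers, the transposed matrix has only 1K rows and so pays almost no per-row overhead, whereas X pays it 100M times. That asymmetry is what lets the final multiplication look valid for CP when checked on t(X) alone.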

Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/1b3dff06
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/1b3dff06
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/1b3dff06

Branch: refs/heads/master
Commit: 1b3dff06b8416975ed69bad1c119e0b33c3a6e95
Parents: 4ca4d34
Author: Matthias Boehm <[email protected]>
Authored: Fri Jul 14 19:39:02 2017 -0700
Committer: Matthias Boehm <[email protected]>
Committed: Fri Jul 14 20:49:23 2017 -0700

----------------------------------------------------------------------
 src/main/java/org/apache/sysml/hops/AggBinaryOp.java | 6 ++++++
 1 file changed, 6 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/systemml/blob/1b3dff06/src/main/java/org/apache/sysml/hops/AggBinaryOp.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/sysml/hops/AggBinaryOp.java b/src/main/java/org/apache/sysml/hops/AggBinaryOp.java
index 9077976..4f709b4 100644
--- a/src/main/java/org/apache/sysml/hops/AggBinaryOp.java
+++ b/src/main/java/org/apache/sysml/hops/AggBinaryOp.java
@@ -458,6 +458,12 @@ public class AggBinaryOp extends Hop implements MultiThreadedHop
 				_etype = REMOTE;
 			}
 			
+			//check for valid CP mmchain, send invalid memory requirements to remote
+			if( _etype == ExecType.CP && checkMapMultChain() != ChainType.NONE
+				&& OptimizerUtils.getLocalMemBudget() <
+				getInput().get(0).getInput().get(0).getOutputMemEstimate() )
+				_etype = REMOTE;
+			
 			//check for valid CP dimensions and matrix size
 			checkAndSetInvalidCPDimsAndSize();
 		}
