[ https://issues.apache.org/jira/browse/SYSTEMML-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428709#comment-16428709 ]
LI Guobao commented on SYSTEMML-2197:
-------------------------------------

Thanks, I successfully ran this test and got 4 failed tests (testDenseDenseMapmmMR, testDenseSparseMapmmMR, testSparseDenseMapmmMR, testSparseSparseMapmmMR). They all seem to concern the MR backend. When I switched back to the master branch, I got the same result. So is this a known bug? Could I ignore it? Thanks for the response.

Here is the stack trace:
{code:java}
18/04/06 19:38:48 ERROR api.DMLScript: Failed to execute DML script.
org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 23 and 27 -- Error evaluating instruction:
jobtype = GMR
input labels = [_mVar13, _mVar14]
recReader inst =
rand inst =
mapper inst = MR°mapmm°0·MATRIX·DOUBLE°1·MATRIX·DOUBLE°2·MATRIX·DOUBLE°RIGHT°false
shuffle inst =
agg inst = MR°ak+°2·MATRIX·DOUBLE°3·MATRIX·DOUBLE°true°NONE
other inst =
output labels = [pVar15]
result indices = ,3
num reducers = 10
replication = 1
	at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
	at org.apache.sysml.api.ScriptExecutorUtils.executeRuntimeProgram(ScriptExecutorUtils.java:97)
	at org.apache.sysml.api.DMLScript.execute(DMLScript.java:744)
	at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:515)
	at org.apache.sysml.api.DMLScript.main(DMLScript.java:246)
	at org.apache.sysml.test.integration.AutomatedTestBase.runTest(AutomatedTestBase.java:1214)
	at org.apache.sysml.test.integration.functions.binary.matrix_full_other.FullDistributedMatrixMultiplicationTest.runDistributedMatrixMatrixMultiplicationTest(FullDistributedMatrixMultiplicationTest.java:276)
	at org.apache.sysml.test.integration.functions.binary.matrix_full_other.FullDistributedMatrixMultiplicationTest.testSparseSparseMapmmMR(FullDistributedMatrixMultiplicationTest.java:101)
{code}

> Multi-threaded broadcast creation
> ---------------------------------
>
>                 Key: SYSTEMML-2197
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2197
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>            Assignee: LI Guobao
>            Priority: Major
>
> All Spark instructions that broadcast one of the input operands rely on a
> shared primitive {{sec.getBroadcastForVariable(var)}} for creating
> partitioned broadcasts, which are wrapper objects around potentially many
> broadcast variables to overcome Spark's 2GB limitation for compressed
> broadcasts. Each individual broadcast blocks the matrix into squared blocks
> for direct access without an unnecessary copy per task. So far, this broadcast
> creation is single-threaded.
> This task aims to parallelize the blocking of the given in-memory matrix into
> squared blocks
> (https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/instructions/spark/data/PartitionedBlock.java#L82)
> as well as the subsequent partition creation and actual broadcasting
> (https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/controlprogram/context/SparkExecutionContext.java#L548).
>
> For consistency and in order to avoid excessive over-provisioning, this
> multi-threading should use the common internal thread pool or parallel Java
> streams, which similarly use the shared {{ForkJoinPool.commonPool}}. An
> example is the multi-threaded parallelization of RDDs, which similarly blocks
> a given matrix into its squared blocks (see
> https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/controlprogram/context/SparkExecutionContext.java#L679).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
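
To make the blocking step in the issue description concrete, here is a minimal sketch of splitting an in-memory matrix into squared blocks with a parallel Java stream, which runs on the shared {{ForkJoinPool.commonPool}} by default. It is not the SystemML code: the class name {{ParallelBlockingSketch}}, the method {{toSquareBlocks}}, and the plain {{double[][]}} representation are assumptions made for illustration, standing in for the real matrix block and {{PartitionedBlock}} types.

```java
import java.util.stream.IntStream;

// Hypothetical illustration only; not part of SystemML.
public class ParallelBlockingSketch {

    /**
     * Splits a dense matrix into blen x blen blocks (blocks on the right and
     * bottom edges may be smaller) and returns them in row-major block order.
     */
    public static double[][][] toSquareBlocks(double[][] matrix, int blen) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        int nrb = (rows + blen - 1) / blen; // number of row blocks
        int ncb = (cols + blen - 1) / blen; // number of column blocks
        double[][][] blocks = new double[nrb * ncb][][];

        // One task per block; each task writes only its own output slot,
        // so no synchronization is needed. The parallel stream executes on
        // the common fork/join pool instead of creating a new thread pool.
        IntStream.range(0, nrb * ncb).parallel().forEach(ix -> {
            int r0 = (ix / ncb) * blen; // first row covered by this block
            int c0 = (ix % ncb) * blen; // first column covered by this block
            int brows = Math.min(blen, rows - r0);
            int bcols = Math.min(blen, cols - c0);
            double[][] block = new double[brows][bcols];
            for (int i = 0; i < brows; i++)
                System.arraycopy(matrix[r0 + i], c0, block[i], 0, bcols);
            blocks[ix] = block;
        });
        return blocks;
    }

    public static void main(String[] args) {
        double[][] m = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        double[][][] blocks = toSquareBlocks(m, 2);
        System.out.println(blocks.length + " blocks"); // prints "4 blocks"
    }
}
```

Using the common pool rather than a dedicated executor follows the over-provisioning concern raised in the description; the subsequent partition creation and the actual broadcasting are not covered by this sketch.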