[
https://issues.apache.org/jira/browse/SYSTEMML-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428709#comment-16428709
]
LI Guobao commented on SYSTEMML-2197:
-------------------------------------
Thanks, I successfully ran this test and got 4 failed
tests (testDenseDenseMapmmMR, testDenseSparseMapmmMR, testSparseDenseMapmmMR,
testSparseSparseMapmmMR), all of which seem to concern the MR backend. After
switching back to the master branch, I got the same result. Is this a known
bug, and can I ignore it? Thanks for the response. Here is the stack trace:
{code:java}
18/04/06 19:38:48 ERROR api.DMLScript: Failed to execute DML script.
org.apache.sysml.runtime.DMLRuntimeException:
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program
block generated from statement block between lines 23 and 27 -- Error
evaluating instruction: jobtype = GMR
input labels = [_mVar13, _mVar14]
recReader inst =
rand inst =
mapper inst =
MR°mapmm°0·MATRIX·DOUBLE°1·MATRIX·DOUBLE°2·MATRIX·DOUBLE°RIGHT°false
shuffle inst =
agg inst = MR°ak+°2·MATRIX·DOUBLE°3·MATRIX·DOUBLE°true°NONE
other inst =
output labels = [pVar15]
result indices = ,3
num reducers = 10
replication = 1
at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
at org.apache.sysml.api.ScriptExecutorUtils.executeRuntimeProgram(ScriptExecutorUtils.java:97)
at org.apache.sysml.api.DMLScript.execute(DMLScript.java:744)
at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:515)
at org.apache.sysml.api.DMLScript.main(DMLScript.java:246)
at org.apache.sysml.test.integration.AutomatedTestBase.runTest(AutomatedTestBase.java:1214)
at org.apache.sysml.test.integration.functions.binary.matrix_full_other.FullDistributedMatrixMultiplicationTest.runDistributedMatrixMatrixMultiplicationTest(FullDistributedMatrixMultiplicationTest.java:276)
at org.apache.sysml.test.integration.functions.binary.matrix_full_other.FullDistributedMatrixMultiplicationTest.testSparseSparseMapmmMR(FullDistributedMatrixMultiplicationTest.java:101)
{code}
> Multi-threaded broadcast creation
> ---------------------------------
>
> Key: SYSTEMML-2197
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2197
> Project: SystemML
> Issue Type: Task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
>
> All Spark instructions that broadcast one of their input operands rely on a
> shared primitive {{sec.getBroadcastForVariable(var)}} for creating
> partitioned broadcasts, which are wrapper objects around potentially many
> broadcast variables to overcome Spark's 2GB limitation for compressed
> broadcasts. Each individual broadcast blocks the matrix into squared blocks
> for direct access without an unnecessary copy per task. So far, this broadcast
> creation is single-threaded.
> This task aims to parallelize the blocking of the given in-memory matrix into
> squared blocks
> (https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/instructions/spark/data/PartitionedBlock.java#L82)
> as well as the subsequent partition creation and actual broadcasting
> (https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/controlprogram/context/SparkExecutionContext.java#L548).
>
> For consistency and in order to avoid excessive over-provisioning, this
> multi-threading should use the common internal thread pool or parallel Java
> streams, which similarly use the shared {{ForkJoinPool.commonPool}}. An
> example is the multi-threaded parallelization of RDDs, which similarly blocks
> a given matrix into its squared blocks (see
> https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/controlprogram/context/SparkExecutionContext.java#L679).
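
The blocking step described in the issue could be parallelized roughly as follows. This is only a minimal sketch under stated assumptions, not SystemML code: the class ParallelBlockingSketch, the method toSquareBlocks, and the use of a plain double[][] in place of SystemML's matrix block types are all hypothetical and chosen for illustration. It relies only on the standard behavior that parallel Java streams execute on the shared ForkJoinPool.commonPool, so no additional thread pool is created.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration, not part of SystemML: blocks a dense matrix
// into square blocks using a parallel stream (common ForkJoinPool).
class ParallelBlockingSketch {

    // Returns the blocks in row-major block order. Blocks on the lower and
    // right boundaries may be smaller than blen x blen.
    static double[][][] toSquareBlocks(double[][] matrix, int blen) {
        int rows = matrix.length;
        int cols = (rows == 0) ? 0 : matrix[0].length;
        int nrb = (rows + blen - 1) / blen; // number of row blocks (ceil)
        int ncb = (cols + blen - 1) / blen; // number of column blocks (ceil)
        double[][][] blocks = new double[nrb * ncb][][];

        // Each task writes to a distinct slot of 'blocks', so the tasks do
        // not need any synchronization among themselves.
        IntStream.range(0, nrb * ncb).parallel().forEach(ix -> {
            int r0 = (ix / ncb) * blen;
            int c0 = (ix % ncb) * blen;
            int r1 = Math.min(r0 + blen, rows);
            int c1 = Math.min(c0 + blen, cols);
            double[][] block = new double[r1 - r0][];
            for (int r = r0; r < r1; r++)
                block[r - r0] = Arrays.copyOfRange(matrix[r], c0, c1);
            blocks[ix] = block;
        });
        return blocks;
    }

    public static void main(String[] args) {
        // Small 5x3 example matrix blocked with block length 2.
        double[][] m = new double[5][3];
        for (int r = 0; r < 5; r++)
            for (int c = 0; c < 3; c++)
                m[r][c] = r * 3 + c;
        double[][][] blocks = toSquareBlocks(m, 2);
        System.out.println("number of blocks: " + blocks.length);
        System.out.println("last block: " + Arrays.deepToString(blocks[blocks.length - 1]));
    }
}
```

Parallelizing over block indices rather than over rows keeps each task independent and maps directly onto the per-block layout that the description mentions. Whether this granularity suits the actual PartitionedBlock implementation would need to be checked against the linked code.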
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)