[
https://issues.apache.org/jira/browse/SYSTEMML-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564750#comment-16564750
]
Matthias Boehm commented on SYSTEMML-2476:
------------------------------------------
Thanks for catching this, [~Guobao]. Let me demystify this by explaining the
three overlapping issues here:
* You see MR instead of SPARK jobs because the tests did not set the SPARK
hybrid mode, and hence we're running in plain hybrid mode (i.e., CP and MR).
* These distributed operations are caused by a missing literal replacement for
scalar lookups into lists, which leaves C unknown. Because the output sizes of
operations in the same DAG depend on C, we compile conservative distributed
operations. I have an extension of the recompiler that fixes these unnecessary
distributed operations.
* However, there is a remaining issue. Specifically, C comes out of the list
with value type STRING. I made the runtime robust enough to handle this, but we
should also fix the root cause. I can have a look into this remaining issue
tomorrow. Until then, please leave the JIRA open.
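To make the second and third points more concrete, here is a toy Python model of the effect. It is only a sketch and not SystemML's actual compiler or recompiler code: the names MEM_BUDGET_CELLS, output_cells, choose_exec_type and literal_replace, as well as the budget value, are made up for illustration.

```python
from typing import Optional

# Hypothetical memory budget for single-node (CP) operations, in matrix cells.
MEM_BUDGET_CELLS = 1_000_000


def output_cells(rows: int, cols: Optional[int]) -> Optional[int]:
    """Return the output size in cells, or None if a dimension is unknown."""
    if cols is None:
        return None
    return rows * cols


def choose_exec_type(cells: Optional[int]) -> str:
    """Pick CP only if the size is known and fits; otherwise stay conservative."""
    if cells is None or cells > MEM_BUDGET_CELLS:
        return "DISTRIBUTED"
    return "CP"


def literal_replace(value) -> int:
    """Toy stand-in for literal replacement: turn a runtime scalar into a
    known integer. int() also accepts a string-typed scalar such as "1"."""
    return int(value)


# The scalar comes out of the list as a string, as described above.
hyperparams = {"C": "1"}
rows = 64

# Without literal replacement, C is unknown, so the output size is unknown.
before = choose_exec_type(output_cells(rows, None))

# With literal replacement, C is a known literal and the size can be computed.
after = choose_exec_type(output_cells(rows, literal_replace(hyperparams["C"])))

print(before, after)
```

In this toy model the unknown C forces the conservative distributed choice, while replacing the list lookup by its literal value lets the same operation be planned as CP.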
> Unexpected mapreduce task
> -------------------------
>
> Key: SYSTEMML-2476
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2476
> Project: SystemML
> Issue Type: Bug
> Reporter: LI Guobao
> Priority: Major
>
> When trying to use scalar casting to get an element from a list, unexpected
> mapreduce tasks are launched instead of running in CP mode. The scenario is to
> replace *C = 1* with *C = as.scalar(hyperparams["C"])* inside the {{_gradient
> function_}} found in
> {{_src/test/scripts/functions/paramserv/mnist_lenet_paramserv.dml_}}. The
> problem can then be reproduced by launching the method
> {{_testParamservBSPBatchDisjointContiguous_}} inside class
> _{{org.apache.sysml.test.integration.functions.paramserv.ParamservLocalNNTest}}_.
> Here is the stack:
> {code:java}
> 18/07/31 22:10:27 INFO mapred.MapTask: numReduceTasks: 1
> 18/07/31 22:10:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 18/07/31 22:10:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 18/07/31 22:10:27 INFO mapred.MapTask: soft limit at 83886080
> 18/07/31 22:10:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 18/07/31 22:10:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 18/07/31 22:10:27 INFO mapreduce.Job: The url to track the job:
> http://localhost:8080/
> 18/07/31 22:10:27 INFO mapreduce.Job: Running job: job_local792652629_0008
> {code}
> [~mboehm7], if possible, could you take a look at this? I've double-checked
> the creation of the execution context in
> {{ParamservBuiltinCPInstruction}}, but it is an instance of ExecutionContext,
> not SparkExecutionContext.