[
https://issues.apache.org/jira/browse/SYSTEMML-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706876#comment-15706876
]
Matthias Boehm commented on SYSTEMML-1131:
------------------------------------------
Sure. At a high level, we have to recompile any operations in the body of a
remote_mr or remote_spark parfor to CP, as we cannot execute nested distributed
operations (e.g., spawning a Spark job from within another Spark job's task). In
your scenario, it probably shows up with a DML-bodied function, as this might
have led to unknowns and hence unnecessary distributed operations during
initial compilation in the first place. After the remote parfor, we have to
reset any forced execution types, as the same function might be used with
different input data later in the program.
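To make the force-then-reset idea concrete, here is a minimal sketch in plain Java. It is not SystemML code: the types ExecType and Op and the methods force, effectiveTypes, and runRemoteParFor are hypothetical names invented for illustration, and the real recompilation works on program blocks and HOP DAGs rather than a flat list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; none of these types are SystemML classes.
class ForcedExecTypeSketch {
    enum ExecType { CP, SPARK }

    static final class Op {
        final String name;
        final ExecType compiled; // type chosen by the initial compilation
        ExecType forced;         // null means "no forced execution type"

        Op(String name, ExecType compiled) {
            this.name = name;
            this.compiled = compiled;
        }

        // A forced type, if present, overrides the compiled one.
        ExecType effective() {
            return forced != null ? forced : compiled;
        }
    }

    // Set (or, with et == null, clear) the forced type on every operation.
    static void force(List<Op> body, ExecType et) {
        for (Op op : body) {
            op.forced = et;
        }
    }

    static List<ExecType> effectiveTypes(List<Op> body) {
        List<ExecType> types = new ArrayList<>();
        for (Op op : body) {
            types.add(op.effective());
        }
        return types;
    }

    // Force CP for the duration of the (simulated) remote parfor, then reset,
    // so that later callers of the same body see the compiled types again.
    static List<ExecType> runRemoteParFor(List<Op> body) {
        force(body, ExecType.CP);
        try {
            return effectiveTypes(body); // stands in for executing the body
        } finally {
            force(body, null);
        }
    }

    public static void main(String[] args) {
        List<Op> body = new ArrayList<>();
        body.add(new Op("matmult", ExecType.SPARK));
        body.add(new Op("sum", ExecType.CP));
        System.out.println("inside parfor: " + runRemoteParFor(body));
        System.out.println("after parfor:  " + effectiveTypes(body));
    }
}
```

Note that the reset passes a null execution type on purpose to mean "not forced", so any code on that path has to tolerate nulls.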
Anyway, I think this is simply a missing null check. Sometimes predicate DAGs
are empty (e.g., if the TO clause of a FOR loop is a literal, we do not
represent it as a DAG because there are no operations, but handle it, for
historic reasons, with some custom logic). In your case, it fails when trying to
synchronize on the root node of the predicate DAG, which simply does not exist.
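As a rough illustration of the failure and the guard, the sketch below uses plain Java with hypothetical names (Node, releaseUnguarded, releaseGuarded); it is not the actual SystemML fix. The only fact it relies on is that a synchronized statement on a null reference throws a NullPointerException.

```java
// Hypothetical, simplified model; not SystemML code.
class PredicateSyncSketch {
    static final class Node {
        boolean forced = true;
    }

    // Mirrors the failing pattern: synchronizing on a root that may be null.
    static void releaseUnguarded(Node root) {
        synchronized (root) { // throws NullPointerException if root == null
            root.forced = false;
        }
    }

    // Guarded variant: an empty predicate DAG (e.g., a literal TO bound)
    // has no root node, so there is nothing to synchronize on or reset.
    static boolean releaseGuarded(Node root) {
        if (root == null) {
            return false;
        }
        synchronized (root) {
            root.forced = false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("guarded, empty DAG: " + releaseGuarded(null));
        try {
            releaseUnguarded(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded, empty DAG: NullPointerException");
        }
    }
}
```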
> NPE in executeRemoteSparkParFor
> -------------------------------
>
> Key: SYSTEMML-1131
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1131
> Project: SystemML
> Issue Type: Bug
> Reporter: Felix Schüler
>
> The method ParForProgramBlock.releaseForcedRecompile(long tid) calls
> recompileProgramBlockHierarchy2Forced with execution type (et) null. This
> leads to a NullPointerException.
> I haven't fully figured out under which circumstances this occurs but it
> happens when calling an external function inside a forced parfor_spark.
> The ParForProgramBlock.executeRemoteSparkParFor method sets the flagForced to
> true which then in turn calls the above method with et==null.