Josh Rosen created SPARK-10381:
----------------------------------

             Summary: Infinite loop when OutputCommitCoordination is enabled 
and OutputCommitter.commitTask throws exception
                 Key: SPARK-10381
                 URL: https://issues.apache.org/jira/browse/SPARK-10381
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 1.4.1, 1.3.1, 1.5.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen
            Priority: Critical


When speculative execution is enabled, consider a scenario where the authorized 
committer of a particular output partition fails during the 
OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator is 
supposed to release that committer's exclusive commit lock once that task 
fails. However, due to an identifier mismatch (the task attempt number is used 
in one place and the task attempt ID in another), the lock is never released. 
Every subsequent attempt is then denied permission to commit, and Spark goes 
into an infinite retry loop.
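
The failure mode described above can be illustrated with a small toy model. 
This is a sketch under stated assumptions, not Spark's actual code: the class 
CommitCoordinator, its methods, the flag release_by_attempt_number and the 
example ID 17 are all hypothetical, and which side of the real code used which 
identifier is assumed here. The lock is granted by attempt number, so 
releasing it by a different identifier silently does nothing.

```python
from typing import Dict


class CommitCoordinator:
    """Toy model of a commit coordinator (hypothetical, not Spark's code)."""

    def __init__(self, release_by_attempt_number: bool) -> None:
        # Maps partition -> attempt number of the authorized committer.
        self.authorized: Dict[int, int] = {}
        # False models the mismatch: release compares against the wrong ID.
        self.release_by_attempt_number = release_by_attempt_number

    def can_commit(self, partition: int, attempt_number: int) -> bool:
        # The first attempt to ask becomes the authorized committer.
        holder = self.authorized.setdefault(partition, attempt_number)
        return holder == attempt_number

    def task_failed(self, partition: int, attempt_number: int,
                    task_attempt_id: int) -> None:
        # Release the lock only if the failed task is the lock holder.
        if self.release_by_attempt_number:
            key = attempt_number
        else:
            key = task_attempt_id  # assumed bug: wrong identifier compared
        if self.authorized.get(partition) == key:
            del self.authorized[partition]


def retry_is_authorized(coordinator: CommitCoordinator) -> bool:
    """Attempt 0 wins the lock and fails in commitTask; can attempt 1 commit?"""
    assert coordinator.can_commit(partition=0, attempt_number=0)
    # Attempt number 0 of this partition has the (assumed) unique ID 17.
    coordinator.task_failed(partition=0, attempt_number=0, task_attempt_id=17)
    return coordinator.can_commit(partition=0, attempt_number=1)
```

With release_by_attempt_number=False the retry is denied, and would be denied 
on every later attempt as well, which corresponds to the infinite retry loop. 
With release_by_attempt_number=True the lock is released and the retry may 
commit.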

This bug was masked by the fact that the OutputCommitCoordinator does not have 
enough end-to-end tests (the current tests use many mocks). Another 
contributing factor is that we have many similarly named identifiers with 
different semantics but the same data types (e.g. attemptNumber and 
taskAttemptId), and inconsistent variable naming makes them difficult to 
distinguish.



