[
https://issues.apache.org/jira/browse/TEZ-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997646#comment-16997646
]
TezQA commented on TEZ-4108:
----------------------------
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m
0s{color} | {color:green} The patch appears to include 1 new or modified test
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
31s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m
18s{color} | {color:blue} Used deprecated FindBugs config; considering
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m
16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m
1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m
46s{color} | {color:green} tez-dag in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
8s{color} | {color:green} The patch does not generate ASF License warnings.
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 9s{color} |
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 base:
https://builds.apache.org/job/PreCommit-TEZ-Build/229/artifact/out/Dockerfile |
| JIRA Issue | TEZ-4108 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/12988841/TEZ-4108.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs
checkstyle compile |
| uname | Linux 3af0bb341d51 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12
10:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 09c6f90 |
| Default Java | 1.8.0_222 |
| Test Results |
https://builds.apache.org/job/PreCommit-TEZ-Build/229/testReport/ |
| Max. process+thread count | 219 (vs. ulimit of 5500) |
| modules | C: tez-dag U: tez-dag |
| Console output |
https://builds.apache.org/job/PreCommit-TEZ-Build/229/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
This message was automatically generated.
> NullPointerException during speculative execution race condition
> ----------------------------------------------------------------
>
> Key: TEZ-4108
> URL: https://issues.apache.org/jira/browse/TEZ-4108
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Turner Eagles
> Assignee: Jonathan Turner Eagles
> Priority: Major
> Attachments: TEZ-4108.001.patch
>
>
> What is happening is a race condition between speculative execution task
> launch and failure. Such that when the speculative attempt begins to launch,
> the earlier attempt is no longer running. Somehow the containerNodeId is
> becoming null. From the log line we can see that at least when the original
> task attempt starts, the container node id is not null.
> This points to a bug in Tez.
> Posting AM log context and full stack trace.
> {noformat}
> 2019-11-18 12:58:13,261 [INFO] [Dispatcher thread {Central}]
> |HistoryEventHandler.criticalEvents|:
> [HISTORY][DAG:dag_1571959303156_328042_1][Event:TASK_ATTEMPT_STARTED]:
> vertexName=Map 1, taskAttemptId=attempt_1571959303156_328042_1_00_091618_0,
> startTime=1574081893261,
> containerId=container_e20_1571959303156_328042_01_006583,
> nodeId=node1.example.com:8041
> 2019-11-18 12:58:25,001 [INFO] [Dispatcher thread {Central}]
> |HistoryEventHandler.criticalEvents|:
> [HISTORY][DAG:dag_1571959303156_328042_1][Event:TASK_STARTED]: vertexName=Map
> 1, taskId=task_1571959303156_328042_1_00_091618, scheduledTime=1574081308038,
> launchTime=1574081893261
> 2019-11-18 12:58:26,916 [INFO] [Dispatcher thread {Central}]
> |HistoryEventHandler.criticalEvents|:
> [HISTORY][DAG:dag_1571959303156_328042_1][Event:TASK_ATTEMPT_FINISHED]:
> vertexName=Map 1, taskAttemptId=attempt_1571959303156_328042_1_00_091618_0,
> creationTime=1574081308038, allocationTime=1574081880179,
> startTime=1574081893261, finishTime=1574081906915, timeTaken=13654,
> status=KILLED, errorEnum=EXTERNAL_PREEMPTION, diagnostics=Container
> container_e20_1571959303156_328042_01_006583 finished with diagnostics set to
> [Container preempted externally. Container Preempted due to node overcommit
> 2019-11-18 12:58:27,264 [INFO] [Dispatcher thread {Speculator}]
> |legacy.LegacySpeculator|: DefaultSpeculator.addSpeculativeAttempt -- we are
> speculating task_1571959303156_328042_1_00_091618
> 2019-11-18 12:58:34,633 [INFO] [TaskSchedulerEventHandlerThread]
> |rm.TaskSchedulerManager|: Task: attempt_1571959303156_328042_1_00_091618_0
> has no container assignment in the scheduler
> 2019-11-18 12:58:34,636 [ERROR] [TaskSchedulerEventHandlerThread]
> |rm.TaskSchedulerManager|: No container allocated to task:
> attempt_1571959303156_328042_1_00_091618_0 according to scheduler. Task
> reported container id: container_e20_1571959303156_328042_01_006583
> 2019-11-18 12:58:34,866 [INFO] [Dispatcher thread {Central}] |impl.TaskImpl|:
> Scheduling a redundant attempt for task task_1571959303156_328042_1_00_091618
> 2019-11-18 12:58:34,866 [ERROR] [Dispatcher thread {Central}]
> |impl.TaskImpl|: Uncaught exception when trying handle event
> T_ADD_SPEC_ATTEMPT at current state RUNNING for task
> task_1571959303156_328042_1_00_091618
> java.lang.NullPointerException
> at
> java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
> at
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
> at java.util.Collections$SetFromMap.add(Collections.java:5461)
> at
> org.apache.tez.dag.app.dag.impl.TaskImpl$RedundantScheduleTransition.transition(TaskImpl.java:1044)
> at
> org.apache.tez.dag.app.dag.impl.TaskImpl$RedundantScheduleTransition.transition(TaskImpl.java:1032)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at
> org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
> at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:841)
> at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:115)
> at
> org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:2311)
> at
> org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:2298)
> at
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> {code:title=TaskImpl.java}
> 1032 private static class RedundantScheduleTransition
> 1033 implements SingleArcTransition<TaskImpl, TaskEvent> {
> 1034
> 1035 @Override
> 1036 public void transition(TaskImpl task, TaskEvent event) {
> 1037 LOG.info("Scheduling a redundant attempt for task " + task.taskId);
> 1038
> task.counters.findCounter(TaskCounter.NUM_SPECULATIONS).increment(1);
> 1039 TaskAttempt earliestUnfinishedAttempt = null;
> 1040 for (TaskAttempt ta : task.attempts.values()) {
> 1041 // find the oldest running attempt
> 1042 if (!ta.isFinished()) {
> 1043 earliestUnfinishedAttempt = ta;
> 1044 task.nodesWithRunningAttempts.add(ta.getNodeId());
> 1045 } else {
> 1046 if (TaskAttemptState.SUCCEEDED.equals(ta.getState())) {
> 1047 LOG.info("Ignore speculation scheduling for task {} since it
> has succeeded with attempt {}.",
> 1048 task.getTaskId(), ta.getID());
> 1049 return;
> 1050 }
> 1051 }
> 1052 }
> 1053 if (earliestUnfinishedAttempt == null) {
> 1054 // no running (or SUCCEEDED) task attempt at this moment, no
> need to schedule speculative attempt either
> 1055 LOG.info("Ignore speculation scheduling since there is no
> running attempt on task {}.", task.getTaskId());
> 1056 return;
> 1057 }
> 1058 if (task.commitAttempt != null) {
> 1059 LOG.info("Ignore speculation scheduling for task {} since commit
> has started with commitAttempt {}.",
> 1060 task.getTaskId(), task.commitAttempt);
> 1061 return;
> 1062 }
> 1063 task.addAndScheduleAttempt(earliestUnfinishedAttempt.getID());
> 1064 }
> 1065 }
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)