[ 
https://issues.apache.org/jira/browse/TEZ-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997646#comment-16997646
 ] 

TezQA commented on TEZ-4108:
----------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
18s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
46s{color} | {color:green} tez-dag in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 8s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 base: 
https://builds.apache.org/job/PreCommit-TEZ-Build/229/artifact/out/Dockerfile |
| JIRA Issue | TEZ-4108 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12988841/TEZ-4108.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs 
checkstyle compile |
| uname | Linux 3af0bb341d51 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12 
10:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 09c6f90 |
| Default Java | 1.8.0_222 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-TEZ-Build/229/testReport/ |
| Max. process+thread count | 219 (vs. ulimit of 5500) |
| modules | C: tez-dag U: tez-dag |
| Console output | 
https://builds.apache.org/job/PreCommit-TEZ-Build/229/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |


This message was automatically generated.



> NullPointerException during speculative execution race condition
> ----------------------------------------------------------------
>
>                 Key: TEZ-4108
>                 URL: https://issues.apache.org/jira/browse/TEZ-4108
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Turner Eagles
>            Assignee: Jonathan Turner Eagles
>            Priority: Major
>         Attachments: TEZ-4108.001.patch
>
>
> What is happening is a race condition between speculative execution task 
> launch and failure. Such that when the speculative attempt begins to launch, 
> the earlier attempt is no longer running. Somehow the containerNodeId is 
> becoming null. From the log line we can see that at least when the original 
> task attempt starts, the container node id is not null.
> This points to a bug in Tez.
> Posting AM log context and full stack trace.
> {noformat}
> 2019-11-18 12:58:13,261 [INFO] [Dispatcher thread {Central}] 
> |HistoryEventHandler.criticalEvents|: 
> [HISTORY][DAG:dag_1571959303156_328042_1][Event:TASK_ATTEMPT_STARTED]: 
> vertexName=Map 1, taskAttemptId=attempt_1571959303156_328042_1_00_091618_0, 
> startTime=1574081893261, 
> containerId=container_e20_1571959303156_328042_01_006583, 
> nodeId=node1.example.com:8041
> 2019-11-18 12:58:25,001 [INFO] [Dispatcher thread {Central}] 
> |HistoryEventHandler.criticalEvents|: 
> [HISTORY][DAG:dag_1571959303156_328042_1][Event:TASK_STARTED]: vertexName=Map 
> 1, taskId=task_1571959303156_328042_1_00_091618, scheduledTime=1574081308038, 
> launchTime=1574081893261
> 2019-11-18 12:58:26,916 [INFO] [Dispatcher thread {Central}] 
> |HistoryEventHandler.criticalEvents|: 
> [HISTORY][DAG:dag_1571959303156_328042_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1571959303156_328042_1_00_091618_0, 
> creationTime=1574081308038, allocationTime=1574081880179, 
> startTime=1574081893261, finishTime=1574081906915, timeTaken=13654, 
> status=KILLED, errorEnum=EXTERNAL_PREEMPTION, diagnostics=Container 
> container_e20_1571959303156_328042_01_006583 finished with diagnostics set to 
> [Container preempted externally. Container Preempted due to node overcommit
> 2019-11-18 12:58:27,264 [INFO] [Dispatcher thread {Speculator}] 
> |legacy.LegacySpeculator|: DefaultSpeculator.addSpeculativeAttempt -- we are 
> speculating task_1571959303156_328042_1_00_091618
> 2019-11-18 12:58:34,633 [INFO] [TaskSchedulerEventHandlerThread] 
> |rm.TaskSchedulerManager|: Task: attempt_1571959303156_328042_1_00_091618_0 
> has no container assignment in the scheduler
> 2019-11-18 12:58:34,636 [ERROR] [TaskSchedulerEventHandlerThread] 
> |rm.TaskSchedulerManager|: No container allocated to task: 
> attempt_1571959303156_328042_1_00_091618_0 according to scheduler. Task 
> reported container id: container_e20_1571959303156_328042_01_006583
> 2019-11-18 12:58:34,866 [INFO] [Dispatcher thread {Central}] |impl.TaskImpl|: 
> Scheduling a redundant attempt for task task_1571959303156_328042_1_00_091618
> 2019-11-18 12:58:34,866 [ERROR] [Dispatcher thread {Central}] 
> |impl.TaskImpl|: Uncaught exception when trying handle event 
> T_ADD_SPEC_ATTEMPT at current state RUNNING for task 
> task_1571959303156_328042_1_00_091618
> java.lang.NullPointerException
>         at 
> java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
>         at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
>         at java.util.Collections$SetFromMap.add(Collections.java:5461)
>         at 
> org.apache.tez.dag.app.dag.impl.TaskImpl$RedundantScheduleTransition.transition(TaskImpl.java:1044)
>         at 
> org.apache.tez.dag.app.dag.impl.TaskImpl$RedundantScheduleTransition.transition(TaskImpl.java:1032)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
>         at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:841)
>         at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:115)
>         at 
> org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:2311)
>         at 
> org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:2298)
>         at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
>         at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> {code:title=TaskImpl.java}
> 1032   private static class RedundantScheduleTransition
> 1033     implements SingleArcTransition<TaskImpl, TaskEvent> {
> 1034
> 1035     @Override
> 1036     public void transition(TaskImpl task, TaskEvent event) {
> 1037       LOG.info("Scheduling a redundant attempt for task " + task.taskId);
> 1038       
> task.counters.findCounter(TaskCounter.NUM_SPECULATIONS).increment(1);
> 1039       TaskAttempt earliestUnfinishedAttempt = null;
> 1040       for (TaskAttempt ta : task.attempts.values()) {
> 1041         // find the oldest running attempt
> 1042         if (!ta.isFinished()) {
> 1043           earliestUnfinishedAttempt = ta;
> 1044           task.nodesWithRunningAttempts.add(ta.getNodeId());
> 1045         } else {
> 1046           if (TaskAttemptState.SUCCEEDED.equals(ta.getState())) {
> 1047             LOG.info("Ignore speculation scheduling for task {} since it 
> has succeeded with attempt {}.",
> 1048                 task.getTaskId(), ta.getID());
> 1049             return;
> 1050           }
> 1051         }
> 1052       }
> 1053       if (earliestUnfinishedAttempt == null) {
> 1054         // no running (or SUCCEEDED) task attempt at this moment, no 
> need to schedule speculative attempt either
> 1055         LOG.info("Ignore speculation scheduling since there is no 
> running attempt on task {}.", task.getTaskId());
> 1056         return;
> 1057       }
> 1058       if (task.commitAttempt != null) {
> 1059         LOG.info("Ignore speculation scheduling for task {} since commit 
> has started with commitAttempt {}.",
> 1060             task.getTaskId(), task.commitAttempt);
> 1061         return;
> 1062       }
> 1063       task.addAndScheduleAttempt(earliestUnfinishedAttempt.getID());
> 1064     }
> 1065   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to