[ 
https://issues.apache.org/jira/browse/TEZ-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010666#comment-18010666
 ] 

László Bodor commented on TEZ-4565:
-----------------------------------

I'm tempted to merge this PR. I linked TEZ-4553, which caused this behavior 
change, but the change is not actually a bug.

With the YARN task scheduler service ([^syslog_dag_with_old_scheduler.log]), 
there is only a single preemption, for the expected v3 vertex:
{code}
2025-07-29 14:51:43,315 [INFO] [AMRM Callback Handler Thread] 
|rm.YarnTaskSchedulerService|: Preempting container: 
container_1753793491126_0001_01_000003 currently allocated to a task.

2025-07-29 14:51:43,318 [INFO] [Dispatcher thread {Central}] 
|HistoryEventHandler.criticalEvents|: 
[HISTORY][DAG:dag_1753793491126_0001_1][Event:TASK_ATTEMPT_FINISHED]: 
vertexName=v3, taskAttemptId=attempt_1753793491126_0001_1_02_000000_0, 
creationTime=1753793502634, allocationTime=1753793502634, 
startTime=1753793502644, finishTime=1753793503318, timeTaken=674, 
status=KILLED, errorEnum=INTERNAL_PREEMPTION, diagnostics=Container 
container_1753793491126_0001_01_000003 finished with diagnostics set to 
[Container preempted internally], nodeHttpAddress=localhost:61240, 
counters=Counters: 0
{code}

With the DAG-aware one, whether the test fails 
([^syslog_dag_1753791202801_0001_1.failed.txt]) 
or passes ([^syslog_dag_1753792102822_0001_1.passed.txt]), 
there are 2 preemptions (note "DagAwareYarnTaskScheduler" as the scheduler, 
due to TEZ-4553):
{code}
2025-07-29 14:13:34,010 [INFO] [AMRM Callback Handler Thread] 
|rm.DagAwareYarnTaskScheduler|: Preempting container 
container_1753791202801_0001_01_000005 currently allocated to task 
attempt_1753791202801_0001_1_01_000001_1
2025-07-29 14:13:34,010 [INFO] [AMRM Callback Handler Thread] 
|rm.DagAwareYarnTaskScheduler|: Preempting container 
container_1753791202801_0001_01_000005 currently allocated to a task

2025-07-29 14:13:34,335 [INFO] [AMRM Callback Handler Thread] 
|rm.DagAwareYarnTaskScheduler|: Preempting container 
container_1753791202801_0001_01_000002 currently allocated to task 
attempt_1753791202801_0001_1_02_000000_0
2025-07-29 14:13:34,335 [INFO] [AMRM Callback Handler Thread] 
|rm.DagAwareYarnTaskScheduler|: Preempting container 
container_1753791202801_0001_01_000002 currently allocated to a task
{code}

Assuming the test always detects the preemption of the v3 vertex's task, we 
need to make it more robust: it should pass whether or not a task of the v2 
vertex has also been preempted.
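
As a rough illustration of that tolerant check (not the actual test code), here 
is a minimal Python sketch that reads the DagAwareYarnTaskScheduler preemption 
lines quoted above and asks only whether a v3 task was preempted, ignoring any 
other preempted vertex. The function names are hypothetical. It assumes the 
vertex index is the fourth numeric field of the attempt id, and that index 02 
is v3 (as the old-scheduler history event suggests) while index 01 is v2 (an 
assumption based on this comment). The old scheduler's log line does not name 
the attempt, so this sketch does not cover it.

```python
import re

# Assumed attempt id layout:
# attempt_<clusterTimestamp>_<appId>_<dagId>_<vertexIndex>_<taskIndex>_<attemptNumber>
PREEMPT_RE = re.compile(
    r"Preempting container\s+\S+\s+currently allocated to task\s+"
    r"attempt_\d+_\d+_\d+_(\d+)_\d+_\d+"
)

def preempted_vertex_indices(log_text):
    """Return the vertex index of every preempted attempt, in log order."""
    # The mail wraps log records across lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return [int(m.group(1)) for m in PREEMPT_RE.finditer(flat)]

def v3_was_preempted(log_text, v3_index=2):
    """True if a v3 task was preempted, regardless of other preemptions."""
    return v3_index in preempted_vertex_indices(log_text)

# Excerpt copied from the DAG-aware scheduler log quoted in this comment.
SAMPLE = """
2025-07-29 14:13:34,010 [INFO] [AMRM Callback Handler Thread]
|rm.DagAwareYarnTaskScheduler|: Preempting container
container_1753791202801_0001_01_000005 currently allocated to task
attempt_1753791202801_0001_1_01_000001_1
2025-07-29 14:13:34,335 [INFO] [AMRM Callback Handler Thread]
|rm.DagAwareYarnTaskScheduler|: Preempting container
container_1753791202801_0001_01_000002 currently allocated to task
attempt_1753791202801_0001_1_02_000000_0
"""

print(preempted_vertex_indices(SAMPLE))  # [1, 2]
print(v3_was_preempted(SAMPLE))          # True
```

The point of the design is that the assertion is "v3 is among the preempted 
vertices" rather than "exactly one preemption happened", so an extra v2 
preemption no longer changes the outcome.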

> TestAnalyzer subtest testInternalPreemption is flaky
> ----------------------------------------------------
>
>                 Key: TEZ-4565
>                 URL: https://issues.apache.org/jira/browse/TEZ-4565
>             Project: Apache Tez
>          Issue Type: Test
>            Reporter: Jonathan Turner Eagles
>            Assignee: Jonathan Turner Eagles
>            Priority: Major
>             Fix For: 0.10.4
>
>         Attachments: syslog_dag_1753791202801_0001_1.failed.txt, 
> syslog_dag_1753792102822_0001_1.passed.txt, syslog_dag_with_old_scheduler.log
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
