[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-7531.
--------------------------------------
       Fix Version/s: 3.5.0
        Hadoop Flags: Reviewed
    Target Version/s: 3.5.0  (was: 3.5.0, 3.5.1)
          Resolution: Fixed

> TestMRJobs.testThreadDumpOnTaskTimeout flaky due to thread dump delayed write
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7531
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7531
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mapreduce-client
>    Affects Versions: 3.5.0
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> *Problem*
> TestMRJobs.testThreadDumpOnTaskTimeout is flaky because it checks for thread 
> dump output immediately after job completion, but the thread dump may be 
> written to the logs asynchronously, after a delay, causing spurious failures.
>  
> *Solution*
> Add a polling mechanism (up to 30 seconds, retrying every second) that waits 
> for the thread dump to appear in syslog/stdout before asserting. This ensures:
> - Log file counts are correct
> - Container types (AM/Map) match expectations 
> - Thread dumps are actually written before validation
>  
> *Changes*
> - Wrap log scanning in a retry loop with a 30-second deadline
> - Accumulate container counts and thread dump flags in each iteration
> - Move all assertions after polling completes to avoid race conditions
> - Fix @Timeout values for related tests (3000→300 seconds)
>  
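
The retry loop described under *Changes* could look roughly like the sketch below. This is not the code from the actual patch: the class name, the `ScanResult` holder, the `pollForThreadDump` helper, and the "Full thread dump" marker string are all assumptions made for illustration, and the log source is simulated with in-memory lists rather than real container log files. The point is only the shape of the technique: rescan the logs until the dump shows up or the deadline passes, and assert once, after polling has finished.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative sketch only; names here are hypothetical, not from the patch. */
class ThreadDumpPollingSketch {

  /** Assumed marker text; the real test may look for something else. */
  static final String DUMP_MARKER = "Full thread dump";

  /** Result of one full scan over the (simulated) log lines. */
  static final class ScanResult {
    final int linesScanned;
    final boolean dumpFound;

    ScanResult(int linesScanned, boolean dumpFound) {
      this.linesScanned = linesScanned;
      this.dumpFound = dumpFound;
    }
  }

  /** Scans one snapshot of the log and records whether the marker is present. */
  static ScanResult scan(List<String> logLines) {
    boolean found = false;
    for (String line : logLines) {
      if (line.contains(DUMP_MARKER)) {
        found = true;
      }
    }
    return new ScanResult(logLines.size(), found);
  }

  /**
   * Rescans the log until the dump appears or the deadline passes.
   * Returns the last scan so the caller can assert after polling completes.
   */
  static ScanResult pollForThreadDump(Supplier<List<String>> logReader,
                                      long timeoutMillis, long intervalMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    ScanResult result = scan(logReader.get());
    while (!result.dumpFound && System.nanoTime() < deadline) {
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
        break;
      }
      result = scan(logReader.get());
    }
    return result;
  }

  public static void main(String[] args) {
    // Simulated log snapshots: the dump only shows up on the third read.
    Deque<List<String>> snapshots = new ArrayDeque<>(List.of(
        List.of("task started"),
        List.of("task started", "task timed out"),
        List.of("task started", "task timed out", "Full thread dump ...")));
    Supplier<List<String>> reader =
        () -> snapshots.size() > 1 ? snapshots.poll() : snapshots.peek();

    // 30-second deadline as in the issue; short interval to keep the demo fast.
    ScanResult result = pollForThreadDump(reader, 30_000, 10);

    // Assertions run only after polling has finished.
    if (!result.dumpFound) {
      throw new AssertionError("thread dump never appeared in the log");
    }
    System.out.println("thread dump found");
  }
}
```

Keeping the assertion outside the loop means a slow log write costs only extra wait time, while a dump that never appears still fails the check once the deadline expires.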



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
