[ https://issues.apache.org/jira/browse/MAPREDUCE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Nauroth resolved MAPREDUCE-7531.
--------------------------------------
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Target Version/s: 3.5.0 (was: 3.5.0, 3.5.1)
Resolution: Fixed
> TestMRJobs.testThreadDumpOnTaskTimeout flaky due to thread dump delayed write
> -----------------------------------------------------------------------------
>
> Key: MAPREDUCE-7531
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7531
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mapreduce-client
> Affects Versions: 3.5.0
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
>
> *Problem*
> TestMRJobs.testThreadDumpOnTaskTimeout is flaky because it checks for thread
> dump output immediately after job completion, but the thread dump may be
> written to logs asynchronously with a delay, causing false negatives.
>
> *Solution*
> Add a polling mechanism (up to 30 seconds, retrying every second) that waits
> for the thread dump to appear in syslog/stdout before asserting. This ensures:
> - Log file counts are correct
> - Container types (AM/Map) match expectations
> - Thread dumps are actually written before validation
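A minimal Java sketch of the polling idea described under *Solution*. It assumes a hand-rolled helper: `pollUntil` and its parameters are hypothetical names, not the code from the actual patch. The helper re-checks a condition every interval until the condition holds or the deadline passes, then returns the last result so that assertions run only after polling has finished.

```java
import java.util.function.BooleanSupplier;

public class PollUntil {
    // Hypothetical helper: re-evaluates the condition every intervalMillis until it
    // holds or timeoutMillis has elapsed, then returns the last observed result.
    static boolean pollUntil(BooleanSupplier condition, long intervalMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare via subtraction so the check stays correct if nanoTime overflows.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulate a thread dump that becomes visible ~300 ms after the job finishes.
        long visibleAt = System.currentTimeMillis() + 300;
        boolean found = pollUntil(() -> System.currentTimeMillis() >= visibleAt, 100, 30_000);
        // Assert only after polling completes.
        if (!found) {
            throw new AssertionError("thread dump never appeared");
        }
        System.out.println("thread dump found");
    }
}
```

In the test itself, the condition would rescan syslog/stdout for thread dump output, with `intervalMillis = 1000` and `timeoutMillis = 30_000` to match the one-second retry and 30-second limit stated in the issue.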
>
> *Changes*
> - Wrap log scanning in a retry loop with a 30-second deadline
> - Accumulate container counts and thread dump flags in each iteration
> - Move all assertions after polling completes to avoid race conditions
> - Fix @Timeout values for related tests (3000→300 seconds)
>
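A minimal Java sketch of the restructured scan described under *Changes*. It makes several assumptions. `LogFile`, `ScanResult`, the `"AM"`/`"MAP"` labels and the `"Full thread dump"` marker are hypothetical stand-ins, not names taken from TestMRJobs. Recomputing the counts and flags from scratch on every pass is also an assumption about how the per-iteration accumulation works. The loop stops when the deadline passes or once both the AM and map thread dumps have been seen. All assertions run after the loop ends.

```java
import java.util.List;
import java.util.function.Supplier;

public class ThreadDumpScan {
    // Hypothetical stand-in for one aggregated container log file.
    record LogFile(String containerType, String contents) {}

    // Result of one full pass over the logs; rebuilt from scratch on every pass.
    record ScanResult(int amLogs, int mapLogs, boolean amDumpFound, boolean mapDumpFound) {
        boolean complete() { return amDumpFound && mapDumpFound; }
    }

    // Assumed marker: HotSpot's SIGQUIT thread dump begins with this text.
    static final String DUMP_MARKER = "Full thread dump";

    // One pass: count log files per container type and note which contain a dump.
    static ScanResult scanOnce(List<LogFile> logs) {
        int am = 0;
        int map = 0;
        boolean amDump = false;
        boolean mapDump = false;
        for (LogFile f : logs) {
            boolean hasDump = f.contents().contains(DUMP_MARKER);
            if (f.containerType().equals("AM")) {
                am++;
                amDump |= hasDump;
            } else if (f.containerType().equals("MAP")) {
                map++;
                mapDump |= hasDump;
            }
        }
        return new ScanResult(am, map, amDump, mapDump);
    }

    // Re-read and rescan the logs until both dumps are seen or the deadline passes.
    static ScanResult scanUntilDeadline(Supplier<List<LogFile>> readLogs,
                                        long intervalMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        ScanResult last = scanOnce(readLogs.get());
        while (!last.complete() && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            last = scanOnce(readLogs.get());
        }
        return last;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // The map container's thread dump only "appears" after ~200 ms.
        Supplier<List<LogFile>> logs = () -> List.of(
            new LogFile("AM", "... Full thread dump ..."),
            new LogFile("MAP", System.currentTimeMillis() - start >= 200
                ? "Full thread dump" : "task timed out"));
        ScanResult r = scanUntilDeadline(logs, 50, 30_000);
        // Assertions run only after polling has finished.
        if (r.amLogs() != 1 || r.mapLogs() != 1) {
            throw new AssertionError("unexpected log counts: " + r);
        }
        if (!r.complete()) {
            throw new AssertionError("thread dump missing: " + r);
        }
        System.out.println("ok " + r);
    }
}
```

Rebuilding the counts on each pass, rather than carrying them across retries, is one way to keep a retry from counting the same log file twice. The issue does not say whether the actual patch does it this way.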
--
This message was sent by Atlassian Jira
(v8.20.10#820010)