[
https://issues.apache.org/jira/browse/TEZ-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor resolved TEZ-4649.
-------------------------------
Resolution: Fixed
> Java21 upgrade made TestAnalyzer#testWithATS flaky - change entry to nanotime
> precision
> ---------------------------------------------------------------------------------------
>
> Key: TEZ-4649
> URL: https://issues.apache.org/jira/browse/TEZ-4649
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Fix For: 1.0.0
>
> Attachments:
> BAD_TEZ_4649_org.apache.tez.analyzer.TestAnalyzer-output.txt,
> GOOD_TEZ_4649_org.apache.tez.analyzer.TestAnalyzer-output.txt
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> tested by:
> {code}
> mvn clean install -Denforcer.skip=true -nsu -Dtest=TestAnalyzer#testWithATS
> -pl ./tez-tools/analyzers/job-analyzer -am
> {code}
> git bisected, and double-checked before and after this commit:
> {code}
> commit 03b5f1d68ed30d214d0a0aa3e6ec9fe67777372d
> TEZ-4636: Move to JDK-21+ (#418)
> {code}
> I discovered this issue while testing jersey upgrade in the scope of TEZ-4648
> I'm afraid we don't run all the unit tests in case of a root pom.xml change
> either, otherwise TEZ-4636 PR (https://github.com/apache/tez/pull/418),
> should have shown this problem, which it hasn't (only TestRecovery)
> bad (with JDK21 patch, on JDK21 locally):
> [^BAD_TEZ_4649_org.apache.tez.analyzer.TestAnalyzer-output.txt]
> good (without JDK21 patch, on JDK17 locally):
> [^GOOD_TEZ_4649_org.apache.tez.analyzer.TestAnalyzer-output.txt]
> important note: both good and bad contain this:
> {code}
> SEVERE: Failed to generate the schema for the JAX-B elements
> javax.xml.bind.JAXBException: TimelineEntity and TimelineEntities has
> IllegalAnnotation
> at
> org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ContextFactory.createContext(ContextFactory.java)
> {code}
> so this is not what makes the unit test fail (this red herring caused a few
> hours of useless investigation for me in TEZ-4648)
> instead the problem might be:
> {code}
> 2025-09-17 12:49:38,826 INFO [qtp830449117-475] requests.applicationhistory
> (Slf4jRequestLogWriter.java:write(62)) - 127.0.0.1 - - [17/Sep/2025:10:49:38
> +0000] "GET
> /ws/v1/timeline/TEZ_TASK_ATTEMPT_ID?limit=100&primaryFilter=TEZ_DAG_ID:dag_1758106144809_0001_3&user.name=lbodor
> HTTP/1.1" 200 7438 "-" "Java/21.0.8"
> 2025-09-17 12:49:38,827 ERROR [Time-limited test] history.ATSImportTool
> (ATSImportTool.java:download(165)) - Exception in download
> java.util.zip.ZipException: duplicate entry: part-1758106178826.json
> at
> java.base/java.util.zip.ZipOutputStream.putNextEntry(ZipOutputStream.java:259)
> at
> org.apache.tez.history.ATSImportTool.downloadJSONArrayFromATS(ATSImportTool.java:265)
> at
> org.apache.tez.history.ATSImportTool.downloadData(ATSImportTool.java:233)
> at org.apache.tez.history.ATSImportTool.download(ATSImportTool.java:162)
> at org.apache.tez.history.ATSImportTool.run(ATSImportTool.java:350)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
> at org.apache.tez.history.ATSImportTool.process(ATSImportTool.java:464)
> at
> org.apache.tez.analyzer.TestAnalyzer.getDagInfo(TestAnalyzer.java:264)
> at org.apache.tez.analyzer.TestAnalyzer.verify(TestAnalyzer.java:252)
> at org.apache.tez.analyzer.TestAnalyzer.runTests(TestAnalyzer.java:391)
> at
> org.apache.tez.analyzer.TestAnalyzer.testWithATS(TestAnalyzer.java:355)
> at
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
> at java.base/java.lang.reflect.Method.invoke(Method.java:580)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> at
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> {code}
> by more detailed logging, I got:
> {code}
> 2025-09-17 13:09:05,068 INFO [qtp830449117-835] requests.applicationhistory
> (Slf4jRequestLogWriter.java:write(62)) - 127.0.0.1 - - [17/Sep/2025:11:09:05
> +0000] "GET
> /ws/v1/timeline/TEZ_VERTEX_ID?limit=100&primaryFilter=TEZ_DAG_ID:dag_1758107310713_0001_10&user.name=lbodor
> HTTP/1.1" 200 5942 "-" "Java/21.0.8"
> 2025-09-17 13:09:05,069 INFO [Time-limited test] history.ATSImportTool
> (ATSImportTool.java:downloadJSONArrayFromATS(268)) - Downloading 2 entities
> to zip entry: part-1758107345069.json
> 2025-09-17 13:09:05,069 INFO [qtp830449117-831] requests.applicationhistory
> (Slf4jRequestLogWriter.java:write(62)) - 127.0.0.1 - - [17/Sep/2025:11:09:05
> +0000] "GET
> /ws/v1/timeline/TEZ_TASK_ID?limit=100&primaryFilter=TEZ_DAG_ID:dag_1758107310713_0001_10&user.name=lbodor
> HTTP/1.1" 200 2780 "-" "Java/21.0.8"
> 2025-09-17 13:09:05,070 INFO [Time-limited test] history.ATSImportTool
> (ATSImportTool.java:downloadJSONArrayFromATS(268)) - Downloading 2 entities
> to zip entry: part-1758107345070.json
> 2025-09-17 13:09:05,070 INFO [qtp830449117-835] requests.applicationhistory
> (Slf4jRequestLogWriter.java:write(62)) - 127.0.0.1 - - [17/Sep/2025:11:09:05
> +0000] "GET
> /ws/v1/timeline/TEZ_TASK_ATTEMPT_ID?limit=100&primaryFilter=TEZ_DAG_ID:dag_1758107310713_0001_10&user.name=lbodor
> HTTP/1.1" 200 10395 "-" "Java/21.0.8"
> 2025-09-17 13:09:05,070 INFO [Time-limited test] history.ATSImportTool
> (ATSImportTool.java:downloadJSONArrayFromATS(268)) - Downloading 4 entities
> to zip entry: part-1758107345070.json
> {code}
> this is not necessarily related to Java21, as I was able to have a passing
> test on that also, but somehow Java21 test run made this very flaky, like
> returning to the same point in the loop more than once within a millisecond
> need to have some randomness here
--
This message was sent by Atlassian Jira
(v8.20.10#820010)