[
https://issues.apache.org/jira/browse/TEZ-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103751#comment-17103751
]
László Bodor commented on TEZ-4149:
-----------------------------------
I found some gaps in [^org.apache.tez.test.TestRecovery-output.txt] that could
probably be reduced:
7s
{code}
2020-05-10 12:39:59,406 INFO [NM ContainerManager dispatcher]
loghandler.NonAggregatingLogHandler (NonAggregatingLogHandler.java:handle(173))
- Scheduling Log Deletion for application: application_1589107183822_0001, with
delay of 10800 seconds
2020-05-10 12:40:06,424 INFO [Listener at MacBook-Pro.local/54645]
containermanager.ContainerManagerImpl
(ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(762)) - Done waiting
for Applications to be Finished. Still alive: [application_1589107183822_0001]
{code}
10s
{code}
2020-05-10 12:40:06,433 INFO [IPC Server Responder] ipc.Server
(Server.java:run(1466)) - Stopping IPC Server Responder
2020-05-10 12:40:16,451 INFO [Listener at MacBook-Pro.local/54645] ipc.Server
(Server.java:stop(3360)) - Stopping server on 54645
{code}
5s
{code}
2020-05-10 12:40:16,453 INFO [Listener at MacBook-Pro.local/54645]
nodemanager.NodeResourceMonitorImpl
(NodeResourceMonitorImpl.java:isEnabled(85)) - Node Resource monitoring
interval is <=0.
org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is disabled.
2020-05-10 12:40:21,475 WARN [Listener at MacBook-Pro.local/54645]
server.MiniYARNCluster (MiniYARNCluster.java:waitForAppMastersToFinish(526)) -
Stopping RM while some app masters are still alive
{code}
This single test case took 51s, but ~20s of it seemed to be "idle"; some
minicluster / YARN configuration might help.
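
Gaps like these could also be located automatically instead of by reading the log. Below is a minimal sketch, assuming every log record starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp as in the excerpts above. The names `find_gaps` and `min_gap_seconds` are hypothetical helpers for illustration, not part of Tez or Hadoop.

```python
import re
import sys
from datetime import datetime

# Leading timestamp of a log record, e.g. "2020-05-10 12:39:59,406".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def find_gaps(lines, min_gap_seconds=5.0):
    """Return (gap_seconds, previous_line, next_line) for each pair of
    consecutive timestamped lines at least min_gap_seconds apart."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            # Wrapped message text or a stack trace: no timestamp, skip it.
            continue
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev_time is not None:
            delta = (current - prev_time).total_seconds()
            if delta >= min_gap_seconds:
                gaps.append((delta, prev_line, line.rstrip()))
        prev_time, prev_line = current, line.rstrip()
    return gaps


if __name__ == "__main__":
    # Usage (assumed): python find_gaps.py <path to test output file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for seconds, before, after in find_gaps(log):
            print(f"{seconds:.1f}s gap\n  {before}\n  {after}")
```

The 5-second default threshold is an arbitrary choice that happens to catch the three gaps quoted above; a lower value would surface shorter pauses at the cost of more noise.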
> Speed up TezRecovery tests
> --------------------------
>
> Key: TEZ-4149
> URL: https://issues.apache.org/jira/browse/TEZ-4149
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jonathan Turner Eagles
> Priority: Major
> Attachments: org.apache.tez.test.TestRecovery-output.txt
>
>
> Currently, only approximately 50% of the test cases are chosen to run, as
> there are many failure points on which recovery is tested.
> This can lead to the introduction of bugs into the code, as not all test cases
> are run for every Tez QA run.
> In addition, this can be a real development bottleneck, as tests take around
> 20 minutes per cycle if all tests are run (10 minutes if 50% of the tests are
> run, as usual).