[ 
https://issues.apache.org/jira/browse/TEZ-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103751#comment-17103751
 ] 

László Bodor commented on TEZ-4149:
-----------------------------------

Some gaps found in [^org.apache.tez.test.TestRecovery-output.txt] that could 
possibly be reduced (each label below is the approximate pause between the two 
log lines shown):

7s
{code}
2020-05-10 12:39:59,406 INFO  [NM ContainerManager dispatcher] 
loghandler.NonAggregatingLogHandler (NonAggregatingLogHandler.java:handle(173)) 
- Scheduling Log Deletion for application: application_1589107183822_0001, with 
delay of 10800 seconds
2020-05-10 12:40:06,424 INFO  [Listener at MacBook-Pro.local/54645] 
containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(762)) - Done waiting 
for Applications to be Finished. Still alive: [application_1589107183822_0001]
{code}

10s
{code}
2020-05-10 12:40:06,433 INFO  [IPC Server Responder] ipc.Server 
(Server.java:run(1466)) - Stopping IPC Server Responder
2020-05-10 12:40:16,451 INFO  [Listener at MacBook-Pro.local/54645] ipc.Server 
(Server.java:stop(3360)) - Stopping server on 54645
{code}

5s
{code}
2020-05-10 12:40:16,453 INFO  [Listener at MacBook-Pro.local/54645] 
nodemanager.NodeResourceMonitorImpl 
(NodeResourceMonitorImpl.java:isEnabled(85)) - Node Resource monitoring 
interval is <=0. 
org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is disabled.
2020-05-10 12:40:21,475 WARN  [Listener at MacBook-Pro.local/54645] 
server.MiniYARNCluster (MiniYARNCluster.java:waitForAppMastersToFinish(526)) - 
Stopping RM while some app masters are still alive
{code}

This single test case took 51s, but ~20s of it seemed to be "idle"; some 
minicluster / YARN configuration changes might help.
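
A minimal sketch of how pauses like the ones above could be located automatically in the test output, assuming every log record starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp as in the excerpts. The function name `find_gaps` and the 5-second default threshold are illustrative choices, not part of Tez or Hadoop:

```python
import re
from datetime import datetime

# Leading timestamp of a log record, e.g. "2020-05-10 12:39:59,406".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def find_gaps(lines, threshold_seconds=5.0):
    """Return (gap_seconds, previous_record, next_record) for each pause
    between consecutive timestamped records that is >= threshold_seconds."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            # Wrapped message text or a stack trace line: no timestamp, skip it.
            continue
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev_time is not None:
            delta = (current - prev_time).total_seconds()
            if delta >= threshold_seconds:
                gaps.append((delta, prev_line, line.rstrip()))
        prev_time, prev_line = current, line.rstrip()
    return gaps
```

It could be run against the attached output file with something like `find_gaps(open("org.apache.tez.test.TestRecovery-output.txt"))`, then sorting the result by the first tuple element to see the longest pauses first.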

> Speed up TezRecovery tests
> --------------------------
>
>                 Key: TEZ-4149
>                 URL: https://issues.apache.org/jira/browse/TEZ-4149
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Turner Eagles
>            Priority: Major
>         Attachments: org.apache.tez.test.TestRecovery-output.txt
>
>
> Currently, only approximately 50% of the test cases are chosen to run, 
> because there are many failure points chosen to test recovery on.
> This can lead to the introduction of bugs into the code, as not all test 
> cases are run for every Tez QA run.
> In addition, this can be a real development bottleneck, as tests take around 
> 20 minutes per cycle if all tests are run (10 minutes if 50% of the tests are 
> run, as usual).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
