[
https://issues.apache.org/jira/browse/TEZ-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103867#comment-17103867
]
László Bodor edited comment on TEZ-4149 at 5/10/20, 4:31 PM:
-------------------------------------------------------------
Thanks [~jeagles], I'm waiting for the starter patch.
In the meantime, I was experimenting with parallelism and found that switching
to the fair scheduler improves the overall test time. Please let me know if
this makes sense.
I modified testRecovery_OrderedWordCount to run all the cases (except the flaky
one from TEZ-4173) and got the following results, grouped by scheduler and
thread pool size:
{code}
CAPACITY SCHEDULER:
thread pool 10:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 400.297 s - in org.apache.tez.test.TestRecovery
thread pool 5:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 401.501 s - in org.apache.tez.test.TestRecovery
thread pool 3:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 410.556 s - in org.apache.tez.test.TestRecovery
thread pool 1:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 385.171 s - in org.apache.tez.test.TestRecovery

FAIR SCHEDULER:
thread pool 15:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 332.778 s - in org.apache.tez.test.TestRecovery
thread pool 10:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 320.876 s - in org.apache.tez.test.TestRecovery
thread pool 5:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 281.534 s - in org.apache.tez.test.TestRecovery
thread pool 3:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 269.571 s - in org.apache.tez.test.TestRecovery
thread pool 2:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 253.058 s - in org.apache.tez.test.TestRecovery
thread pool 1:
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 388.268 s - in org.apache.tez.test.TestRecovery
{code}
The tests use the capacity scheduler by default. After switching to the fair
scheduler, the runtime reached its minimum at 2-3 threads (about 253 s with 2
threads versus about 388 s with 1). Note that this is with no scheduler tuning
at all; the only change was:
{code}
conf.set(YarnConfiguration.RM_SCHEDULER,
"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
{code}
I think the recovery tests remain valid when the AMs run in parallel on the
same minicluster.
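To make the thread pool idea concrete, here is a minimal sketch of running independent cases on a fixed-size pool and counting how many pass. It is an illustration under assumptions, not the actual TestRecovery change: `ParallelCaseRunner`, `runAll`, and the sleeping stand-in cases are hypothetical names, and a real case would submit a DAG to the minicluster instead of sleeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration only; not part of the Tez code base.
public class ParallelCaseRunner {

  // Runs every case on a fixed-size pool and returns how many reported success.
  public static int runAll(List<Callable<Boolean>> cases, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // invokeAll blocks until every submitted case has completed.
      List<Future<Boolean>> results = pool.invokeAll(cases);
      int passed = 0;
      for (Future<Boolean> result : results) {
        // get() rethrows a failure inside a case as ExecutionException.
        if (result.get()) {
          passed++;
        }
      }
      return passed;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<Callable<Boolean>> cases = new ArrayList<>();
    for (int i = 0; i < 6; i++) {
      // Stand-in for one recovery scenario; a real case would run a DAG.
      cases.add(() -> {
        Thread.sleep(100);
        return true;
      });
    }
    long start = System.nanoTime();
    int passed = runAll(cases, 3);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println(passed + " of " + cases.size() + " cases passed in " + elapsedMs + " ms");
  }
}
```

The pool size is the knob that was varied in the measurements above; whether more threads actually help depends on how the scheduler hands out containers to the concurrently running AMs.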
> Speed up TezRecovery tests
> --------------------------
>
> Key: TEZ-4149
> URL: https://issues.apache.org/jira/browse/TEZ-4149
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jonathan Turner Eagles
> Assignee: László Bodor
> Priority: Major
> Attachments: org.apache.tez.test.TestRecovery-output.txt
>
>
> Currently, only approximately 50% of the test cases are chosen to run, as
> there are many failure points on which recovery is tested.
> This can lead to the introduction of bugs into the code, as not all test cases
> are run for every Tez QA run.
> In addition, this can be a real development bottleneck, as tests take around
> 20 minutes per cycle if all tests are run (10 minutes if 50% of the tests are
> run, as usual).