[
https://issues.apache.org/jira/browse/TEZ-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907198#comment-14907198
]
Bikas Saha commented on TEZ-2398:
---------------------------------
1) Removes the check in testbasicinputfailure, because the attempt that is
expected not to see any input errors may actually see them because of
scheduling/launching delays.
2) Fixes testInput so that it does not hang because of a race between
handleEvents and doRead when multiple input versions arrive at the same time.
3) Reduces the minicluster NMs to 3 to reduce the number of parallel processes.
That decreases the number of concurrent containers in the test and reduces
flakiness for large tests like testRandomFailingInputs, which fail
intermittently due to container launch errors caused by overload.
Ran TestFaultTolerance in a loop for 10 runs without any errors. Before this
patch it would fail in 1 out of 3 runs due to 1) and 3).
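
To illustrate the kind of race described in 2), here is a minimal sketch. It is not the code from the patch. The class VersionedInputSketch, its fields, and the simplified signatures of handleEvents and doRead are all hypothetical. It assumes that the reader waits for a particular input version, and that a hang can occur if it waits for a single notification or an exact version match while several versions arrive in one batch. The sketch avoids this by waiting in a loop on a "wanted version or newer" predicate, with a timeout.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; this is NOT the TestInput class from Tez.
public class VersionedInputSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition versionAdvanced = lock.newCondition();
  private int latestVersion = -1; // highest input version seen so far

  // Called by the event thread; one batch may carry several versions.
  public void handleEvents(List<Integer> versions) {
    lock.lock();
    try {
      for (int v : versions) {
        latestVersion = Math.max(latestVersion, v);
      }
      // Wake every waiter; each one rechecks its own predicate.
      versionAdvanced.signalAll();
    } finally {
      lock.unlock();
    }
  }

  // Called by the reader thread. Returns true once wantedVersion (or a
  // newer one) has arrived, false on timeout or interrupt.
  public boolean doRead(int wantedVersion, long timeoutMillis) {
    long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    lock.lock();
    try {
      // Loop on the predicate so that a batch that skips past the
      // wanted version, or a spurious wakeup, cannot strand the reader.
      while (latestVersion < wantedVersion) {
        if (remainingNanos <= 0) {
          return false;
        }
        remainingNanos = versionAdvanced.awaitNanos(remainingNanos);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) throws Exception {
    VersionedInputSketch input = new VersionedInputSketch();
    // Versions 0, 1 and 2 arrive together while the reader wants version 1.
    Thread events = new Thread(() -> input.handleEvents(Arrays.asList(0, 1, 2)));
    events.start();
    boolean ok = input.doRead(1, 5000);
    events.join();
    System.out.println("read completed: " + ok);
  }
}
```

The timeout in doRead is a design choice for test code: if the race ever returns, the test fails quickly instead of hanging the whole run.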
[~rajesh.balamohan] [~zjffdu] Please review.
> Flaky test: TestFaultTolerance
> ------------------------------
>
> Key: TEZ-2398
> URL: https://issues.apache.org/jira/browse/TEZ-2398
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Bikas Saha
> Attachments: TEZ-2398.1.patch
>
>