[ 
https://issues.apache.org/jira/browse/TEZ-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907198#comment-14907198
 ] 

Bikas Saha commented on TEZ-2398:
---------------------------------

1) Removes a check in testbasicinputfailure, because the attempt that is 
expected to see no input errors may actually see them as a result of 
scheduling/launching delays.
2) Fixes testInput so that it does not hang on a race between handleEvents and 
doRead when multiple input versions arrive at the same time.
3) Reduces the minicluster NMs to 3 to cut the number of parallel processes. 
That decreases the number of concurrent containers in the test and reduces 
flakiness for large tests like testRandomFailingInputs, which fail 
intermittently due to container launch errors caused by overload.

Ran TestFaultTolerance in a loop for 10 runs without any error. Before this 
change it would fail in 1 out of 3 runs due to 1) and 3).
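
For reviewers who want a picture of the kind of race mentioned in 2), here is a minimal generic sketch. It is not the code in the patch: the class name VersionedInputSketch, the queue, and the method bodies are all hypothetical, and only the method names handleEvents and doRead are borrowed from the description above. It assumes the hang comes from a reader waiting on a notification that has already been delivered, and shows the usual guard: keep the state under one lock and re-check it in a loop before waiting.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for a test input that receives versioned events.
public class VersionedInputSketch {
    private final Object lock = new Object();
    // Versions that have arrived but have not been read yet.
    private final Queue<Integer> pendingVersions = new ArrayDeque<>();

    // Producer side: several versions may be delivered in one call
    // or in back-to-back calls before the reader wakes up.
    public void handleEvents(int... versions) {
        synchronized (lock) {
            for (int v : versions) {
                pendingVersions.add(v);
            }
            lock.notifyAll();
        }
    }

    // Consumer side: the condition is checked under the lock, in a loop,
    // so an arrival that happened before the wait is never missed.
    public int doRead() {
        synchronized (lock) {
            while (pendingVersions.isEmpty()) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            return pendingVersions.remove();
        }
    }

    public static void main(String[] args) {
        VersionedInputSketch input = new VersionedInputSketch();
        // Two versions arrive together before any read happens.
        input.handleEvents(0, 1);
        System.out.println(input.doRead());
        System.out.println(input.doRead());
    }
}
```

Because each arrival is queued rather than signalled as a single flag, two versions landing together are both observed by later reads instead of one overwriting the other.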

[~rajesh.balamohan] [~zjffdu] Please review.

> Flaky test: TestFaultTolerance
> ------------------------------
>
>                 Key: TEZ-2398
>                 URL: https://issues.apache.org/jira/browse/TEZ-2398
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Bikas Saha
>         Attachments: TEZ-2398.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)