kerneltime commented on pull request #2989:
URL: https://github.com/apache/ozone/pull/2989#issuecomment-1015065962


   > @kerneltime Do we need to restart SCM and datanodes between test cases?  
If so, do we need as many as 10 datanodes?
   > 
   > The previous change in this test class added:
   > 
   > https://github.com/apache/ozone/blob/3eb7235ca879498c2d8ebcd5fd228d6e4cb16891/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java#L174-L182
   > 
   > `@BeforeEach` is from JUnit5, but the rest of the class uses JUnit4 
annotations.  Thus `restartDatanode()` is never called.  The corresponding 
JUnit4 annotation is `@Before`.  (Note: I plan to change this test to JUnit5 in 
HDDS-6095.)
   > 
   > Fixing the annotation would increase test execution time significantly 
from the current 100 seconds.  I don't know by how much, because the Surefire 
fork gets killed at 20 minutes.
   > 
   > Also, if the annotation were fixed, SCM and datanodes would be restarted 
even before the first test case, which is unnecessary.  It may be better to 
create the cluster per test instead.
   > 
   > `master` with per-test cluster:
   > 
   > ```
   > Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 236.074 s - in org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures
   > ```
   > 
   > This patch with per-test cluster:
   > 
   > ```
   > Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 298.905 s - in org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures
   > ```
   > 
   > This patch with per-test cluster with 6 datanodes (not much improvement 
here):
   > 
   > ```
   > Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 292.42 s - in org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures
   > ```
   
   Let me look deeper into the performance.
   These tests used 10 datanodes, but I don't think we need that many. We would
   need more than 3. Let me fix the annotation and then look into performance.
   Without the restart, the induced failures can make other tests flaky.
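
For reference, here is a toy sketch of why the mismatched annotation goes unnoticed. It is not JUnit's actual implementation: `Before`, `BeforeEach`, and `Test` below are locally declared, hypothetical stand-ins for the real JUnit annotations, and `countSetupCalls` is a made-up helper that mimics a JUnit4-style runner, which only looks for its own setup annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class AnnotationMismatchDemo {

    // Hypothetical stand-ins for org.junit.Before (JUnit 4),
    // org.junit.jupiter.api.BeforeEach (JUnit 5) and a test marker.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface Before { }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface BeforeEach { }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface Test { }

    /** Setup method carries an annotation the toy runner does not know. */
    public static class WrongAnnotation {
        @BeforeEach public void restartDatanode() { }
        @Test public void testCase() { }
    }

    /** Same class, but with the annotation the toy runner looks for. */
    public static class RightAnnotation {
        @Before public void restartDatanode() { }
        @Test public void testCase() { }
    }

    /** Toy JUnit4-style runner: returns how many setup methods it invoked. */
    public static int countSetupCalls(Class<?> testClass) {
        int setupCalls = 0;
        try {
            for (Method test : testClass.getDeclaredMethods()) {
                if (!test.isAnnotationPresent(Test.class)) {
                    continue;
                }
                // Fresh instance per test case, as JUnit does.
                Object instance = testClass.getDeclaredConstructor().newInstance();
                for (Method m : testClass.getDeclaredMethods()) {
                    // Only the runner's own annotation is recognized;
                    // anything else is silently skipped, with no error.
                    if (m.isAnnotationPresent(Before.class)) {
                        m.invoke(instance);
                        setupCalls++;
                    }
                }
                test.invoke(instance);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return setupCalls;
    }

    public static void main(String[] args) {
        System.out.println("@BeforeEach setup calls: "
            + countSetupCalls(WrongAnnotation.class));
        System.out.println("@Before setup calls: "
            + countSetupCalls(RightAnnotation.class));
    }
}
```

With the wrong annotation the test case still runs and passes, so nothing signals that `restartDatanode()` was skipped.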


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


