lhotari edited a comment on pull request #9393:
URL: https://github.com/apache/pulsar/pull/9393#issuecomment-770633183


   Nice work on this @michaeljmarshall.
   
   However, it seems that the flakiness remains after this change.
   
   Flaky test failures are sometimes hard to reproduce locally; a common theme is that they show up in CI but not in local environments. While working on the fix for the flaky test MessageIdTest, I found that limiting the CPU resources to roughly what CI has is an effective way to reproduce some of them. The CI tests run on an Azure VM with 2 cores and about 6GB of free RAM (IIRC).
   
   Since I use Linux for development, the easiest way for me to constrain the resources of the test run was Docker. On other operating systems, VM tooling such as https://multipass.run/ could help create an environment with limited CPU resources.
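A quick way to confirm that the limits actually reach the JVM is to print what it sees from inside the container. This is a minimal sketch, assuming a JVM with container support (JDK 10+, or JDK 8u191+), where `Runtime.availableProcessors()` and the default maximum heap are derived from the cgroup limits set by `--cpus` and `--memory`; the class name `ResourceCheck` is hypothetical.

```java
// Hypothetical helper: prints the CPU and heap limits the JVM observes.
// On container-aware JVMs these reflect the docker --cpus/--memory limits,
// not the host's resources.
class ResourceCheck {
    static int cpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Maximum heap size in MiB; unless -Xmx is given, the JVM picks a
    // fraction of the memory it believes is available.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors=" + cpus());
        System.out.println("maxMemory=" + maxHeapMiB() + " MiB");
    }
}
```

Run inside a container started with `--cpus=2`, a container-aware JVM should report `availableProcessors=2`; an older JVM may report the host's core count instead, in which case thread pools sized from that value will not match CI.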
   
   These are the commands I used to test this change:
   ```
   $ gh pr checkout 9393
   $ mvn clean install -DskipTests -Dspotbugs.skip=true
   $ docker run --cpus=2 --memory=6g -u 1000:1000 --net=host -it --rm \
       -v $HOME:$HOME -w $PWD -v /etc/passwd:/etc/passwd:ro \
       ubuntu bash -c 'source "$HOME/.sdkman/bin/sdkman-init.sh"; counter=0;
         while mvn -Pcore-modules -pl pulsar-broker test -DfailIfNoTests=false \
             -Dtest=AntiAffinityNamespaceGroupTest -DredirectTestOutputToFile=false -DtestRetryCount=0; do
           echo "----------- LOOP $counter ---------------"; ((counter++));
         done; echo "LOOP $counter"' | tee docker_output_`date +%s`.log
   ```
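The loop in the command above keeps rerunning the test until the first failure and then reports how many runs passed. When iterating on a fix from an IDE, the same idea can be sketched in plain Java; `RepeatUntilFailure` is a hypothetical helper (not part of Pulsar or TestNG), and the task passed to it would wrap one execution of the flaky scenario.

```java
import java.util.concurrent.Callable;

// Hypothetical helper mirroring the shell loop: rerun a task until it fails.
final class RepeatUntilFailure {
    private RepeatUntilFailure() {}

    /** How many runs passed, and the failure that ended the loop (null if none). */
    static final class Result {
        final int passes;
        final Throwable failure;

        Result(int passes, Throwable failure) {
            this.passes = passes;
            this.failure = failure;
        }
    }

    static Result run(Callable<?> task, int maxRuns) {
        int passes = 0;
        while (passes < maxRuns) {
            try {
                task.call();
            } catch (Throwable t) {
                // Catch Throwable so assertion errors (which are Errors) end the loop too.
                return new Result(passes, t);
            }
            passes++;
            System.out.println("----------- LOOP " + passes + " ---------------");
        }
        return new Result(passes, null);
    }

    public static void main(String[] args) {
        // Simulated flaky task: fails on its 4th invocation.
        int[] calls = {0};
        Result r = run(() -> {
            if (++calls[0] == 4) {
                throw new AssertionError("simulated flaky failure");
            }
            return null;
        }, 100);
        System.out.println("LOOP " + r.passes + ", failure: " + r.failure);
    }
}
```

Unlike the Maven loop, this reruns inside a single JVM, so state leaking between runs (ports, static fields) can mask or cause failures; `maxRuns` bounds the loop so it terminates even if the test never fails.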
   
   Here's the output: https://gist.github.com/lhotari/7d8c7ae0a9e1a26d92599c585ba64e13
   and the `pulsar-broker/target/surefire-reports/testng-results.xml` file: https://gist.github.com/lhotari/fbbcd1405d8f16be7106f8ece9f66084
   
   ```
   java.lang.AssertionError: did not expect [localhost:42919] but found [localhost:42919]
   at org.testng.Assert.fail(Assert.java:99)
   at org.testng.Assert.failEquals(Assert.java:1041)
   at org.testng.Assert.assertNotEqualsImpl(Assert.java:147)
   at org.testng.Assert.assertNotEquals(Assert.java:1531)
   at org.testng.Assert.assertNotEquals(Assert.java:1535)
   at org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup(AntiAffinityNamespaceGroupTest.java:427)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
   at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
   at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
   at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
   ```
   
   What comes to mind is that the test might start before both brokers are available. Could something like https://github.com/apache/pulsar/blob/24f759c677bfe7cbb2228cab8a38f2ebd0893945/pulsar-discovery-service/src/test/java/org/apache/pulsar/discovery/service/DiscoveryServiceTest.java#L251-L252 help with that?
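One way to express "wait until both brokers are available" is a polling helper with a timeout, called before the test starts selecting brokers. This is a sketch, not the code in the linked DiscoveryServiceTest: `WaitUtil.waitUntil` is hypothetical, and the `main` method only simulates a second broker registering late. In the real test, the condition would query whatever the test uses to list active brokers (left as an assumption here).

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical test utility: poll a condition until it holds or a timeout expires.
final class WaitUtil {
    private WaitUtil() {}

    static void waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            // Compare via subtraction so that nanoTime overflow is handled correctly.
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("Condition not met within " + timeout);
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    // Simulation only: a second "broker" registers 300 ms after start.
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger activeBrokers = new AtomicInteger(1);
        Thread secondBroker = new Thread(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            activeBrokers.set(2);
        });
        secondBroker.start();

        // Without this wait, broker selection could run while only one broker is known.
        waitUntil(() -> activeBrokers.get() == 2, Duration.ofSeconds(10), Duration.ofMillis(50));
        System.out.println("active brokers: " + activeBrokers.get());
    }
}
```

Polling with a deadline keeps the test fast when the brokers come up quickly, and turns a genuine startup problem into a clear timeout instead of a misleading `assertNotEquals` failure.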
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
