lhotari opened a new issue #9622: URL: https://github.com/apache/pulsar/issues/9622
CLITest is flaky. It fails when test retries are disabled. This problem can be reproduced by running CLITest in IntelliJ. **It seems that the same root cause is causing test failures in other integration tests**. Bookies die in the test after this kind of error message `failed to allocate 16777216 byte(s) of direct memory (used: 536870912, max: 536870912)`. To investigate the issue I set the environment variable [`TESTCONTAINERS_RYUK_DISABLED=true`](https://www.testcontainers.org/features/configuration/#disabling-ryuk) to disable TestContainers automatic container cleanup. I also locally temporarily added these lines to `PulsarCluster.stop` method: ``` public synchronized void stop() { + boolean leaveContainersRunning = Boolean.parseBoolean(System.getenv("TESTCONTAINERS_RYUK_DISABLED")); + if (leaveContainersRunning) { + log.warn("Pulsar cluster is left running since TESTCONTAINERS_RYUK_DISABLED=true."); + return; + } ``` After this, it's possible to use `docker exec -it CLITest-euiia-pulsar-bookie-0 bash` and get a shell to view the `/var/log/pulsar/bookie.log` file which is created by the [`tests/docker-images/latest-version-image/conf/bookie.conf`](https://github.com/apache/pulsar/blob/master/tests/docker-images/latest-version-image/conf/bookie.conf) config used to run bookies in the `apachepulsar/pulsar-test-latest-version` docker image. This file contained the following error message: ``` 08:26:53.194 [SyncThread-7-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerBase - Creating a new entry log fi le because current active log channel has not initialized yet 08:26:53.195 [SyncThread-7-1] ERROR org.apache.bookkeeper.proto.BookieServer - Unable to allocate memory, exiting bookie io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 5368709 12, max: 536870912) ``` After restarting the bookie service with `supervisorctl start bookie`, the bookie starts again and I was able to check the `java` process command line: ``` /usr/local/openjdk-8/bin/java -cp /pulsar/conf:::/pulsar/lib/*: -Dlog4j.configurationFile=log4j2.yaml -Djute.maxbuffer=10485760 -Djava.net.preferIPv4Stack=true -Xmx128M -XX:MaxDirectMemorySize=512M -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024 -Dpulsar.log.appender=RoutingAppender -Dpulsar.log.dir=/pulsar/logs -Dpulsar.log.level=info -Dpulsar.log.root.level=info -Dpulsar.routing.appender.default=Console -Dlog4j2.is.webapp=false -Dpulsar.functions.process.container.log.dir=/pulsar/logs -Dpulsar.functions.java.instance.jar=/pulsar/instances/java-instance.jar -Dpulsar.functions.python.instance.file=/pulsar/instances/python-instance/python_instance_main.py -Dpulsar.functions.extra.depe ndencies.dir=/pulsar/instances/deps -Dpulsar.functions.instance.classpath=/pulsar/conf:::/pulsar/lib/*: -Dpulsar.log.file=bookkeeper.log org.apache.bookkeeper.server.Main --conf /pulsar/conf/bookkeeper.conf ``` I can see that `-XX:MaxDirectMemorySize=512M` is properly passed. I noticed that `PULSAR_GC` isn't used from [`tests/docker-images/latest-version-image/conf/bookie.conf`](https://github.com/apache/pulsar/blob/master/tests/docker-images/latest-version-image/conf/bookie.conf) so I created #9621 to fix that. However that's not the reason for the flakiness. In [`tests/docker-images/latest-version-image/conf/bookie.conf`](https://github.com/apache/pulsar/blob/cf63ae8480e6b03aca437b658cc10a935129a819/tests/docker-images/latest-version-image/conf/bookie.conf#L25) there is configuration to set `dbStorage_writeCacheMaxSizeMb="16",dbStorage_readAheadCacheMaxSizeMb="16"` . However this is a no-op since `apply-config-from-env.py` script is called in [`tests/docker-images/latest-version-image/scripts/run-bookie.sh`](https://github.com/apache/pulsar/blob/cf63ae8480e6b03aca437b658cc10a935129a819/tests/docker-images/latest-version-image/scripts/run-bookie.sh#L21-L22) . There is nothing that will put the environment variables defined in the supervisord config to `conf/bookkeeper.conf`. Therefore, the configuration for `dbStorage_*` should be directly in the `run-bookie.sh` script to fix the issue. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
