lhotari opened a new issue #9622:
URL: https://github.com/apache/pulsar/issues/9622


   CLITest is flaky. It fails when test retries are disabled. This problem can 
be reproduced by running CLITest in IntelliJ.
   **It seems that the same root cause is causing test failures in other 
integration tests**.
   
   Bookies die during the test after logging an error like `failed to allocate 
16777216 byte(s) of direct memory (used: 536870912, max: 536870912)`.
   
   To investigate the issue, I set the environment variable 
[`TESTCONTAINERS_RYUK_DISABLED=true`](https://www.testcontainers.org/features/configuration/#disabling-ryuk)
 to disable Testcontainers' automatic container cleanup. I also temporarily 
added these lines to the `PulsarCluster.stop` method locally:
   ```
        public synchronized void stop() {
   +        boolean leaveContainersRunning = Boolean.parseBoolean(System.getenv("TESTCONTAINERS_RYUK_DISABLED"));
   +        if (leaveContainersRunning) {
   +            log.warn("Pulsar cluster is left running since TESTCONTAINERS_RYUK_DISABLED=true.");
   +            return;
   +        }
   ```
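
The guard above relies on how `Boolean.parseBoolean` treats a missing variable. As a minimal, self-contained sketch (the class `ContainerCleanupGuard` and its method are hypothetical and not part of Pulsar), the decision can be isolated and checked on its own:

```java
import java.util.Map;

/** Hypothetical helper, not part of Pulsar: decides whether stop() should skip cleanup. */
final class ContainerCleanupGuard {
    static final String RYUK_DISABLED = "TESTCONTAINERS_RYUK_DISABLED";

    /** Returns true only when the variable is present and equals "true", ignoring case. */
    static boolean leaveContainersRunning(Map<String, String> env) {
        // Boolean.parseBoolean(null) is false, so an unset variable keeps the normal cleanup.
        return Boolean.parseBoolean(env.get(RYUK_DISABLED));
    }

    public static void main(String[] args) {
        // System.getenv() returns the environment of the current process as a map.
        System.out.println(leaveContainersRunning(System.getenv()));
    }
}
```

Because an unset variable parses as `false`, a guard of this shape should leave the default behavior unchanged for anyone who has not opted in.
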
   With the containers left running, it's possible to open a shell with 
`docker exec -it CLITest-euiia-pulsar-bookie-0 bash` and view the 
`/var/log/pulsar/bookie.log` file, which is created by the 
[`tests/docker-images/latest-version-image/conf/bookie.conf`](https://github.com/apache/pulsar/blob/master/tests/docker-images/latest-version-image/conf/bookie.conf)
 config used to run bookies in the `apachepulsar/pulsar-test-latest-version` 
Docker image. The log file contained the following error message:
   
   ```
   08:26:53.194 [SyncThread-7-1] INFO  org.apache.bookkeeper.bookie.EntryLogManagerBase - Creating a new entry log file because current active log channel has not initialized yet
   08:26:53.195 [SyncThread-7-1] ERROR org.apache.bookkeeper.proto.BookieServer - Unable to allocate memory, exiting bookie
   io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 536870912, max: 536870912)
   ```
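
The numbers in that message are easier to read in MiB. The sketch below is a simplified model of a capped direct-memory counter, written only to illustrate the arithmetic. It is not Netty's implementation, and the class and method names are made up for this example.

```java
/** Simplified model of a capped direct-memory counter; not Netty's actual implementation. */
final class DirectMemoryBudget {
    private final long maxBytes;
    private long usedBytes;

    DirectMemoryBudget(long maxBytes, long usedBytes) {
        this.maxBytes = maxBytes;
        this.usedBytes = usedBytes;
    }

    /** Reserves the bytes if they fit under the cap; returns false otherwise. */
    boolean tryReserve(long requestBytes) {
        if (usedBytes + requestBytes > maxBytes) {
            return false;
        }
        usedBytes += requestBytes;
        return true;
    }

    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long max = 536_870_912L;     // "max" from the log message
        long used = 536_870_912L;    // "used" from the log message
        long request = 16_777_216L;  // size of the failed allocation

        DirectMemoryBudget budget = new DirectMemoryBudget(max, used);
        System.out.println("max MiB = " + toMiB(max));               // 512
        System.out.println("request MiB = " + toMiB(request));       // 16
        System.out.println("fits = " + budget.tryReserve(request));  // false
    }
}
```

In other words, the bookie had already used the full 512 MiB of direct memory when it asked for another 16 MiB chunk.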
   
   After restarting the bookie service with `supervisorctl start bookie`, the 
bookie started again and I was able to check the command line of the `java` process:
   ```
   /usr/local/openjdk-8/bin/java -cp /pulsar/conf:::/pulsar/lib/*: 
-Dlog4j.configurationFile=log4j2.yaml -Djute.maxbuffer=10485760 
-Djava.net.preferIPv4Stack=true -Xmx128M -XX:MaxDirectMemorySize=512M 
-XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled 
-XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 
-XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC 
-XX:-ResizePLAB -Dio.netty.leakDetectionLevel=disabled 
-Dio.netty.recycler.maxCapacity.default=1000 
-Dio.netty.recycler.linkCapacity=1024 -Dpulsar.log.appender=RoutingAppender 
-Dpulsar.log.dir=/pulsar/logs -Dpulsar.log.level=info 
-Dpulsar.log.root.level=info -Dpulsar.routing.appender.default=Console 
-Dlog4j2.is.webapp=false 
-Dpulsar.functions.process.container.log.dir=/pulsar/logs 
-Dpulsar.functions.java.instance.jar=/pulsar/instances/java-instance.jar 
-Dpulsar.functions.python.instance.file=/pulsar/instances/python-instance/python_instance_main.py
 -Dpulsar.functions.extra.dependencies.dir=/pulsar/instances/deps 
-Dpulsar.functions.instance.classpath=/pulsar/conf:::/pulsar/lib/*: 
-Dpulsar.log.file=bookkeeper.log org.apache.bookkeeper.server.Main --conf 
/pulsar/conf/bookkeeper.conf
   ```
   
   The command line shows that `-XX:MaxDirectMemorySize=512M` is passed 
properly. I noticed that the `PULSAR_GC` setting from 
[`tests/docker-images/latest-version-image/conf/bookie.conf`](https://github.com/apache/pulsar/blob/master/tests/docker-images/latest-version-image/conf/bookie.conf)
 isn't applied, so I created #9621 to fix that. However, that isn't the cause 
of the flakiness.
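
To cross-check the flag against the `max: 536870912` value in the error, a small parser can convert the size suffix to bytes. This is only a sketch: `JvmMemoryFlags` is a hypothetical helper, it handles just the `k`, `m` and `g` suffixes, and it keeps the last matching flag on the assumption that later flags override earlier ones.

```java
import java.util.Optional;

/** Hypothetical helper for reading size flags out of a java command line. */
final class JvmMemoryFlags {

    /** Converts a JVM size such as "512M", "1g" or "1024" to bytes. */
    static long parseSize(String value) {
        char unit = Character.toLowerCase(value.charAt(value.length() - 1));
        long multiplier;
        switch (unit) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024L; break;
            case 'g': multiplier = 1024L * 1024L * 1024L; break;
            default: return Long.parseLong(value); // no suffix: plain byte count
        }
        return Long.parseLong(value.substring(0, value.length() - 1)) * multiplier;
    }

    /** Returns the size in bytes of the last flag that starts with the given prefix. */
    static Optional<Long> findSize(String commandLine, String prefix) {
        Optional<Long> result = Optional.empty();
        for (String token : commandLine.trim().split("\\s+")) {
            if (token.startsWith(prefix)) {
                result = Optional.of(parseSize(token.substring(prefix.length())));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened version of the command line shown in this issue.
        String cmd = "java -Xmx128M -XX:MaxDirectMemorySize=512M -XX:+UseG1GC";
        System.out.println(findSize(cmd, "-XX:MaxDirectMemorySize=")); // Optional[536870912]
        System.out.println(findSize(cmd, "-Xmx"));                     // Optional[134217728]
    }
}
```

The parsed value of 536870912 bytes matches the `max` reported by the error, which is consistent with the limit itself being applied as configured.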
   
   In 
[`tests/docker-images/latest-version-image/conf/bookie.conf`](https://github.com/apache/pulsar/blob/cf63ae8480e6b03aca437b658cc10a935129a819/tests/docker-images/latest-version-image/conf/bookie.conf#L25)
 there is configuration that sets 
`dbStorage_writeCacheMaxSizeMb="16",dbStorage_readAheadCacheMaxSizeMb="16"`. 
However, this is a no-op, since the `apply-config-from-env.py` script is called in 
[`tests/docker-images/latest-version-image/scripts/run-bookie.sh`](https://github.com/apache/pulsar/blob/cf63ae8480e6b03aca437b658cc10a935129a819/tests/docker-images/latest-version-image/scripts/run-bookie.sh#L21-L22). 
Nothing puts the environment variables defined in the supervisord config 
into `conf/bookkeeper.conf`. Therefore, the `dbStorage_*` settings should be 
configured directly in the `run-bookie.sh` script to fix the issue.
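
To illustrate why a setting that never reaches the script's environment leaves the config file untouched, here is a simplified model of an environment-driven config rewrite. It is not a port of `apply-config-from-env.py`: the class `EnvConfigOverlay`, the rule "override a key only when a variable of the same name is present", and the starting value `256` are all assumptions made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of overriding config keys from environment variables. */
final class EnvConfigOverlay {

    /** Returns a copy of conf where each key is replaced by the env value of the same name, if any. */
    static Map<String, String> apply(Map<String, String> conf, Map<String, String> env) {
        Map<String, String> result = new LinkedHashMap<>(conf);
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            String override = env.get(entry.getKey());
            if (override != null) {
                result.put(entry.getKey(), override);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The value 256 is illustrative only, not the real default.
        Map<String, String> conf = Map.of("dbStorage_writeCacheMaxSizeMb", "256");

        // Variable missing from the environment: the config keeps its old value.
        System.out.println(apply(conf, Map.of()));

        // Variable visible to the rewrite step: the override takes effect.
        System.out.println(apply(conf, Map.of("dbStorage_writeCacheMaxSizeMb", "16")));
    }
}
```

Under this model, the override only takes effect when the variable is visible to the step that rewrites the file, which is the motivation for setting the `dbStorage_*` values where `run-bookie.sh` can see them.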


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

