malaskowski commented on issue #19307: URL: https://github.com/apache/pulsar/issues/19307#issuecomment-1628666136
Hi, I was able to reproduce this problem on the latest Pulsar 3.0.0 image and would like to re-open this issue.

## Search before asking

- [x] I searched in the issues and found nothing similar.

## Version

- [apachepulsar/pulsar:3.0.0](https://hub.docker.com/layers/apachepulsar/pulsar/3.0.0/images/sha256-43ba59feeef43ce54f00ca80af53c075f3d98caf6d150215c988f78ccecc89c0?context=explore)

## Minimal reproduce step

1. Run an Apache Pulsar cluster with the Helm Chart default values, changing only `--set defaultPulsarImageTag=3.0.0`.
2. Create a partitioned topic (3 partitions) and start producing messages (message size is ~30 kB).

## What did you expect to see?

Messages are produced successfully. Brokers do not restart.

## What did you see instead?

After around 30,000 messages, the brokers keep restarting with the following error:

```log
INFO 2023-07-10T08:38:13.117316491Z [resource.labels.containerName: pulsar-broker] #
INFO 2023-07-10T08:38:13.117589437Z [resource.labels.containerName: pulsar-broker] # A fatal error has been detected by the Java Runtime Environment:
INFO 2023-07-10T08:38:13.118142215Z [resource.labels.containerName: pulsar-broker] #
INFO 2023-07-10T08:38:13.119489360Z [resource.labels.containerName: pulsar-broker] # SIGSEGV (0xb) at pc=0x00007f9aab88e0f3, pid=1, tid=202
INFO 2023-07-10T08:38:13.119805559Z [resource.labels.containerName: pulsar-broker] #
INFO 2023-07-10T08:38:13.120234080Z [resource.labels.containerName: pulsar-broker] # JRE version: OpenJDK Runtime Environment Temurin-17.0.6+10 (17.0.6+10) (build 17.0.6+10)
INFO 2023-07-10T08:38:13.120351288Z [resource.labels.containerName: pulsar-broker] # Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (17.0.6+10, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
INFO 2023-07-10T08:38:13.120477437Z [resource.labels.containerName: pulsar-broker] # Problematic frame:
INFO 2023-07-10T08:38:13.165446675Z [resource.labels.containerName: pulsar-broker] # V [libjvm.so+0xacf0f3] PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0x153
INFO 2023-07-10T08:38:13.166381326Z [resource.labels.containerName: pulsar-broker] #
INFO 2023-07-10T08:38:13.167145592Z [resource.labels.containerName: pulsar-broker] # Core dump will be written. Default location: /core.%e.1.%t
INFO 2023-07-10T08:38:13.167357602Z [resource.labels.containerName: pulsar-broker] #
INFO 2023-07-10T08:38:13.168488365Z [resource.labels.containerName: pulsar-broker] # An error report file with more information is saved as:
INFO 2023-07-10T08:38:13.169308881Z [resource.labels.containerName: pulsar-broker] # /pulsar/hs_err_pid1.log
INFO 2023-07-10T08:38:13.264952361Z [resource.labels.containerName: pulsar-broker] #
INFO 2023-07-10T08:38:13.264992765Z [resource.labels.containerName: pulsar-broker] # Compiler replay data is saved as:
INFO 2023-07-10T08:38:13.265529939Z [resource.labels.containerName: pulsar-broker] # /pulsar/replay_pid1.log
INFO 2023-07-10T08:38:13.265759628Z [resource.labels.containerName: pulsar-broker] #
INFO 2023-07-10T08:38:13.265899593Z [resource.labels.containerName: pulsar-broker] # If you would like to submit a bug report, please visit:
INFO 2023-07-10T08:38:13.266143760Z [resource.labels.containerName: pulsar-broker] # https://github.com/adoptium/adoptium-support/issues
INFO 2023-07-10T08:38:13.266641554Z [resource.labels.containerName: pulsar-broker] #
INFO 2023-07-10T08:38:13.266960230Z [resource.labels.containerName: pulsar-broker] {}
INFO 2023-07-10T08:38:13.267208487Z [resource.labels.containerName: pulsar-broker] [error occurred during error reporting (), id 0xb, SIGSEGV (0xb) at pc=0x00007f9aac126941]
INFO 2023-07-10T08:38:13.267282513Z [resource.labels.containerName: pulsar-broker] {}
```

Important logs:

- JDK version: `Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (17.0.6+10, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)`
- Problematic frame: `V [libjvm.so+0xacf0f3] PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0x153`

## Anything else?

This looks like exactly the bug https://bugs.openjdk.org/browse/JDK-8285835?attachmentViewMode=list, which is fixed in JDK `17.0.7`, while `apachepulsar/pulsar:3.0.0` runs on an affected JVM version:

- The Pulsar 3.0.0 Dockerfile uses Temurin JDK 17: https://github.com/apache/pulsar/blob/v3.0.0/docker/pulsar/Dockerfile#L69

Releasing a new Docker image with a newer JDK should fix the problem (I've noticed that the [Ubuntu version was upgraded](https://github.com/apache/pulsar/commit/12e6dd558b6ca1a22c228c8567f466d3d6aef024)). Building the image right now would probably download the latest `temurin-17-jdk`.

Would it be possible to release a bugfix image with a newer JDK?

Thank you.
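For anyone who wants to check whether their own broker image predates the fix, here is a minimal sketch. It is not part of Pulsar or the JDK: the helper names `parse_jdk_version` and `predates_fix` are hypothetical, and the `17.0.7` threshold is simply the fix version quoted above for JDK-8285835. It only compares version numbers within the JDK 17 line and says nothing about other release lines.

```python
import re

# Assumption: first JDK 17 update containing the fix, as quoted above for JDK-8285835.
FIXED_IN = (17, 0, 7)


def parse_jdk_version(text):
    """Extract (major, minor, patch) from a version string such as 'Temurin-17.0.6+10'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no JDK version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def predates_fix(text, fixed_in=FIXED_IN):
    """Return True if the version is in the same major line and older than fixed_in."""
    version = parse_jdk_version(text)
    # Tuples compare element by element, so (17, 0, 6) < (17, 0, 7).
    return version[0] == fixed_in[0] and version < fixed_in


print(predates_fix("Temurin-17.0.6+10"))  # True
```

The input is meant to be the version string from the `JRE version:` line of the crash report (or from `java -version` inside the container), not a whole log line with its timestamp.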
