lhotari commented on issue #14436: URL: https://github.com/apache/pulsar/issues/14436#issuecomment-1064875510
Thanks for the analysis @congbobo184. Really helpful.

> As we can see, this problem is from bookie client and is a long-standing problem. So we don't block 2.10 and 2.9.2 release, should wait bookie fix and release, so we release broker first and then fix this issue in bookie client.

This doesn't mean that we should accept a bug that is caused by thread safety issues. This is a severe issue and can lead to data inconsistency problems.

As discussed privately, this problem seems to have become more frequent with Netty 4.1.74.Final compared to Netty 4.1.68.Final. Most likely this behavioral change is caused by the new Netty Recycler that was introduced in Netty 4.1.71.Final. My assumption is that the new implementation is more efficient and brings existing thread safety issues to the surface. There's more info about the change in this comment: https://github.com/apache/pulsar/pull/13328#issuecomment-1019981615

I'll bring this severe issue up for discussion on the Apache Pulsar and Apache BookKeeper developer mailing lists.

The quick workaround is to disable Netty Recycler completely by setting the system property `-Dio.netty.recycler.maxCapacityPerThread=0`. There are early reports from @dave2wave in OMB testing that when running on JDK17 with ZGC or Shenandoah GC, there's no negative performance impact from disabling Netty Recycler. This means that it will be useful to consider removing Netty Recycler from the BookKeeper and Pulsar code bases in the future. The current usage patterns contain thread safety issues, and those will continue to hinder Pulsar and BookKeeper reliability unless they are resolved completely.
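To confirm that the workaround flag mentioned above actually reached a JVM, a small check along these lines could help. This is only a sketch: the class name `RecyclerWorkaroundCheck` and the helper `isRecyclerDisabled` are hypothetical, the code does not call Netty at all, and it only inspects the system property named in the comment. It assumes the property is passed on the JVM command line, since Netty is expected to read it once when its Recycler class is initialized.

```java
import java.util.Properties;

public class RecyclerWorkaroundCheck {
    // Property name taken from the workaround described in the comment.
    static final String MAX_CAPACITY_PER_THREAD = "io.netty.recycler.maxCapacityPerThread";

    /** Returns true when the given properties carry the workaround value 0. */
    static boolean isRecyclerDisabled(Properties props) {
        String raw = props.getProperty(MAX_CAPACITY_PER_THREAD);
        if (raw == null) {
            return false; // property not set: workaround not applied
        }
        try {
            return Integer.parseInt(raw.trim()) == 0;
        } catch (NumberFormatException e) {
            return false; // unparsable value: treat as not applied
        }
    }

    public static void main(String[] args) {
        // Inspect the properties of the running JVM and report the result.
        boolean disabled = isRecyclerDisabled(System.getProperties());
        System.out.println(MAX_CAPACITY_PER_THREAD + " workaround applied: " + disabled);
    }
}
```

Running it as `java -Dio.netty.recycler.maxCapacityPerThread=0 RecyclerWorkaroundCheck` should report that the workaround is applied; the same `-D` flag would need to be added to the JVM options of each broker and bookie process.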
