Hi team,

While profiling our application I found that Tomcat uses its own SynchronizedQueue/SynchronizedStack implementations, which synchronize at instance level, i.e. their methods are declared 'synchronized'.
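To make the locking pattern concrete, here is a stripped-down sketch of what I mean. This is not the actual Tomcat source, just an illustration of the instance-level locking shape (class and field names are made up): every push/pop contends on the same object monitor.

    // Simplified illustration only, not the real SynchronizedStack code
    public class SimpleSynchronizedStack<T> {
        private Object[] stack = new Object[128];
        private int index = -1;

        public synchronized boolean push(T obj) {
            index++;
            if (index == stack.length) {
                // grow the backing array when full
                stack = java.util.Arrays.copyOf(stack, stack.length * 2);
            }
            stack[index] = obj;
            return true;
        }

        @SuppressWarnings("unchecked")
        public synchronized T pop() {
            if (index == -1) {
                return null;              // empty
            }
            T result = (T) stack[index];
            stack[index--] = null;        // drop the reference so it can be GC-ed
            return result;
        }
    }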
The javadoc for those two classes says they are preferred because they are more GC-friendly than java.util.concurrent.ConcurrentLinkedQueue.

I use async-profiler (https://github.com/jvm-profiling-tools/async-profiler) to profile CPU, memory, locks and a few lower-level event types (via perf). Running "./profiler.sh -e lock -d 60 PID" against 9.0.x produces reports like this:

Started [lock] profiling
--- Execution profile ---
Total samples      : 713834
Frame buffer usage : 0.0288%

--- 1106165618177 ns (58.97%), 395609 samples
  [ 0] org.apache.tomcat.util.collections.SynchronizedStack
  [ 1] org.apache.tomcat.util.collections.SynchronizedStack.push
  [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write
  [ 3] org.apache.tomcat.util.net.NioSelectorPool.write
  [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite
  [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite
  [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking
  [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush
  [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end
  [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end
  [10] org.apache.coyote.http11.Http11OutputBuffer.end
  [11] org.apache.coyote.http11.Http11Processor.finishResponse
  [12] org.apache.coyote.AbstractProcessor.action
  [13] org.apache.coyote.Response.action
  [14] org.apache.catalina.connector.OutputBuffer.close
  [15] org.apache.catalina.connector.Response.finishResponse
  [16] org.apache.catalina.connector.CoyoteAdapter.service
  [17] org.apache.coyote.http11.Http11Processor.service
  [18] org.apache.coyote.AbstractProcessorLight.process
  [19] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
  [20] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun
  [21] org.apache.tomcat.util.net.SocketProcessorBase.run
  [22] java.util.concurrent.ThreadPoolExecutor.runWorker
  [23] java.util.concurrent.ThreadPoolExecutor$Worker.run
  [24] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
  [25] java.lang.Thread.run

--- 769519847169 ns (41.03%), 314384 samples
  [ 0] org.apache.tomcat.util.collections.SynchronizedStack
  [ 1] org.apache.tomcat.util.collections.SynchronizedStack.pop
  [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write
  [ 3] org.apache.tomcat.util.net.NioSelectorPool.write
  [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite
  [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite
  [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking
  [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush
  [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end
  [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end
  [10] org.apache.coyote.http11.Http11OutputBuffer.end
  [11] org.apache.coyote.http11.Http11Processor.finishResponse
....

I have made the following changes - https://github.com/apache/tomcat/compare/9.0.x...martin-g:feature/lock-free-queue-and-stack - and ran the test suite: all green!
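For those who don't want to open the diff: the branch above is the authoritative version, but the general direction is to replace the synchronized collections with the non-blocking ones from java.util.concurrent. A minimal sketch of that idea, assuming a ConcurrentLinkedDeque-backed stack (details in the branch may differ):

    import java.util.concurrent.ConcurrentLinkedDeque;

    // Sketch only: a lock-free stack facade over java.util.concurrent,
    // illustrating the direction of the change, not the exact patch.
    public class LockFreeStack<T> {
        private final ConcurrentLinkedDeque<T> stack = new ConcurrentLinkedDeque<>();

        public boolean push(T obj) {
            stack.push(obj);      // CAS-based, no monitor to contend on
            return true;
        }

        public T pop() {
            return stack.poll();  // returns null when empty, like SynchronizedStack.pop()
        }

        public void clear() {
            stack.clear();
        }
    }

The trade-off the existing javadoc hints at is allocation: ConcurrentLinkedDeque allocates a node per push, while SynchronizedStack reuses its backing array, which is why I paid attention to the GC behaviour below.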
Now the output of the lock profiling looks like this:

--- 72354435197 ns (47.98%), 4008 samples
  [ 0] java.util.concurrent.locks.ReentrantLock$NonfairSync
  [ 1] jdk.internal.misc.Unsafe.park
  [ 2] java.util.concurrent.locks.LockSupport.park
  [ 3] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire
  [ 4] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly
  [ 5] java.util.concurrent.locks.ReentrantLock$Sync.lockInterruptibly
  [ 6] java.util.concurrent.locks.ReentrantLock.lockInterruptibly
  [ 7] java.util.concurrent.LinkedBlockingQueue.poll
  [ 8] org.apache.tomcat.util.threads.TaskQueue.poll
  [ 9] org.apache.tomcat.util.threads.TaskQueue.poll
  [10] java.util.concurrent.ThreadPoolExecutor.getTask
  [11] java.util.concurrent.ThreadPoolExecutor.runWorker
  [12] java.util.concurrent.ThreadPoolExecutor$Worker.run
  [13] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
  [14] java.lang.Thread.run

--- 58413748017 ns (38.74%), 10179 samples
  [ 0] java.lang.Object
  [ 1] org.apache.catalina.core.StandardContext$ContextFilterMaps.asArray
  [ 2] org.apache.catalina.core.StandardContext.findFilterMaps
  [ 3] org.apache.catalina.core.ApplicationFilterFactory.createFilterChain
  [ 4] org.apache.catalina.core.StandardWrapperValve.invoke
  [ 5] org.apache.catalina.core.StandardContextValve.invoke
  [ 6] org.apache.catalina.authenticator.AuthenticatorBase.invoke
  [ 7] org.apache.catalina.core.StandardHostValve.invoke
  [ 8] org.apache.catalina.valves.ErrorReportValve.invoke
  [ 9] org.apache.catalina.core.StandardEngineValve.invoke
  [10] org.apache.catalina.connector.CoyoteAdapter.service
  [11] org.apache.coyote.http11.Http11Processor.service
  [12] org.apache.coyote.AbstractProcessorLight.process
  [13] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
  [14] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun
  [15] org.apache.tomcat.util.net.SocketProcessorBase.run
  [16] java.util.concurrent.ThreadPoolExecutor.runWorker
  [17] java.util.concurrent.ThreadPoolExecutor$Worker.run
  [18] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
  [19] java.lang.Thread.run

--- 5856040617 ns (3.88%), 11395 samples
  [ 0] org.apache.tomcat.util.collections.SynchronizedQueue
  [ 1] org.apache.tomcat.util.collections.SynchronizedQueue.offer
  [ 2] org.apache.tomcat.util.net.NioEndpoint$Poller.addEvent
  [ 3] org.apache.tomcat.util.net.NioEndpoint$Poller.add
  [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.registerReadInterest
  [ 5] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
...

I monitor the GC both with -Xloggc:gc.log and via JMX, and the patterns are the same: no new spikes. This is on JDK 14 with G1GC. I load the application with wrk (https://github.com/wg/wrk/), which drives around 70-80K requests per second. I assume there is a reason these custom synchronized data structures were introduced, but so far I don't see any drawbacks after moving to the lock-free implementations.

Regards,
Martin