On Thu, Jun 4, 2020 at 6:16 PM Mark Thomas <ma...@apache.org> wrote:

> On 04/06/2020 16:33, Martin Grigorov wrote:
> > Hi team,
> >
> > I am profiling our application and I've found that Tomcat uses its own
> > Synchronized Queue/Stack implementations which synchronize on instance
> > level, i.e. their methods are 'synchronized'.
>
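[Illustration - a minimal, hypothetical sketch of the two patterns under
discussion; the class below is simplified and is not Tomcat's actual
SynchronizedStack source:]

    import java.util.ArrayDeque;
    import java.util.concurrent.ConcurrentLinkedDeque;

    public class StackContentionSketch {

        // Instance-level synchronization: every push/pop takes the same
        // monitor, so all worker threads recycling objects serialize on it.
        static class SimpleSynchronizedStack<T> {
            private final ArrayDeque<T> items = new ArrayDeque<>();

            public synchronized void push(T obj) { items.push(obj); }

            public synchronized T pop() { return items.poll(); } // null if empty
        }

        public static void main(String[] args) {
            SimpleSynchronizedStack<Object> locked = new SimpleSynchronizedStack<>();
            locked.push(new Object());
            Object a = locked.pop();

            // Lock-free alternative from java.util.concurrent: push/poll are
            // CAS-based with no shared monitor, at the cost of one node
            // allocation per element (the GC trade-off the javadoc refers to).
            ConcurrentLinkedDeque<Object> lockFree = new ConcurrentLinkedDeque<>();
            lockFree.push(new Object());
            Object b = lockFree.poll();
        }
    }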
Can you work off 10 (master) first?

> ~8 years ago when I was profiling Tomcat and wrote those, having
> something targeted to Tomcat's specific use case made a measurable
> improvement.
>

Yes, the SyncStack was the go-to structure for all reuse. Not sure if it's
the best. I already removed some of the usage in 10:
- No more block poller and its pooling/recycling [I had to pull its removal
  out of 9 due to an unfortunate regression, this could be brought back now
  that it seems to work]
- No recycling of trivial objects
The main use should now be the recycled processors and the recycled
channels (it would be bad to not reuse them since they may use direct
buffers ...).

> I am more than prepared to believe that things have moved on since then
> but I'd be interested in seeing the same numbers with the latest Java 8
> from Adopt OpenJDK rather than Java 14.
>

+1, finally there could be some differences between JDKs, like now they
started to actually improve NIO 1 again.

Rémy

> I'd also like to see overall throughput numbers and the GC frequency for
> each test scenario.
>
> Mark
>
> > The javadoc for those two classes says that they are preferred because
> > they are GC-friendlier than
> > java.util.concurrent.ConcurrentLinkedQueue.
> >
> > I use Async Profiler
> > (https://github.com/jvm-profiling-tools/async-profiler) to profile CPU,
> > memory, locks and a few lower-level event types (from perf).
> > Issuing "./profiler.sh -e lock -d 60 PID" against 9.0.x shows such
> > reports:
> >
> > Started [lock] profiling
> >
> > --- Execution profile ---
> > Total samples       : 713834
> >
> > Frame buffer usage  : 0.0288%
> >
> > --- 1106165618177 ns (58.97%), 395609 samples
> >   [ 0] org.apache.tomcat.util.collections.SynchronizedStack
> >   [ 1] org.apache.tomcat.util.collections.SynchronizedStack.push
> >   [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write
> >   [ 3] org.apache.tomcat.util.net.NioSelectorPool.write
> >   [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite
> >   [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite
> >   [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking
> >   [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush
> >   [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end
> >   [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end
> >   [10] org.apache.coyote.http11.Http11OutputBuffer.end
> >   [11] org.apache.coyote.http11.Http11Processor.finishResponse
> >   [12] org.apache.coyote.AbstractProcessor.action
> >   [13] org.apache.coyote.Response.action
> >   [14] org.apache.catalina.connector.OutputBuffer.close
> >   [15] org.apache.catalina.connector.Response.finishResponse
> >   [16] org.apache.catalina.connector.CoyoteAdapter.service
> >   [17] org.apache.coyote.http11.Http11Processor.service
> >   [18] org.apache.coyote.AbstractProcessorLight.process
> >   [19] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
> >   [20] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun
> >   [21] org.apache.tomcat.util.net.SocketProcessorBase.run
> >   [22] java.util.concurrent.ThreadPoolExecutor.runWorker
> >   [23] java.util.concurrent.ThreadPoolExecutor$Worker.run
> >   [24] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
> >   [25] java.lang.Thread.run
> >
> > --- 769519847169 ns (41.03%), 314384 samples
> >   [ 0] org.apache.tomcat.util.collections.SynchronizedStack
> >   [ 1] org.apache.tomcat.util.collections.SynchronizedStack.pop
> >   [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write
> >   [ 3] org.apache.tomcat.util.net.NioSelectorPool.write
> >   [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite
> >   [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite
> >   [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking
> >   [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush
> >   [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end
> >   [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end
> >   [10] org.apache.coyote.http11.Http11OutputBuffer.end
> >   [11] org.apache.coyote.http11.Http11Processor.finishResponse
> >   ....
> >
> > I've made the following changes -
> > https://github.com/apache/tomcat/compare/9.0.x...martin-g:feature/lock-free-queue-and-stack
> > and ran the test suite - all green!
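[Illustration - a rough, hypothetical sketch of what such a lock-free swap
can look like; the linked branch may well differ, and the class name below
is made up. The idea is to keep the SynchronizedStack push/pop contract but
back it with a java.util.concurrent structure instead of a synchronized
Object[] array:]

    import java.util.concurrent.ConcurrentLinkedDeque;

    // Hypothetical lock-free replacement with the same push/pop semantics.
    public class LockFreeStack<T> {

        private final ConcurrentLinkedDeque<T> items = new ConcurrentLinkedDeque<>();

        // LIFO insert; CAS-based, no monitor is taken.
        public boolean push(T obj) {
            items.addFirst(obj);
            return true;
        }

        // Returns null when empty, matching the "nothing to reuse" case.
        public T pop() {
            return items.pollFirst();
        }

        public void clear() {
            items.clear();
        }
    }

The trade-off is one linked node allocation per element instead of reusing
slots in an array, which is exactly the GC-friendliness argument the
original javadoc makes.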
> >
> > Now the output from the locks profiling looks like:
> >
> > --- 72354435197 ns (47.98%), 4008 samples
> >   [ 0] java.util.concurrent.locks.ReentrantLock$NonfairSync
> >   [ 1] jdk.internal.misc.Unsafe.park
> >   [ 2] java.util.concurrent.locks.LockSupport.park
> >   [ 3] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire
> >   [ 4] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly
> >   [ 5] java.util.concurrent.locks.ReentrantLock$Sync.lockInterruptibly
> >   [ 6] java.util.concurrent.locks.ReentrantLock.lockInterruptibly
> >   [ 7] java.util.concurrent.LinkedBlockingQueue.poll
> >   [ 8] org.apache.tomcat.util.threads.TaskQueue.poll
> >   [ 9] org.apache.tomcat.util.threads.TaskQueue.poll
> >   [10] java.util.concurrent.ThreadPoolExecutor.getTask
> >   [11] java.util.concurrent.ThreadPoolExecutor.runWorker
> >   [12] java.util.concurrent.ThreadPoolExecutor$Worker.run
> >   [13] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
> >   [14] java.lang.Thread.run
> >
> > --- 58413748017 ns (38.74%), 10179 samples
> >   [ 0] java.lang.Object
> >   [ 1] org.apache.catalina.core.StandardContext$ContextFilterMaps.asArray
> >   [ 2] org.apache.catalina.core.StandardContext.findFilterMaps
> >   [ 3] org.apache.catalina.core.ApplicationFilterFactory.createFilterChain
> >   [ 4] org.apache.catalina.core.StandardWrapperValve.invoke
> >   [ 5] org.apache.catalina.core.StandardContextValve.invoke
> >   [ 6] org.apache.catalina.authenticator.AuthenticatorBase.invoke
> >   [ 7] org.apache.catalina.core.StandardHostValve.invoke
> >   [ 8] org.apache.catalina.valves.ErrorReportValve.invoke
> >   [ 9] org.apache.catalina.core.StandardEngineValve.invoke
> >   [10] org.apache.catalina.connector.CoyoteAdapter.service
> >   [11] org.apache.coyote.http11.Http11Processor.service
> >   [12] org.apache.coyote.AbstractProcessorLight.process
> >   [13] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
> >   [14] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun
> >   [15] org.apache.tomcat.util.net.SocketProcessorBase.run
> >   [16] java.util.concurrent.ThreadPoolExecutor.runWorker
> >   [17] java.util.concurrent.ThreadPoolExecutor$Worker.run
> >   [18] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
> >   [19] java.lang.Thread.run
> >
> > --- 5856040617 ns (3.88%), 11395 samples
> >   [ 0] org.apache.tomcat.util.collections.SynchronizedQueue
> >   [ 1] org.apache.tomcat.util.collections.SynchronizedQueue.offer
> >   [ 2] org.apache.tomcat.util.net.NioEndpoint$Poller.addEvent
> >   [ 3] org.apache.tomcat.util.net.NioEndpoint$Poller.add
> >   [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.registerReadInterest
> >   [ 5] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
> >   ...
> >
> > I monitor the GC both with -Xloggc:gc.log and with JMX and the patterns
> > are the same, no new spikes.
> > JDK 14 with G1GC.
> >
> > I load the application with wrk (https://github.com/wg/wrk/) and it
> > handles 70-80K requests per second.
> >
> > I guess there is a reason to use these custom synchronized data
> > structures but so far I don't see any drawbacks after moving to the
> > lock-free impls.
> >
> > Regards,
> > Martin
>
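[Illustration - one way to read the same GC counters programmatically that
JMX exposes; this is generic JDK API usage, not part of the patch or the
test setup described above:]

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Prints cumulative GC counts and times for each collector (e.g. G1
    // Young/Old Generation), the same counters a JMX console charts remotely.
    public class GcSnapshot {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: count=%d, time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }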