Hi team,

I am profiling our application and I've found that Tomcat uses its own
SynchronizedQueue/SynchronizedStack implementations, which synchronize at
the instance level, i.e. their methods are 'synchronized'.

The javadoc for those two classes says that they are preferred because they
are GC-friendlier than java.util.concurrent.ConcurrentLinkedQueue.
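
To illustrate the pattern, here is a simplified sketch (not the actual
Tomcat source) of how such a class is structured: an array-backed stack
whose methods synchronize on the instance monitor. Reusing a plain array is
what makes it GC-friendlier - unlike ConcurrentLinkedQueue, it does not
allocate a node object per element.

    public class ArrayBackedStack<T> {
        private Object[] stack = new Object[128];
        private int index = -1;

        // Every caller contends on the same instance monitor.
        public synchronized boolean push(T obj) {
            if (index == stack.length - 1) {
                return false; // the real class grows the array up to a limit
            }
            stack[++index] = obj;
            return true;
        }

        @SuppressWarnings("unchecked")
        public synchronized T pop() {
            if (index == -1) {
                return null;
            }
            T result = (T) stack[index];
            stack[index--] = null; // clear the slot so the object can be GC'd
            return result;
        }
    }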

I use Async Profiler (https://github.com/jvm-profiling-tools/async-profiler)
to profile CPU, memory, locks and a few lower-level event types (from perf).
Running "./profiler.sh -e lock -d 60 PID" against 9.0.x produces reports like
this:

Started [lock] profiling
--- Execution profile ---
Total samples       : 713834

Frame buffer usage  : 0.0288%

--- 1106165618177 ns (58.97%), 395609 samples
  [ 0] org.apache.tomcat.util.collections.SynchronizedStack
  [ 1] org.apache.tomcat.util.collections.SynchronizedStack.push
  [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write
  [ 3] org.apache.tomcat.util.net.NioSelectorPool.write
  [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite
  [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite
  [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking
  [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush
  [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end
  [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end

  [10] org.apache.coyote.http11.Http11OutputBuffer.end
  [11] org.apache.coyote.http11.Http11Processor.finishResponse
  [12] org.apache.coyote.AbstractProcessor.action
  [13] org.apache.coyote.Response.action
  [14] org.apache.catalina.connector.OutputBuffer.close
  [15] org.apache.catalina.connector.Response.finishResponse
  [16] org.apache.catalina.connector.CoyoteAdapter.service
  [17] org.apache.coyote.http11.Http11Processor.service
  [18] org.apache.coyote.AbstractProcessorLight.process
  [19] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
  [20] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun
  [21] org.apache.tomcat.util.net.SocketProcessorBase.run
  [22] java.util.concurrent.ThreadPoolExecutor.runWorker
  [23] java.util.concurrent.ThreadPoolExecutor$Worker.run
  [24] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
  [25] java.lang.Thread.run

--- 769519847169 ns (41.03%), 314384 samples
  [ 0] org.apache.tomcat.util.collections.SynchronizedStack
  [ 1] org.apache.tomcat.util.collections.SynchronizedStack.pop
  [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write
  [ 3] org.apache.tomcat.util.net.NioSelectorPool.write
  [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite
  [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite
  [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking
  [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush
  [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end
  [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end
  [10] org.apache.coyote.http11.Http11OutputBuffer.end
  [11] org.apache.coyote.http11.Http11Processor.finishResponse
   ....


I've made the following changes -
https://github.com/apache/tomcat/compare/9.0.x...martin-g:feature/lock-free-queue-and-stack
and ran the test suite - all green!
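
For readers not following the diff: the general shape of a lock-free
(CAS-based) stack is sketched below. This is a generic Treiber stack for
illustration, not the exact code in the branch - see the compare link above
for the actual change. Note the trade-off: each push allocates a Node, which
is exactly the per-element garbage that the synchronized, array-backed
implementation avoids - hence the GC monitoring further down.

    import java.util.concurrent.atomic.AtomicReference;

    public class LockFreeStack<T> {
        private static final class Node<E> {
            final E item;
            Node<E> next;
            Node(E item) { this.item = item; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        // Retry a compare-and-set on the head instead of blocking on a monitor.
        public void push(T item) {
            Node<T> newHead = new Node<>(item);
            Node<T> oldHead;
            do {
                oldHead = head.get();
                newHead.next = oldHead;
            } while (!head.compareAndSet(oldHead, newHead));
        }

        public T pop() {
            Node<T> oldHead;
            do {
                oldHead = head.get();
                if (oldHead == null) {
                    return null; // stack is empty
                }
            } while (!head.compareAndSet(oldHead, oldHead.next));
            return oldHead.item;
        }
    }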

Now the output from the locks profiling looks like:


--- 72354435197 ns (47.98%), 4008 samples
  [ 0] java.util.concurrent.locks.ReentrantLock$NonfairSync
  [ 1] jdk.internal.misc.Unsafe.park
  [ 2] java.util.concurrent.locks.LockSupport.park
  [ 3] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire
  [ 4] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly
  [ 5] java.util.concurrent.locks.ReentrantLock$Sync.lockInterruptibly
  [ 6] java.util.concurrent.locks.ReentrantLock.lockInterruptibly
  [ 7] java.util.concurrent.LinkedBlockingQueue.poll
  [ 8] org.apache.tomcat.util.threads.TaskQueue.poll
  [ 9] org.apache.tomcat.util.threads.TaskQueue.poll
  [10] java.util.concurrent.ThreadPoolExecutor.getTask
  [11] java.util.concurrent.ThreadPoolExecutor.runWorker
  [12] java.util.concurrent.ThreadPoolExecutor$Worker.run
  [13] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
  [14] java.lang.Thread.run

--- 58413748017 ns (38.74%), 10179 samples
  [ 0] java.lang.Object
  [ 1] org.apache.catalina.core.StandardContext$ContextFilterMaps.asArray
  [ 2] org.apache.catalina.core.StandardContext.findFilterMaps
  [ 3] org.apache.catalina.core.ApplicationFilterFactory.createFilterChain
  [ 4] org.apache.catalina.core.StandardWrapperValve.invoke
  [ 5] org.apache.catalina.core.StandardContextValve.invoke
  [ 6] org.apache.catalina.authenticator.AuthenticatorBase.invoke
  [ 7] org.apache.catalina.core.StandardHostValve.invoke
  [ 8] org.apache.catalina.valves.ErrorReportValve.invoke
  [ 9] org.apache.catalina.core.StandardEngineValve.invoke
  [10] org.apache.catalina.connector.CoyoteAdapter.service
  [11] org.apache.coyote.http11.Http11Processor.service
  [12] org.apache.coyote.AbstractProcessorLight.process
  [13] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
  [14] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun
  [15] org.apache.tomcat.util.net.SocketProcessorBase.run
  [16] java.util.concurrent.ThreadPoolExecutor.runWorker
  [17] java.util.concurrent.ThreadPoolExecutor$Worker.run
  [18] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
  [19] java.lang.Thread.run

--- 5856040617 ns (3.88%), 11395 samples
  [ 0] org.apache.tomcat.util.collections.SynchronizedQueue
  [ 1] org.apache.tomcat.util.collections.SynchronizedQueue.offer
  [ 2] org.apache.tomcat.util.net.NioEndpoint$Poller.addEvent
  [ 3] org.apache.tomcat.util.net.NioEndpoint$Poller.add
  [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.registerReadInterest
  [ 5] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
  ...


I monitor the GC both with -Xloggc:gc.log and via JMX, and the patterns are
the same - no new spikes.
This is JDK 14 with G1GC.

I load the application with wrk (https://github.com/wg/wrk/) and it sustains
70-80K requests per second.
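
For reference, the invocation looks along these lines (the thread/connection
counts and the URL here are placeholders, not my exact setup):

    wrk -t8 -c256 -d60s http://localhost:8080/my-app/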

I guess there is a reason to use these custom synchronized data structures,
but so far I don't see any drawbacks after moving to the lock-free
implementations.

Regards,
Martin
