[
https://issues.apache.org/jira/browse/CAMEL-17536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Claus Ibsen resolved CAMEL-17536.
---------------------------------
Assignee: Claus Ibsen
Resolution: Fixed
> ServicePool.doStop hangs during shutdown
> ----------------------------------------
>
> Key: CAMEL-17536
> URL: https://issues.apache.org/jira/browse/CAMEL-17536
> Project: Camel
> Issue Type: Bug
> Components: camel-core
> Affects Versions: 3.14.0
> Reporter: Krzysztof Jamróz
> Assignee: Claus Ibsen
> Priority: Major
> Fix For: 3.14.2, 3.15.0
>
> Attachments: linkedhashmap corrupted.png
>
>
> Camel sometimes hangs (100% usage of 1 CPU) during shutdown. This occurs
> intermittently during integration testing. We use {{recipientList}} to
> perform some dynamic routing and many messages can be processed in parallel
> threads by {{recipientList}} and there are a few of them in different routes
> (this will be important in a moment).
> Stack trace of stuck thread:
> {noformat}
> "SpringApplicationShutdownHook" #18 prio=5 os_prio=0 cpu=1784989.37ms
> elapsed=1790.21s tid=0x00007f7a13520000 nid=0xa80 runnable
> [0x00007f79a3af5000]
> java.lang.Thread.State: RUNNABLE
> at org.apache.camel.support.cache.ServicePool.stop(ServicePool.java:193)
> at
> org.apache.camel.support.cache.ServicePool$$Lambda$2274/0x000000084108fc40.accept(Unknown
> Source)
> at
> java.util.LinkedHashMap$LinkedValues.forEach([email protected]/LinkedHashMap.java:608)
> at org.apache.camel.support.cache.ServicePool.doStop(ServicePool.java:181)
> at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
> - locked <0x00000000e3ece780> (a java.lang.Object)
> at
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
> at
> org.apache.camel.support.cache.DefaultProducerCache.doStop(DefaultProducerCache.java:399)
> at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
> - locked <0x00000000e3ece740> (a java.lang.Object)
> at
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
> at
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
> at org.apache.camel.processor.RecipientList.doStop(RecipientList.java:222)
> at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
> - locked <0x00000000e3ecdb00> (a java.lang.Object)
> at
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
> at
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:165)
> at
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
> at org.apache.camel.processor.Pipeline.doStop(Pipeline.java:226)
> at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
> - locked <0x00000000e3ecda48> (a java.lang.Object)
> at
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
> at
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
> at
> org.apache.camel.impl.engine.DefaultChannel.doStop(DefaultChannel.java:134)
> at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
> - locked <0x00000000e3eccde0> (a java.lang.Object)
> at
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
> at
> org.apache.camel.support.service.ServiceHelper.stopAndShutdownServices(ServiceHelper.java:257)
> at
> org.apache.camel.support.service.ServiceHelper.stopAndShutdownServices(ServiceHelper.java:215)
> at
> org.apache.camel.processor.errorhandler.RedeliveryErrorHandler.doShutdown(RedeliveryErrorHandler.java:1665)
> at
> org.apache.camel.support.ChildServiceSupport.shutdown(ChildServiceSupport.java:113)
> - locked <0x00000000e3ecc1e8> (a java.lang.Object)
> at
> org.apache.camel.support.service.ServiceHelper.stopAndShutdownService(ServiceHelper.java:233)
> at
> org.apache.camel.impl.engine.RouteService.stopChildServices(RouteService.java:409)
> at org.apache.camel.impl.engine.RouteService.doStop(RouteService.java:260)
> at
> org.apache.camel.support.ChildServiceSupport.stop(ChildServiceSupport.java:86)
> - locked <0x00000000e40873a0> (a java.lang.Object)
> at
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
> at
> org.apache.camel.support.service.ServiceHelper.stopAndShutdownService(ServiceHelper.java:227)
> at
> org.apache.camel.impl.engine.AbstractCamelContext.shutdownServices(AbstractCamelContext.java:3559)
> at
> org.apache.camel.impl.engine.AbstractCamelContext.shutdownServices(AbstractCamelContext.java:3584)
> at
> org.apache.camel.impl.engine.AbstractCamelContext.doStop(AbstractCamelContext.java:3366)
> at
> org.apache.camel.spring.boot.SpringBootCamelContext.doStop(SpringBootCamelContext.java:61)
> - locked <0x00000000e221e658> (a
> org.apache.camel.spring.boot.SpringBootCamelContext)
> at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
> - locked <0x00000000e221e8a8> (a java.lang.Object)
> at
> org.apache.camel.impl.engine.AbstractCamelContext.stop(AbstractCamelContext.java:2640)
> at
> org.apache.camel.spring.SpringCamelContext.stop(SpringCamelContext.java:128)
> at
> org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:247)
> at
> org.springframework.context.support.DefaultLifecycleProcessor.access$300(DefaultLifecycleProcessor.java:54)
> at
> org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:373)
> at
> org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:206)
> at
> org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:129)
> at
> org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1067)
> at
> org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1021)
> - locked <0x00000000e11e3868> (a java.lang.Object)
> at
> org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:137)
> at
> org.springframework.boot.SpringApplicationShutdownHook$$Lambda$2265/0x0000000841088440.accept(Unknown
> Source)
> at java.lang.Iterable.forEach([email protected]/Iterable.java:75)
> at
> org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:106)
> at java.lang.Thread.run([email protected]/Thread.java:829){noformat}
> Deeper inspection of heap dump reveals that linked list in
> {{ServicePool.cache}} is corrupted and has a loop, which causes {{foreach}}
> to loop infinitely (see attached screenshot from heapdump analysis).
> Access to {{ServicePool.cache}} is not synchronized even though
> {{LinkedHashMap}} is not thread-safe. There is at least one code path that
> can add to the cache it concurrently (and indeed in this seems to happen in
> our tests):
> # org.apache.camel.processor.MulticastProcessor.process(Exchange,
> AsyncCallback)
> #
> org.apache.camel.processor.RecipientListProcessor.createProcessorExchangePairs(Exchange)
> #
> org.apache.camel.processor.RecipientListProcessor.doCreateProcessorExchangePairs(Exchange,
> Object, List<ProcessorExchangePair>, int)
> #
> org.apache.camel.support.cache.DefaultProducerCache.acquireProducer(Endpoint)
> # org.apache.camel.support.cache.ServicePool.acquire(Endpoint)
> # org.apache.camel.support.cache.ServicePool.cache
> There are also a few unsynchronized invocations of
> {{DefaultProducerCache.acquireProducer}} from other processors.
> I am not sure if other usages of {{cache}} need to be synchronized, maybe
> Camel lifecycle provides enough serialization.
>
> The simpliest solution is to wrap {{cache}} in
> {{Collections.synchronizedMap}} but this might have negative impact on
> performance.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)