[ 
https://issues.apache.org/jira/browse/CAMEL-17536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claus Ibsen resolved CAMEL-17536.
---------------------------------
      Assignee: Claus Ibsen
    Resolution: Fixed

> ServicePool.doStop hangs during shutdown
> ----------------------------------------
>
>                 Key: CAMEL-17536
>                 URL: https://issues.apache.org/jira/browse/CAMEL-17536
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-core
>    Affects Versions: 3.14.0
>            Reporter: Krzysztof Jamróz
>            Assignee: Claus Ibsen
>            Priority: Major
>             Fix For: 3.14.2, 3.15.0
>
>         Attachments: linkedhashmap corrupted.png
>
>
> Camel sometimes hangs (100% usage of 1 CPU) during shutdown. This occurs 
> intermittently during integration testing. We use {{recipientList}} to 
> perform some dynamic routing and many messages can be processed in parallel 
> threads by {{recipientList}} and there are a few of them in different routes 
> (this will be important in a moment).
> Stack trace of stuck thread:
> {noformat}
> "SpringApplicationShutdownHook" #18 prio=5 os_prio=0 cpu=1784989.37ms 
> elapsed=1790.21s tid=0x00007f7a13520000 nid=0xa80 runnable  
> [0x00007f79a3af5000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.camel.support.cache.ServicePool.stop(ServicePool.java:193)
>     at 
> org.apache.camel.support.cache.ServicePool$$Lambda$2274/0x000000084108fc40.accept(Unknown
>  Source)
>     at 
> java.util.LinkedHashMap$LinkedValues.forEach([email protected]/LinkedHashMap.java:608)
>     at org.apache.camel.support.cache.ServicePool.doStop(ServicePool.java:181)
>     at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
>     - locked <0x00000000e3ece780> (a java.lang.Object)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
>     at 
> org.apache.camel.support.cache.DefaultProducerCache.doStop(DefaultProducerCache.java:399)
>     at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
>     - locked <0x00000000e3ece740> (a java.lang.Object)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
>     at org.apache.camel.processor.RecipientList.doStop(RecipientList.java:222)
>     at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
>     - locked <0x00000000e3ecdb00> (a java.lang.Object)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:165)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
>     at org.apache.camel.processor.Pipeline.doStop(Pipeline.java:226)
>     at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
>     - locked <0x00000000e3ecda48> (a java.lang.Object)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
>     at 
> org.apache.camel.impl.engine.DefaultChannel.doStop(DefaultChannel.java:134)
>     at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
>     - locked <0x00000000e3eccde0> (a java.lang.Object)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopAndShutdownServices(ServiceHelper.java:257)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopAndShutdownServices(ServiceHelper.java:215)
>     at 
> org.apache.camel.processor.errorhandler.RedeliveryErrorHandler.doShutdown(RedeliveryErrorHandler.java:1665)
>     at 
> org.apache.camel.support.ChildServiceSupport.shutdown(ChildServiceSupport.java:113)
>     - locked <0x00000000e3ecc1e8> (a java.lang.Object)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopAndShutdownService(ServiceHelper.java:233)
>     at 
> org.apache.camel.impl.engine.RouteService.stopChildServices(RouteService.java:409)
>     at org.apache.camel.impl.engine.RouteService.doStop(RouteService.java:260)
>     at 
> org.apache.camel.support.ChildServiceSupport.stop(ChildServiceSupport.java:86)
>     - locked <0x00000000e40873a0> (a java.lang.Object)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
>     at 
> org.apache.camel.support.service.ServiceHelper.stopAndShutdownService(ServiceHelper.java:227)
>     at 
> org.apache.camel.impl.engine.AbstractCamelContext.shutdownServices(AbstractCamelContext.java:3559)
>     at 
> org.apache.camel.impl.engine.AbstractCamelContext.shutdownServices(AbstractCamelContext.java:3584)
>     at 
> org.apache.camel.impl.engine.AbstractCamelContext.doStop(AbstractCamelContext.java:3366)
>     at 
> org.apache.camel.spring.boot.SpringBootCamelContext.doStop(SpringBootCamelContext.java:61)
>     - locked <0x00000000e221e658> (a 
> org.apache.camel.spring.boot.SpringBootCamelContext)
>     at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
>     - locked <0x00000000e221e8a8> (a java.lang.Object)
>     at 
> org.apache.camel.impl.engine.AbstractCamelContext.stop(AbstractCamelContext.java:2640)
>     at 
> org.apache.camel.spring.SpringCamelContext.stop(SpringCamelContext.java:128)
>     at 
> org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:247)
>     at 
> org.springframework.context.support.DefaultLifecycleProcessor.access$300(DefaultLifecycleProcessor.java:54)
>     at 
> org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:373)
>     at 
> org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:206)
>     at 
> org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:129)
>     at 
> org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1067)
>     at 
> org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1021)
>     - locked <0x00000000e11e3868> (a java.lang.Object)
>     at 
> org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:137)
>     at 
> org.springframework.boot.SpringApplicationShutdownHook$$Lambda$2265/0x0000000841088440.accept(Unknown
>  Source)
>     at java.lang.Iterable.forEach([email protected]/Iterable.java:75)
>     at 
> org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:106)
>     at java.lang.Thread.run([email protected]/Thread.java:829){noformat}
> Deeper inspection of heap dump reveals that linked list in 
> {{ServicePool.cache}} is corrupted and has a loop, which causes {{foreach}} 
> to loop infinitely (see attached screenshot from heapdump analysis).
> Access to {{ServicePool.cache}} is not synchronized even though 
> {{LinkedHashMap}} is not thread-safe. There is at least one code path that 
> can add to the cache it concurrently (and indeed in this seems to happen in 
> our tests):
>  # org.apache.camel.processor.MulticastProcessor.process(Exchange, 
> AsyncCallback)
>  # 
> org.apache.camel.processor.RecipientListProcessor.createProcessorExchangePairs(Exchange)
>  # 
> org.apache.camel.processor.RecipientListProcessor.doCreateProcessorExchangePairs(Exchange,
>  Object, List<ProcessorExchangePair>, int)
>  # 
> org.apache.camel.support.cache.DefaultProducerCache.acquireProducer(Endpoint)
>  # org.apache.camel.support.cache.ServicePool.acquire(Endpoint)
>  # org.apache.camel.support.cache.ServicePool.cache
> There are also a few unsynchronized invocations of 
> {{DefaultProducerCache.acquireProducer}} from other processors.
> I am not sure if other usages of {{cache}} need to be synchronized, maybe 
> Camel lifecycle provides enough serialization.
>  
> The simpliest solution is to wrap {{cache}} in 
> {{Collections.synchronizedMap}} but this might have negative impact on 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to