Krzysztof Jamróz created CAMEL-17536:
----------------------------------------

             Summary: ServicePool.doStop hangs during shutdown
                 Key: CAMEL-17536
                 URL: https://issues.apache.org/jira/browse/CAMEL-17536
             Project: Camel
          Issue Type: Bug
    Affects Versions: 3.14.0
            Reporter: Krzysztof Jamróz
         Attachments: linkedhashmap corrupted.png

Camel sometimes hangs (100% usage of 1 CPU) during shutdown. This occurs 
intermittently during integration testing. We use {{recipientList }}to perform 
some dynamic routing and many messages can be processed in parallel threads by 
{{recipientList}} and there are a few of them in different routes (this will be 
important in a moment).

Stack trace of stuck thread:
{noformat}
"SpringApplicationShutdownHook" #18 prio=5 os_prio=0 cpu=1784989.37ms 
elapsed=1790.21s tid=0x00007f7a13520000 nid=0xa80 runnable  [0x00007f79a3af5000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.camel.support.cache.ServicePool.stop(ServicePool.java:193)
    at 
org.apache.camel.support.cache.ServicePool$$Lambda$2274/0x000000084108fc40.accept(Unknown
 Source)
    at 
java.util.LinkedHashMap$LinkedValues.forEach([email protected]/LinkedHashMap.java:608)
    at org.apache.camel.support.cache.ServicePool.doStop(ServicePool.java:181)
    at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
    - locked <0x00000000e3ece780> (a java.lang.Object)
    at 
org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
    at 
org.apache.camel.support.cache.DefaultProducerCache.doStop(DefaultProducerCache.java:399)
    at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
    - locked <0x00000000e3ece740> (a java.lang.Object)
    at 
org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
    at 
org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
    at org.apache.camel.processor.RecipientList.doStop(RecipientList.java:222)
    at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
    - locked <0x00000000e3ecdb00> (a java.lang.Object)
    at 
org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
    at 
org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:165)
    at 
org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
    at org.apache.camel.processor.Pipeline.doStop(Pipeline.java:226)
    at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
    - locked <0x00000000e3ecda48> (a java.lang.Object)
    at 
org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
    at 
org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
    at 
org.apache.camel.impl.engine.DefaultChannel.doStop(DefaultChannel.java:134)
    at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
    - locked <0x00000000e3eccde0> (a java.lang.Object)
    at 
org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
    at 
org.apache.camel.support.service.ServiceHelper.stopAndShutdownServices(ServiceHelper.java:257)
    at 
org.apache.camel.support.service.ServiceHelper.stopAndShutdownServices(ServiceHelper.java:215)
    at 
org.apache.camel.processor.errorhandler.RedeliveryErrorHandler.doShutdown(RedeliveryErrorHandler.java:1665)
    at 
org.apache.camel.support.ChildServiceSupport.shutdown(ChildServiceSupport.java:113)
    - locked <0x00000000e3ecc1e8> (a java.lang.Object)
    at 
org.apache.camel.support.service.ServiceHelper.stopAndShutdownService(ServiceHelper.java:233)
    at 
org.apache.camel.impl.engine.RouteService.stopChildServices(RouteService.java:409)
    at org.apache.camel.impl.engine.RouteService.doStop(RouteService.java:260)
    at 
org.apache.camel.support.ChildServiceSupport.stop(ChildServiceSupport.java:86)
    - locked <0x00000000e40873a0> (a java.lang.Object)
    at 
org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
    at 
org.apache.camel.support.service.ServiceHelper.stopAndShutdownService(ServiceHelper.java:227)
    at 
org.apache.camel.impl.engine.AbstractCamelContext.shutdownServices(AbstractCamelContext.java:3559)
    at 
org.apache.camel.impl.engine.AbstractCamelContext.shutdownServices(AbstractCamelContext.java:3584)
    at 
org.apache.camel.impl.engine.AbstractCamelContext.doStop(AbstractCamelContext.java:3366)
    at 
org.apache.camel.spring.boot.SpringBootCamelContext.doStop(SpringBootCamelContext.java:61)
    - locked <0x00000000e221e658> (a 
org.apache.camel.spring.boot.SpringBootCamelContext)
    at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
    - locked <0x00000000e221e8a8> (a java.lang.Object)
    at 
org.apache.camel.impl.engine.AbstractCamelContext.stop(AbstractCamelContext.java:2640)
    at 
org.apache.camel.spring.SpringCamelContext.stop(SpringCamelContext.java:128)
    at 
org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:247)
    at 
org.springframework.context.support.DefaultLifecycleProcessor.access$300(DefaultLifecycleProcessor.java:54)
    at 
org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:373)
    at 
org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:206)
    at 
org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:129)
    at 
org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1067)
    at 
org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1021)
    - locked <0x00000000e11e3868> (a java.lang.Object)
    at 
org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:137)
    at 
org.springframework.boot.SpringApplicationShutdownHook$$Lambda$2265/0x0000000841088440.accept(Unknown
 Source)
    at java.lang.Iterable.forEach([email protected]/Iterable.java:75)
    at 
org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:106)
    at java.lang.Thread.run([email protected]/Thread.java:829){noformat}
Deeper inspection of heap dump reveals that linked list in 
{{ServicePool.cache}} is corrupted and has a loop, which causes {{foreach}} to 
loop infinitely (see attached screenshot from heapdump analysis).

Access to {{ServicePool.cache}} is not synchronized even though 
{{LinkedHashMap}} is not thread-safe. There is at least one code path that can 
add to the cache it concurrently (and indeed in this seems to happen in our 
tests):
 # org.apache.camel.processor.MulticastProcessor.process(Exchange, 
AsyncCallback)
 # 
org.apache.camel.processor.RecipientListProcessor.createProcessorExchangePairs(Exchange)
 # 
org.apache.camel.processor.RecipientListProcessor.doCreateProcessorExchangePairs(Exchange,
 Object, List<ProcessorExchangePair>, int)
 # org.apache.camel.support.cache.DefaultProducerCache.acquireProducer(Endpoint)
 # org.apache.camel.support.cache.ServicePool.acquire(Endpoint)
 # org.apache.camel.support.cache.ServicePool.cache

There are also a few unsynchronized invocations of 
{{DefaultProducerCache.acquireProducer}} from other processors.

I am not sure if other usages of {{cache}} need to be synchronized, maybe Camel 
lifecycle provides enough serialization.
 
The simpliest solution is to wrap {{cache}} in {{Collections.synchronizedMap}} 
but this might have negative impact on performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to