[ 
https://issues.apache.org/jira/browse/CAMEL-11750?focusedWorklogId=386926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386926
 ]

ASF GitHub Bot logged work on CAMEL-11750:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Feb/20 22:30
            Start Date: 13/Feb/20 22:30
    Worklog Time Spent: 10m 
      Work Description: Magua commented on pull request #3576: CAMEL-11750 
Cherry picked into 2.16 -branch.
URL: https://github.com/apache/camel/pull/3576
 
 
   This is cherry picked from fix for CAMEL-11750 as 
[shown](https://github.com/leofromgroza/camel-long-term-route) to be a bug in 
2.18 branch and later.
   
   I found this also exists in 2.16 branch when running with jre 8 or later.
   
   I recreated the bug 
[here](https://github.com/Magua/camel/tree/recreate-CAMEL-11750-java8/examples/camel-example-spring-boot).
 Tests show that on jre 8 or later this maxes out one thread (core) while 
waiting for aggregation to finish.
   
   I'm happy to provide pr for 2.17 branch as well if this gets approved.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 386926)
    Time Spent: 0.5h  (was: 20m)

> Camel route with multicast (parallel) generate huge CPU load
> ------------------------------------------------------------
>
>                 Key: CAMEL-11750
>                 URL: https://issues.apache.org/jira/browse/CAMEL-11750
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-core
>    Affects Versions: 2.18.4, 2.19.2
>            Reporter: Nazar Vishka
>            Assignee: Claus Ibsen
>            Priority: Critical
>             Fix For: 2.18.5, 2.19.3, 2.20.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I've created simple [Spring Camel Route 
> application|https://github.com/leofromgroza/camel-long-term-route] that can 
> be used for issue reproducing. Here we have very simple route:
> {code}
>     public void configure() throws Exception {
>         from("direct:start").routeId("start")
>                 .multicast().parallelProcessing()
>                 .to("direct:very-long-task", "direct:long-task")
>                 .end();
>         from("direct:long-task").routeId("long-task")
>                 .log("Started long-task")
>                 .process(exchange -> Thread.sleep(5000))
>                 .log("Finished long-task")
>                 .end();
>         from("direct:very-long-task").routeId("very-long-task")
>                 .log("Started very-long-task")
>                 .process(exchange -> Thread.sleep(35000))
>                 .log("Finished very-long-task")
>                 .end();
>     }{code}
> From our main route 'start' we are starting in parallel two sub-routes: 
> 'long-task' and 'very-long-task'. They are just doing something for some 
> period of time and do not generate any load to the system.
> But I found that when one task finished earlier than other one, route start 
> to make a huge CPU load. Here you can see a CPU usage during executiong of 
> Camel route that was mentioned earlier (after finishing of 'long-task' usage 
> of CPU uncreased from 0 to 12.5%):
> !https://content.screencast.com/users/NazarV/folders/Jing/media/830268f0-d184-4c57-adb1-b782ea63fa6d/2017-09-06_1241.png!
> Screenshot was made when I was running route on my Windows PC with 4 physical 
> CPU cores + 4 HT. On Unix systems we found that after end of 'long-task' it 
> used 100% of one core till the end of work.
> One more interesting thing that i've found is that the main load on the 
> system was generated by the thread MulticastProcessor-AggregateTask that was 
> spending a lot of time in the method 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.ConditionObject#awaitNanos
>  that was called from java.util.concurrent.DelayQueue#poll(long, 
> java.util.concurrent.TimeUnit):
> {code}"Camel (camel-1) thread #2 - MulticastProcessor-AggregateTask" #29 
> daemon prio=5 os_prio=0 tid=0x00000000215e3000 nid=0x7a0 runnable 
> [0x0000000022eaf000]
>    java.lang.Thread.State: RUNNABLE
>       at java.lang.Thread.isInterrupted(Native Method)
>       at java.lang.Thread.interrupted(Thread.java:944)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.checkInterruptWhileWaiting(AbstractQueuedSynchronizer.java:2002)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2079)
>       at java.util.concurrent.DelayQueue.poll(DelayQueue.java:273)
>       at 
> org.apache.camel.util.concurrent.SubmitOrderedCompletionService.poll(SubmitOrderedCompletionService.java:127)
>       at 
> org.apache.camel.processor.MulticastProcessor$AggregateOnTheFlyTask.aggregateOnTheFly(MulticastProcessor.java:463)
>       at 
> org.apache.camel.processor.MulticastProcessor$AggregateOnTheFlyTask.run(MulticastProcessor.java:418)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745){code}
> In DelayQueue#poll(long, java.util.concurrent.TimeUnit) we have a piece of 
> code:
> {code}E first = q.peek();
> if (first == null) {
>  ...
> } else {
>   long delay = first.getDelay(NANOSECONDS);
>     ...
>   long timeLeft =  available.awaitNanos(delay) {code}
> During debugging I found that E first is object of class 
> [org.apache.camel.util.concurrent.SubmitOrderedCompletionService.SubmitOrderFutureTask|https://github.com/apache/camel/blob/camel-2.19.2/camel-core/src/main/java/org/apache/camel/util/concurrent/SubmitOrderedCompletionService.java]
>  and it's very interesting [getDelay(TimeUnit) 
> method|https://github.com/apache/camel/blob/camel-2.19.2/camel-core/src/main/java/org/apache/camel/util/concurrent/SubmitOrderedCompletionService.java#L64]
>  always return 1. I thing that is a source of current issue:
> {code} public long getDelay(TimeUnit unit) {
>             // if the answer is 0 then this task is ready to be taken
>             return id - index.get();
>         }{code}
> So when delay is 1, AggregateTask thread is not sleeping at all and generates 
> a huge CPU load all the time when it is waiting for the end of the last task.
> I think that [getDelay(TimeUnit) 
> method|https://github.com/apache/camel/blob/camel-2.19.2/camel-core/src/main/java/org/apache/camel/util/concurrent/SubmitOrderedCompletionService.java#L64]
>  should be somehow re-implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to