[ 
https://issues.apache.org/jira/browse/AMBARI-20646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367030#comment-16367030
 ] 

Hudson commented on AMBARI-20646:
---------------------------------

FAILURE: Integrated in Jenkins build Ambari-trunk-Commit #8744 (See 
[https://builds.apache.org/job/Ambari-trunk-Commit/8744/])
AMBARI-20646 - Large Long Running Requests Can Slow Down the (aonishuk: 
[https://gitbox.apache.org/repos/asf?p=ambari.git&a=commit&h=aba473e84a5a24d12b29a2bf9e858019c023f6fd])
* (edit) ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java
* (edit) ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java
* (edit) ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java
* (edit) ambari-server/src/test/java/org/apache/ambari/server/serveraction/ServerActionExecutorTest.java
* (edit) ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
* (edit) ambari-server/src/test/java/org/apache/ambari/server/state/services/RetryUpgradeActionServiceTest.java
* (edit) ambari-server/src/main/java/org/apache/ambari/server/serveraction/ServerActionExecutor.java
* (edit) ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java
* (edit) ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java
* (edit) ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java
* (edit) ambari-server/src/test/java/org/apache/ambari/server/orm/dao/RequestDAOTest.java


> Large Long Running Requests Can Slow Down the ActionScheduler
> -------------------------------------------------------------
>
>                 Key: AMBARI-20646
>                 URL: https://issues.apache.org/jira/browse/AMBARI-20646
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.4.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.5.1
>
>         Attachments: AMBARI-20646.patch
>
>
> When creating a massive request (a rolling upgrade on a cluster with 1000 
> nodes), the size of the request seems to slow down the {{ActionScheduler}}. 
> Each command was taking between 1 and 2 minutes to run (even server-side 
> tasks). 
> The cause of this can be seen in the following two stack traces:
> {code:title=ActionSchedulerImpl}
>       at org.apache.ambari.server.orm.dao.DaoUtils.selectList(DaoUtils.java:60)
>       at org.apache.ambari.server.orm.dao.HostRoleCommandDAO.findByPKs(HostRoleCommandDAO.java:293)
>       at org.apache.ambari.server.orm.dao.HostRoleCommandDAO$$EnhancerByGuice$$21789cd1.CGLIB$findByPKs$7(<generated>)
>       at org.apache.ambari.server.orm.dao.HostRoleCommandDAO$$EnhancerByGuice$$21789cd1$$FastClassByGuice$$aa975e7f.invoke(<generated>)
>       at com.google.inject.internal.cglib.proxy.$MethodProxy.invokeSuper(MethodProxy.java:228)
>       at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
>       at org.apache.ambari.server.orm.AmbariLocalSessionInterceptor.invoke(AmbariLocalSessionInterceptor.java:53)
>       at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
>       at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)
>       at org.apache.ambari.server.orm.dao.HostRoleCommandDAO$$EnhancerByGuice$$21789cd1.findByPKs(<generated>)
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
>       at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
>       at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getStagesInProgress(ActionDBAccessorImpl.java:303)
>       at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:341)
>       at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:302)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> {code:title=Server Action Executor}
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
>       at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
>       at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
>       at org.apache.ambari.server.actionmanager.Request.<init>(Request.java:199)
>       at org.apache.ambari.server.actionmanager.Request$$FastClassByGuice$$9071e03.newInstance(<generated>)
>       at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
>       at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
>       at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
>       at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
>       at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
>       at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
>       at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
>       at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
>       at com.sun.proxy.$Proxy26.createExisting(Unknown Source)
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getRequests(ActionDBAccessorImpl.java:784)
>       at org.apache.ambari.server.serveraction.ServerActionExecutor.cleanRequestShareDataContexts(ServerActionExecutor.java:259)
>       - locked <0x00007ff0a14083c8> (a java.util.HashMap)
>       at org.apache.ambari.server.serveraction.ServerActionExecutor.doWork(ServerActionExecutor.java:454)
>       at org.apache.ambari.server.serveraction.ServerActionExecutor$1.run(ServerActionExecutor.java:160)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> It's clear from these stacks that every {{PENDING}} stage (roughly 15,000) 
> was being loaded into memory every second, along with its accompanying 
> tasks. This is unnecessary: these methods don't need all stages, just the 
> _next_ stage of each request, because the stages within a single request 
> run synchronously, one after another.
> The proposed solution is to fix the {{StageEntity.findByCommandStatuses}} 
> call so it doesn't return every stage:
> {code}
> SELECT stage.requestid, 
>        MIN(stage.stageid) 
> FROM   stageentity stage, 
>        hostrolecommandentity hrc 
> WHERE  hrc.status IN :statuses 
>        AND hrc.stageid = stage.stageid 
>        AND hrc.requestid = stage.requestid 
> GROUP  BY stage.requestid 
> {code}
> *Note that this might not appear on trunk due to AMBARI-18868*
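
For illustration only, the grouping in the proposed query can be sketched in plain Java over in-memory rows. This is a minimal sketch, not Ambari code: the class `NextStagePerRequest`, the `Command` type, the `Status` enum, and the method `firstStagePerRequest` are all hypothetical stand-ins for the joined `stageentity` / `hostrolecommandentity` rows. It shows the idea of keeping only the lowest matching stage ID per request instead of returning every stage.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class NextStagePerRequest {

    // Hypothetical stand-in for the command statuses; not Ambari's own enum.
    enum Status { PENDING, QUEUED, IN_PROGRESS, COMPLETED }

    // Hypothetical stand-in for one command row joined to its stage.
    static final class Command {
        final long requestId;
        final long stageId;
        final Status status;

        Command(long requestId, long stageId, Status status) {
            this.requestId = requestId;
            this.stageId = stageId;
            this.status = status;
        }
    }

    // Mirrors "SELECT requestid, MIN(stageid) ... WHERE status IN :statuses
    // GROUP BY requestid": one entry per request, holding its lowest stage ID
    // among the commands whose status is in the given set.
    static Map<Long, Long> firstStagePerRequest(List<Command> commands,
                                                Set<Status> statuses) {
        Map<Long, Long> firstStage = new TreeMap<>();
        for (Command c : commands) {
            if (statuses.contains(c.status)) {
                firstStage.merge(c.requestId, c.stageId, Long::min);
            }
        }
        return firstStage;
    }

    public static void main(String[] args) {
        List<Command> commands = new ArrayList<>();
        commands.add(new Command(1L, 1L, Status.COMPLETED));
        commands.add(new Command(1L, 2L, Status.IN_PROGRESS));
        commands.add(new Command(1L, 3L, Status.PENDING));
        commands.add(new Command(2L, 1L, Status.PENDING));
        commands.add(new Command(2L, 2L, Status.PENDING));

        Set<Status> inProgress =
            EnumSet.of(Status.PENDING, Status.QUEUED, Status.IN_PROGRESS);

        // Prints {1=2, 2=1}: one stage per request rather than all five rows.
        System.out.println(firstStagePerRequest(commands, inProgress));
    }
}
```

With thousands of pending stages in a single request, the result still contains one entry per request, which is the property the proposed query relies on.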



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
