[
https://issues.apache.org/jira/browse/AMBARI-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739435#comment-14739435
]
Jayush Luniya commented on AMBARI-13065:
----------------------------------------
The slowdown here is in getStagesInProgress() and getAllStages() in
ActionDBAccessorImpl.java. Converting a StageEntity to a Stage takes ~30 ms, so
for ~3000 StageEntity objects it takes ~90 secs. Given that these conversions
are IO-bound, the for loop can be parallelized. I prototyped a solution and the
conversion could be done in ~11 secs.
{code}
@Override
public List<Stage> getAllStages(long requestId) {
  List<Stage> stages = new ArrayList<Stage>();
  for (StageEntity stageEntity : stageDAO.findByRequestId(requestId)) {
    stages.add(stageFactory.createExisting(stageEntity));
  }
  return stages;
}
{code}
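The prototype itself is not included in this comment. As a rough sketch of
how such a loop might be parallelized, the helper below runs each conversion on
a fixed-size thread pool and returns the results in input order. The class name
ParallelStageConversion, the method convertAll, and the thread-count parameter
are hypothetical and not part of Ambari. In getAllStages() the converter would
presumably be stageFactory::createExisting. The sketch assumes the conversion
is safe to call from several threads at once, which this comment does not
confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not Ambari code: converts a list of entities in parallel.
final class ParallelStageConversion {

  /**
   * Applies the converter to every entity on a fixed-size thread pool.
   * Results come back in the same order as the input list.
   * Assumption: the converter is thread-safe.
   */
  static <E, S> List<S> convertAll(List<E> entities,
                                   Function<E, S> converter,
                                   int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // One task per entity; each task performs a single conversion.
      List<Callable<S>> tasks = new ArrayList<>(entities.size());
      for (E entity : entities) {
        tasks.add(() -> converter.apply(entity));
      }
      // invokeAll blocks until every task has finished and returns the
      // futures in the same order as the submitted tasks.
      List<S> results = new ArrayList<>(entities.size());
      for (Future<S> future : pool.invokeAll(tasks)) {
        // get() rethrows a failed conversion as ExecutionException.
        results.add(future.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

With this helper, the loop body above would reduce to something like
convertAll(stageDAO.findByRequestId(requestId), stageFactory::createExisting, n)
for some pool size n. A real change would likely reuse a shared pool rather than
create one per call.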
> RU: Core Slaves restart schedule is extremely slow on very large cluster
> ------------------------------------------------------------------------
>
> Key: AMBARI-13065
> URL: https://issues.apache.org/jira/browse/AMBARI-13065
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.1.2
> Reporter: Jayush Luniya
> Assignee: Jayush Luniya
> Priority: Blocker
> Fix For: 2.1.2
>
>
> Performed RU on 1200 node cluster and the progress of 'Core Slaves' restarts
> is extremely slow - In 3 hours it restarted only 22 components (screenshot
> attached). At this rate it will take weeks for RU to complete.
> If we look into the agent log where RU core-slaves finished, we see that
> sequential commands are sent 8 minutes apart - which is very slow. The
> commands themselves execute in under a minute.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)