[jira] [Commented] (AMBARI-13065) RU: Core Slaves restart schedule is extremely slow on very large cluster

Jayush Luniya (JIRA) Thu, 10 Sep 2015 12:35:38 -0700

    [ https://issues.apache.org/jira/browse/AMBARI-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739435#comment-14739435 ]


Jayush Luniya commented on AMBARI-13065:
----------------------------------------

The slowdown is in getStagesInProgress() and getAllStages() in 
ActionDBAccessorImpl.java. Converting a StageEntity to a Stage takes ~30 ms, so 
~3000 StageEntities take ~90 secs (3000 x 30 ms). Since the conversions are 
IO-bound, the for loop can be parallelized. I prototyped a solution and the 
same work completed in ~11 secs. 

{code}
  @Override
  public List<Stage> getAllStages(long requestId) {
    List<Stage> stages = new ArrayList<Stage>();
    for (StageEntity stageEntity : stageDAO.findByRequestId(requestId)) {
      stages.add(stageFactory.createExisting(stageEntity));
    }
    return stages;
  }
{code}
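
For illustration, here is a minimal sketch of how the loop above could be parallelized with a fixed thread pool. It is not the prototype mentioned in this comment. The class name ParallelStageLoader, the method convertAll, and the pool-size parameter are all hypothetical, and the sketch assumes Java 8+ and that the conversion function is safe to call from several threads at once, which would need to be verified for stageFactory.createExisting().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Ambari: converts entities on a thread pool.
final class ParallelStageLoader {

  // Applies 'converter' to every entity using 'threads' worker threads.
  // Results are returned in the same order as the input list.
  // Assumption: 'converter' is thread-safe.
  static <E, S> List<S> convertAll(List<E> entities, Function<E, S> converter, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // One task per entity; each task performs a single conversion.
      List<Callable<S>> tasks = new ArrayList<>(entities.size());
      for (E entity : entities) {
        tasks.add(() -> converter.apply(entity));
      }
      // invokeAll blocks until all tasks finish and keeps the task order.
      List<S> results = new ArrayList<>(entities.size());
      for (Future<S> future : pool.invokeAll(tasks)) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag
      throw new IllegalStateException("Interrupted while converting", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Conversion failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Under these assumptions, getAllStages() could return something like ParallelStageLoader.convertAll(stageDAO.findByRequestId(requestId), stageFactory::createExisting, 10), where the pool size of 10 is an arbitrary illustrative value. Because the work is IO-bound, the pool can reasonably be larger than the number of CPU cores, but the right size depends on how many concurrent database connections are available.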

> RU: Core Slaves restart schedule is extremely slow on very large cluster
> ------------------------------------------------------------------------
>
>                 Key: AMBARI-13065
>                 URL: https://issues.apache.org/jira/browse/AMBARI-13065
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.1.2
>            Reporter: Jayush Luniya
>            Assignee: Jayush Luniya
>            Priority: Blocker
>             Fix For: 2.1.2
>
>
> Performed RU on a 1200-node cluster and the progress of 'Core Slaves' restarts 
> is extremely slow. In 3 hours it restarted only 22 components (screenshot 
> attached). At this rate it will take weeks for RU to complete.
> If we look into the agent log where RU core-slaves finished, we see that 
> sequential commands are sent 8 minutes apart, which is very slow. The 
> commands themselves execute in under a minute.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
