[ https://issues.apache.org/jira/browse/CLOUDSTACK-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748875#comment-13748875 ]
Koushik Das commented on CLOUDSTACK-4350:
-----------------------------------------

The trace-enabled logs helped identify why adding hosts slows down as the number of hosts increases. The snippet below shows that a specific select query is very inefficient and does not scale as the number of hosts grows (> 19K).

As part of host connect, listeners are invoked to perform various tasks. One such listener is the DownloadListener, which checks whether system VM templates need to be downloaded for a specific hypervisor type. It does this by checking whether the zone already has a host of that hypervisor type; if so, the step is skipped, since the templates would already have been downloaded when the first host of that hypervisor type was added. The hypervisor types already present in the zone are computed by querying all existing hosts in the zone and then collecting their hypervisor types in a loop (in Java code). This is highly inefficient, all the more so in a scaled-up environment with many hosts.

2013-08-17 11:40:20,990 TRACE [db.Transaction.Connection] (ApiServer-9:null) Creating a DB connection with no txn: for 0: dbconn497236897.
Stack: -Transaction.prepareStatement:469-Transaction.prepareAutoCloseStatement:462-GenericDaoBase.searchIncludingRemoved:387-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.searchIncludingRemoved:349-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.search:333-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.search:1242-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-SearchCriteria2.list:126-ResourceManagerImpl.listAvailHypervisorInZone:2390
2013-08-17 11:40:20,990 TRACE [db.Transaction.Statement] (ApiServer-9:null) Preparing: SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.data_center_id = ? AND host.id != ? AND host.type = ? AND host.removed IS NULL
2013-08-17 11:41:53,578 TRACE [db.Transaction.Statement] (ApiServer-9:null) Closing: com.mysql.jdbc.PreparedStatement@479fd63a: SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.data_center_id = 1 AND host.id != 19500 AND host.type = 'Routing' AND host.removed IS NULL

> [Performance Testing] Adding hosts take much longer time than baselines
> -----------------------------------------------------------------------
>
> Key: CLOUDSTACK-4350
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4350
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the default.)
> Components: Management Server
> Affects Versions: 4.2.0
> Environment: 4.2, performance test env, with simulator
> Reporter: Sowmya Krishnan
> Assignee: Koushik Das
> Priority: Critical
> Labels: perfomance
> Fix For: Future
>
>
> Performance test setup:
> Basic zone, 1 host/cluster, trying to deploy 20K simulator Hosts, with host tags
> Compared to baseline numbers, deploying hosts is taking much longer in the simulator environment
> For the 1st 1000 hosts, it took about 4 mins to deploy as per baseline
> With 4.2, the 1st 1000 hosts are taking almost 7 minutes
> Configurations:
> heap size: -Xmx12288m
> db.cloud.url.params=prepStmtCacheSize=517&cachePrepStmts=true&prepStmtCacheSqlLimit=4096&includeInnodbStatusInDeadlockExceptions=true&logSlowQueries=true
> For 20K Hosts, the number exponentially increases and the deployment time also increases.
> Attaching trace logs of 4.2 for 1k simulator hosts deployment.
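
To illustrate the pattern described in the comment above, here is a minimal Java sketch. It uses hypothetical, simplified stand-ins (HostRow, HypervisorType, HypervisorLookup) rather than CloudStack's actual classes, and it works on an in-memory list instead of a database. typesByScanningRows mirrors the described behaviour: every host row in the zone is materialized and the hypervisor types are then collected in a loop. DISTINCT_TYPES_SQL is only one possible narrower query, assumed here for comparison; it is not necessarily the fix that was applied.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins for illustration; not the CloudStack classes.
enum HypervisorType { XenServer, KVM, VMware, Simulator }

final class HostRow {
    final long id;
    final long dataCenterId;
    final HypervisorType hypervisorType;

    HostRow(long id, long dataCenterId, HypervisorType hypervisorType) {
        this.id = id;
        this.dataCenterId = dataCenterId;
        this.hypervisorType = hypervisorType;
    }
}

final class HypervisorLookup {
    // Assumption: one possible narrower query that lets the database return
    // only the distinct hypervisor types instead of every column of every host.
    static final String DISTINCT_TYPES_SQL =
        "SELECT DISTINCT host.hypervisor_type FROM host "
        + "WHERE host.data_center_id = ? AND host.id != ? "
        + "AND host.type = ? AND host.removed IS NULL";

    // Pattern described in the comment: all host rows are loaded first,
    // then reduced to a set of hypervisor types in Java.
    static Set<HypervisorType> typesByScanningRows(
            List<HostRow> rows, long zoneId, long excludedHostId) {
        Set<HypervisorType> types = EnumSet.noneOf(HypervisorType.class);
        for (HostRow row : rows) {
            if (row.dataCenterId == zoneId && row.id != excludedHostId) {
                types.add(row.hypervisorType);
            }
        }
        return types;
    }

    // Builds a zone of simulator hosts with ids 1..hostCount, so the number
    // of rows scanned per lookup grows with the number of hosts already added.
    static List<HostRow> sampleZone(int hostCount, long zoneId) {
        List<HostRow> rows = new ArrayList<>(hostCount);
        for (long id = 1; id <= hostCount; id++) {
            rows.add(new HostRow(id, zoneId, HypervisorType.Simulator));
        }
        return rows;
    }
}
```

With this sketch, calling HypervisorLookup.typesByScanningRows(HypervisorLookup.sampleZone(19500, 1L), 1L, 19500L) walks 19,500 rows to produce a single-element set. In the trace above, about 92 seconds pass between the Preparing and Closing entries for the corresponding statement.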