[
https://issues.apache.org/jira/browse/CLOUDSTACK-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748875#comment-13748875
]
Koushik Das commented on CLOUDSTACK-4350:
-----------------------------------------
The trace-enabled logs helped identify why adding hosts slows down as the
number of hosts grows. The trace excerpt below shows one select query that is
very inefficient and does not scale: with more than 19K hosts present, about
93 seconds elapse between preparing and closing the statement.
As part of host connect, a number of listeners are invoked. One such listener
is the DownloadListener, which checks whether system VM templates need to be
downloaded for a specific hypervisor type. It does this by checking whether the
zone already has a host of that hypervisor type; if so, the step is skipped,
because the templates would already have been downloaded when the first host
of that hypervisor type was added. The set of hypervisor types already present
in the zone is computed by querying all existing hosts in the zone and then
looping over them (in Java code) to collect their hypervisor types. This is
highly inefficient, all the more so in a scaled-up environment with many hosts.
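To make the description above concrete, here is a minimal in-memory sketch of
the lookup pattern and of one possible cheaper alternative. It is not the
actual ResourceManagerImpl code: the Host class, the method names and the
SELECT DISTINCT statement are all assumptions made for illustration, and the
"removed" column is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class HypervisorLookupSketch {

    /** Hypothetical, trimmed-down stand-in for one row of the host table. */
    public static final class Host {
        final long id;
        final long dataCenterId;
        final String type;
        final String hypervisorType;

        public Host(long id, long dataCenterId, String type, String hypervisorType) {
            this.id = id;
            this.dataCenterId = dataCenterId;
            this.type = type;
            this.hypervisorType = hypervisorType;
        }
    }

    /** Mirrors the WHERE clause from the trace, applied to an in-memory list. */
    public static List<Host> selectHosts(List<Host> table, long zoneId, long excludedHostId) {
        List<Host> rows = new ArrayList<>();
        for (Host h : table) {
            if (h.dataCenterId == zoneId && h.id != excludedHostId && "Routing".equals(h.type)) {
                rows.add(h);
            }
        }
        return rows;
    }

    /** Pattern described above: fetch every matching row, then collect the types in a loop. */
    public static Set<String> typesViaFullRows(List<Host> table, long zoneId, long excludedHostId) {
        Set<String> types = new TreeSet<>();
        for (Host h : selectHosts(table, zoneId, excludedHostId)) {
            types.add(h.hypervisorType);
        }
        return types;
    }

    /** Assumed alternative (not from the source): let the database return distinct types only. */
    public static final String DISTINCT_TYPES_SQL =
        "SELECT DISTINCT host.hypervisor_type FROM host "
        + "WHERE host.data_center_id = ? AND host.id != ? AND host.type = ? "
        + "AND host.removed IS NULL";

    /** In-memory emulation of that query's result set: one row per hypervisor type. */
    public static List<String> distinctResultRows(List<Host> table, long zoneId, long excludedHostId) {
        return new ArrayList<>(typesViaFullRows(table, zoneId, excludedHostId));
    }

    /** Builds `count` simulator hosts in zone 1, with ids 1..count. */
    public static List<Host> sampleHosts(int count) {
        List<Host> table = new ArrayList<>();
        for (long id = 1; id <= count; id++) {
            table.add(new Host(id, 1L, "Routing", "Simulator"));
        }
        return table;
    }

    public static void main(String[] args) {
        List<Host> table = sampleHosts(1000);
        long newHostId = 1000L;
        System.out.println("rows via full-row query: " + selectHosts(table, 1L, newHostId).size());
        System.out.println("rows via DISTINCT query: " + distinctResultRows(table, 1L, newHostId).size());
        System.out.println("hypervisor types: " + typesViaFullRows(table, 1L, newHostId));
    }
}
```

Both approaches yield the same set of hypervisor types; the difference is that
the full-row variant transfers and materializes one wide row per existing host
on every host connect, while the DISTINCT variant returns one short row per
hypervisor type.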
2013-08-17 11:40:20,990 TRACE [db.Transaction.Connection] (ApiServer-9:null)
Creating a DB connection with no txn: for 0: dbconn497236897. Stack:
-Transaction.prepareStatement:469-Transaction.prepareAutoCloseStatement:462-GenericDaoBase.searchIncludingRemoved:387-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.searchIncludingRemoved:349-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.search:333-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.search:1242-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-SearchCriteria2.list:126-ResourceManagerImpl.listAvailHypervisorInZone:2390
2013-08-17 11:40:20,990 TRACE [db.Transaction.Statement] (ApiServer-9:null)
Preparing: SELECT host.id, host.disconnected, host.name, host.status,
host.type, host.private_ip_address, host.private_mac_address,
host.private_netmask, host.public_netmask, host.public_ip_address,
host.public_mac_address, host.storage_ip_address, host.cluster_id,
host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2,
host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type,
host.proxy_port, host.resource, host.fs_type, host.available, host.setup,
host.resource_state, host.hypervisor_version, host.update_count, host.uuid,
host.data_center_id, host.pod_id, host.cpus, host.url, host.speed, host.ram,
host.parent, host.guid, host.capabilities, host.total_size, host.last_ping,
host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed
FROM host WHERE host.data_center_id = ? AND host.id != ? AND host.type = ?
AND host.removed IS NULL
2013-08-17 11:41:53,578 TRACE [db.Transaction.Statement] (ApiServer-9:null)
Closing: com.mysql.jdbc.PreparedStatement@479fd63a: SELECT host.id,
host.disconnected, host.name, host.status, host.type, host.private_ip_address,
host.private_mac_address, host.private_netmask, host.public_netmask,
host.public_ip_address, host.public_mac_address, host.storage_ip_address,
host.cluster_id, host.storage_netmask, host.storage_mac_address,
host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2,
host.hypervisor_type, host.proxy_port, host.resource, host.fs_type,
host.available, host.setup, host.resource_state, host.hypervisor_version,
host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpus,
host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities,
host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory,
host.version, host.created, host.removed FROM host WHERE host.data_center_id =
1 AND host.id != 19500 AND host.type = 'Routing' AND host.removed IS NULL
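The cost of this one statement can be read off the two timestamps above
("Preparing" and "Closing"). A small sketch of that arithmetic follows; the
class and method names are made up for illustration, and the timestamp pattern
is inferred from the log lines shown here.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TraceElapsed {
    // Timestamp layout as it appears in the trace lines above (assumed pattern).
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds between two log timestamps of the form 2013-08-17 11:40:20,990. */
    public static long elapsedMillis(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, LOG_TS);
        LocalDateTime to = LocalDateTime.parse(end, LOG_TS);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        // "Preparing" and "Closing" timestamps copied from the trace.
        long ms = elapsedMillis("2013-08-17 11:40:20,990", "2013-08-17 11:41:53,578");
        System.out.println(ms + " ms"); // prints "92588 ms"
    }
}
```

That is roughly 92.6 seconds for a single host-connect lookup with about
19,500 hosts in the zone.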
> [Performance Testing] Adding hosts take much longer time than baselines
> -----------------------------------------------------------------------
>
> Key: CLOUDSTACK-4350
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4350
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Components: Management Server
> Affects Versions: 4.2.0
> Environment: 4.2, performance test env, with simulator
> Reporter: Sowmya Krishnan
> Assignee: Koushik Das
> Priority: Critical
> Labels: perfomance
> Fix For: Future
>
>
> Performance test setup:
> Basic zone, 1 host/cluster, trying to deploy 20K simulator hosts, with host
> tags
> Compared to baseline numbers, deploying hosts is taking much longer in the
> simulator environment
> For the 1st 1000 hosts, it took about 4 mins to deploy as per baseline
> With 4.2, the 1st 1000 hosts are taking almost 7 minutes
> Configurations:
> heap size: -Xmx12288m
> db.cloud.url.params=prepStmtCacheSize=517&cachePrepStmts=true&prepStmtCacheSqlLimit=4096&includeInnodbStatusInDeadlockExceptions=true&logSlowQueries=true
> For 20K hosts, the numbers increase exponentially and the total deployment
> time grows accordingly.
> Attaching trace logs of 4.2 for 1k simulator hosts deployment.