[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748875#comment-13748875
 ] 

Koushik Das commented on CLOUDSTACK-4350:
-----------------------------------------

The trace-enabled logs helped identify why adding hosts slows down as the 
number of hosts increases. The snippet below shows that one specific select 
query is very inefficient and does not scale as the number of hosts grows 
(> 19K hosts).

As part of host connect, listeners are invoked to perform various tasks. One 
such listener is the DownloadListener, which checks whether system VM templates 
need to be downloaded for a specific hypervisor type. This is done by checking 
whether the zone already has a hypervisor of that type; if so, the step is 
skipped, as the templates would already have been downloaded when the first 
host of that hypervisor type was added. The hypervisor types already present 
in the zone are computed by querying all existing hosts in the zone and then 
collecting their hypervisor types in a loop (in Java code). This is highly 
inefficient, even more so in a scaled-up environment with lots of hosts.
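
To make the cost of that pattern concrete, here is a minimal sketch. It is 
not CloudStack code: the Host class, the HypervisorType enum, and the names 
typesByLoadingAllHosts and DISTINCT_TYPES_SQL are hypothetical stand-ins. The 
DISTINCT query is only one possible database-side alternative (its column 
names come from the logged query), not necessarily the fix chosen for this 
issue.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class HypervisorTypeLookup {
    // Hypothetical, simplified enum; not CloudStack's real type.
    enum HypervisorType { XENSERVER, KVM, VMWARE, SIMULATOR }

    // Hypothetical stand-in for one row of the host table.
    static final class Host {
        final long id;
        final long dataCenterId;
        final HypervisorType hypervisorType;

        Host(long id, long dataCenterId, HypervisorType hypervisorType) {
            this.id = id;
            this.dataCenterId = dataCenterId;
            this.hypervisorType = hypervisorType;
        }
    }

    // Pattern described in the comment: every host row is materialized,
    // then reduced to a handful of distinct types in a Java loop.
    // Work and memory grow linearly with the number of hosts in the zone.
    static Set<HypervisorType> typesByLoadingAllHosts(
            List<Host> hosts, long zoneId, long excludedHostId) {
        Set<HypervisorType> types = EnumSet.noneOf(HypervisorType.class);
        for (Host h : hosts) {
            if (h.dataCenterId == zoneId && h.id != excludedHostId) {
                types.add(h.hypervisorType);
            }
        }
        return types;
    }

    // Assumed alternative: let the database return only the distinct
    // values, so at most one row per hypervisor type reaches Java.
    static final String DISTINCT_TYPES_SQL =
        "SELECT DISTINCT host.hypervisor_type FROM host "
      + "WHERE host.data_center_id = ? AND host.id != ? "
      + "AND host.type = ? AND host.removed IS NULL";
}
```

With roughly 19.5K hosts in the zone, the first approach fetches about 19.5K 
rows of more than 40 columns each on every host connect, while the second 
would return one row per distinct hypervisor type.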

2013-08-17 11:40:20,990 TRACE [db.Transaction.Connection] (ApiServer-9:null) 
Creating a DB connection with  no txn:  for 0: dbconn497236897. Stack: 
-Transaction.prepareStatement:469-Transaction.prepareAutoCloseStatement:462-GenericDaoBase.searchIncludingRemoved:387-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.searchIncludingRemoved:349-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.search:333-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.search:1242-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-SearchCriteria2.list:126-ResourceManagerImpl.listAvailHypervisorInZone:2390
2013-08-17 11:40:20,990 TRACE [db.Transaction.Statement] (ApiServer-9:null) 
Preparing: SELECT host.id, host.disconnected, host.name, host.status, 
host.type, host.private_ip_address, host.private_mac_address, 
host.private_netmask, host.public_netmask, host.public_ip_address, 
host.public_mac_address, host.storage_ip_address, host.cluster_id, 
host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, 
host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, 
host.proxy_port, host.resource, host.fs_type, host.available, host.setup, 
host.resource_state, host.hypervisor_version, host.update_count, host.uuid, 
host.data_center_id, host.pod_id, host.cpus, host.url, host.speed, host.ram, 
host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, 
host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed 
FROM host WHERE host.data_center_id = ?  AND host.id != ?  AND host.type = ?  
AND host.removed IS NULL 
2013-08-17 11:41:53,578 TRACE [db.Transaction.Statement] (ApiServer-9:null) 
Closing: com.mysql.jdbc.PreparedStatement@479fd63a: SELECT host.id, 
host.disconnected, host.name, host.status, host.type, host.private_ip_address, 
host.private_mac_address, host.private_netmask, host.public_netmask, 
host.public_ip_address, host.public_mac_address, host.storage_ip_address, 
host.cluster_id, host.storage_netmask, host.storage_mac_address, 
host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, 
host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, 
host.available, host.setup, host.resource_state, host.hypervisor_version, 
host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpus, 
host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, 
host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, 
host.version, host.created, host.removed FROM host WHERE host.data_center_id = 
1  AND host.id != 19500  AND host.type = 'Routing'  AND host.removed IS NULL 
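
The time between the "Preparing" and "Closing" trace lines above can be 
worked out from their timestamps. The sketch below does that arithmetic; the 
class and method names are hypothetical, and the interval presumably covers 
both executing the query and reading its result set, so it is an upper bound 
on the query time itself rather than an exact measurement.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class QueryElapsed {
    // Timestamp layout used in the trace lines, e.g. 2013-08-17 11:40:20,990
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the time elapsed between two log timestamps.
    static Duration elapsed(String start, String end) {
        return Duration.between(
            LocalDateTime.parse(start, LOG_TS),
            LocalDateTime.parse(end, LOG_TS));
    }

    public static void main(String[] args) {
        Duration d = elapsed("2013-08-17 11:40:20,990",
                             "2013-08-17 11:41:53,578");
        System.out.println(d.toMillis() + " ms"); // prints "92588 ms"
    }
}
```

That is about 92.6 seconds spent on a single select while connecting host 
19500.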

                
> [Performance Testing] Adding hosts take much longer time than baselines
> -----------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-4350
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4350
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Management Server
>    Affects Versions: 4.2.0
>         Environment: 4.2, performance test env, with simulator
>            Reporter: Sowmya Krishnan
>            Assignee: Koushik Das
>            Priority: Critical
>              Labels: perfomance
>             Fix For: Future
>
>
> Performance test setup:
> Basic zone, 1 host/cluster, trying to deploy 20K simulator hosts, with host 
> tags
> Compared to baseline numbers, deploying hosts is taking much longer in the 
> simulator environment
> For the 1st 1000 hosts, it took about 4 mins to deploy as per baseline
> With 4.2, the 1st 1000 hosts are taking almost 7 minutes
> Configurations:
> heap size: -Xmx12288m
> db.cloud.url.params=prepStmtCacheSize=517&cachePrepStmts=true&prepStmtCacheSqlLimit=4096&includeInnodbStatusInDeadlockExceptions=true&logSlowQueries=true
> For 20K Hosts, the number exponentially increases and the deployment time 
> also increases.
> Attaching trace logs of 4.2 for 1k simulator hosts deployment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira