[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958935#comment-15958935
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9348:
--------------------------------------------

Github user marcaurele commented on the issue:

    https://github.com/apache/cloudstack/pull/2027
  
    @rhtyd I found one issue with the test and the `NioConnection` class. 
Intermittent problems of this kind are always hard to trace to a root cause, but 
after lots of logging I finally found why. I updated the PR with the change.
    
    If the main thread running the test is suspended here 
https://github.com/apache/cloudstack/blob/master/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L102
 by a context switch, the flag `_isRunning` has not yet been set to `true` by 
the time the NioServer connection handler starts its call loop, and it exits on 
the `while(_isRunning)`
    
https://github.com/apache/cloudstack/blob/master/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L125
 directly. Therefore the server isn't listening at all and the connection 
cannot be made. The flag `_isRunning` must be set to `true` before submitting 
the task/thread.
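
    The ordering fix described above can be sketched roughly as follows. This 
is only an illustration, not the actual `NioConnection` code: the class 
`StartOrderSketch`, its `isRunning` field, and the helper `runFor` are 
hypothetical names, and the loop body merely stands in for the real 
select/accept loop.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a connection class; not the real NioConnection.
public class StartOrderSketch {
    private volatile boolean isRunning = false;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private Future<Integer> loopResult;

    /** Sets the flag BEFORE submitting the task, so the loop cannot see a stale false. */
    public void start() {
        isRunning = true;
        Callable<Integer> task = this::runLoop;
        loopResult = executor.submit(task);
    }

    /** Clears the flag, waits for the loop to end, and returns its iteration count. */
    public int stop() {
        isRunning = false;
        executor.shutdown();
        try {
            return loopResult.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Stands in for the handler's call loop guarded by the running flag.
    private Integer runLoop() throws InterruptedException {
        int iterations = 0;
        while (isRunning) {
            iterations++;
            Thread.sleep(1);
        }
        return iterations;
    }

    /** Starts the loop, lets it run for the given time, then stops it. */
    public static int runFor(long millis) {
        StartOrderSketch server = new StartOrderSketch();
        server.start();
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return server.stop();
    }

    public static void main(String[] args) {
        System.out.println("loop iterations: " + runFor(50));
    }
}
```

    If the two statements in `start()` were swapped, the submitted task could 
reach the `while` check before the flag is set and return after zero 
iterations, which matches the failure described above.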
    
    I am still digging into the Nio thread handler, as we are experiencing 
problems in production when quite a few agents try to connect to a management 
server at the same time: none of them can connect.



> CloudStack Server degrades when a lot of connections on port 8250
> -----------------------------------------------------------------
>
>                 Key: CLOUDSTACK-9348
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9348
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>            Reporter: Rohit Yadav
>            Assignee: Rohit Yadav
>             Fix For: 4.9.0
>
>
> An intermittent issue was found with a large CloudStack deployment, where 
> servers could not keep agents connected on port 8250.
> All connections are handled by accept() in NioConnection:
> https://github.com/apache/cloudstack/blob/master/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L125
> Each new connection is handled by accept(), which performs a blocking SSL handshake. A 
> good fix would be to make this non-blocking and handle expensive tasks in 
> separate threads/pool. This way the main IO loop won't be blocked and can 
> continue to serve other agents/clients.
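
One way to picture the suggested direction is the sketch below: a non-blocking 
accept loop that hands each new connection to a worker pool, so slow 
per-connection work does not stall the selector thread. It is a sketch under 
stated assumptions, not the CloudStack implementation: `AcceptOffloadSketch`, 
`run` and `slowHandshake` are hypothetical names, and the "handshake" is 
simulated with a sleep rather than a real SSL exchange.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AcceptOffloadSketch {

    // Hypothetical placeholder for expensive per-connection work (e.g. an SSL handshake).
    static void slowHandshake(SocketChannel channel, AtomicInteger done) {
        try {
            Thread.sleep(100);              // simulate a slow handshake
            done.incrementAndGet();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            try { channel.close(); } catch (IOException ignored) { }
        }
    }

    /** Accepts `clients` loopback connections; returns how many handshakes completed. */
    public static int run(int clients) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger done = new AtomicInteger();
        List<SocketChannel> peers = new ArrayList<>();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();
            for (int i = 0; i < clients; i++) {
                peers.add(SocketChannel.open(address));   // client side, blocking connect
            }
            int accepted = 0;
            while (accepted < clients) {
                selector.select();
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isAcceptable()) {
                        continue;
                    }
                    SocketChannel channel;
                    // accept() returns null once the pending queue is drained
                    while ((channel = server.accept()) != null) {
                        accepted++;
                        final SocketChannel accepted_channel = channel;
                        // The IO loop only accepts; the slow part runs in the pool.
                        pool.execute(() -> slowHandshake(accepted_channel, done));
                    }
                }
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
            for (SocketChannel peer : peers) {
                try { peer.close(); } catch (IOException ignored) { }
            }
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("handshakes completed: " + run(4));
    }
}
```

The pool size of 4 is arbitrary. A real fix would also need to drive the 
actual SSL handshake inside the worker and register the channel for reads 
afterwards, which this sketch leaves out.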



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
