[ https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-5340:
-----------------------------------
    Shepherd: Joris Van Remoortere
    Assignee: Benjamin Mahler

[~jvanremoortere] I took a look and have proposed a fix here: 
https://reviews.apache.org/r/47192/

> libevent builds may prevent new connections
> -------------------------------------------
>
>                 Key: MESOS-5340
>                 URL: https://issues.apache.org/jira/browse/MESOS-5340
>             Project: Mesos
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.29.0, 0.28.1
>            Reporter: Till Toenshoff
>            Assignee: Benjamin Mahler
>            Priority: Blocker
>              Labels: mesosphere, security, ssl
>
> When using an SSL-enabled build of Mesos with SSL downgrading support
> enabled, any connection that does not transmit data will hang the running
> process (e.g. the master).
> To reproduce the issue (on any platform), spin up a master with SSL
> downgrading enabled:
> {noformat}
> $ export SSL_ENABLED=true
> $ export SSL_SUPPORT_DOWNGRADE=true
> $ export SSL_KEY_FILE=/path/to/your/foo.key
> $ export SSL_CERT_FILE=/path/to/your/foo.crt
> $ export SSL_CA_FILE=/path/to/your/ca.crt
> $ ./bin/mesos-master.sh --work_dir=/tmp/foo
> {noformat}
> Create some artificial HTTP request load so that the problem is quickly
> visible in both the master logs and the output of curl itself:
> {noformat}
> $ while true; do sleep 0.1; echo $(date +">%H:%M:%S.%3N"; \
>   curl -s -k -A "SSL Debug" http://localhost:5050/master/slaves; echo; \
>   date +"<%H:%M:%S.%3N"; echo); done
> {noformat}
> Now create a connection to the master that does not transmit any data:
> {noformat}
> $ telnet localhost 5050
> {noformat}
> You should now see the curl requests hang: the master stops responding to
> new connections. This persists until either some data is transmitted over
> the telnet connection above or that connection is closed.
> This problem was initially observed when running Mesos on an AWS cluster
> with a load balancer in front of the master node; the load balancer keeps an
> idle, persistent connection open. Such a connection naturally does not
> transmit any data as long as no external requests are routed through the
> load balancer. AWS allows setting a timeout for those connections; in our
> test environment it was set to 60 seconds, so we saw our master repeatedly
> become unresponsive for 60 seconds, then get "unstuck" for a brief period
> until it got stuck again.
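
The load-balancer behaviour described in the report (a connection that stays silent until an idle timeout closes it) can be approximated locally without telnet. The following is a minimal bash sketch, assuming bash's `/dev/tcp/HOST/PORT` redirection is available (bash emulates that path itself; it is not a real device file, and some distributions build bash without it). `hold_idle_connection` is a hypothetical helper name, not part of Mesos.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Mesos): open a TCP connection to
# host:port, send nothing, keep it open for the given number of seconds,
# then close it. Requires bash for the /dev/tcp redirection.
#
# The body runs in a subshell ( ... ) so that a failed redirection only
# ends the subshell and never the calling script, and so that the
# variables below do not leak into the caller.
hold_idle_connection() (
  host="$1" port="$2" secs="$3"
  # fd 3 stays connected only while the { ...; } group runs and is closed
  # when the group finishes. No bytes are ever written to it.
  # If the connection cannot be established, the redirection fails, sleep
  # is not run, and the function returns a non-zero status.
  { sleep "$secs"; } 3<>"/dev/tcp/${host}/${port}"
)

# Example: mimic a 60-second idle timeout against a local master.
# hold_idle_connection localhost 5050 60
```

Running the example while the curl loop above is active should, on an affected build, show the requests stalling for roughly the chosen duration and resuming once the idle connection is closed.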



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
