Till Toenshoff created MESOS-5340:
-------------------------------------

             Summary: SSL-downgrading support may hang libprocess
                 Key: MESOS-5340
                 URL: https://issues.apache.org/jira/browse/MESOS-5340
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 0.28.1, 0.29.0
            Reporter: Till Toenshoff
            Priority: Blocker


When using an SSL-enabled build of Mesos in combination with SSL-downgrading 
support, any connection that does not actually transmit data will hang the 
process (e.g. the master).

To reproduce the issue (on any platform):

Spin up a master with SSL downgrading enabled:
{noformat}
$ export SSL_ENABLED=true
$ export SSL_SUPPORT_DOWNGRADE=true
$ export SSL_KEY_FILE=/path/to/your/foo.key
$ export SSL_CERT_FILE=/path/to/your/foo.crt
$ export SSL_CA_FILE=/path/to/your/ca.crt
$ ./bin/mesos-master.sh --work_dir=/tmp/foo
{noformat}

Create some artificial HTTP request load to quickly spot the problem in both 
the master logs and the output of curl itself:
{noformat}
$ while true; do sleep 0.1; echo $( date +">%H:%M:%S.%3N"; curl -s -k -A "SSL Debug" http://localhost:5050/master/slaves; echo ;date +"<%H:%M:%S.%3N"; echo); done
{noformat}

Now create a connection to the master that does not transmit any data:
{noformat}
$ telnet localhost 5050
{noformat}

You should now see the curl requests hang and the master stop responding to 
new connections. This persists until either some data is transmitted over 
the above telnet connection or the connection is closed.
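
To make the hang easier to spot than by watching timestamps, a single probe 
with a timeout can be used. The following is only a sketch: the function 
names classify_curl_exit and probe_master and the 2-second default timeout 
are assumptions for illustration, not part of Mesos. It relies on curl's 
documented exit code 28 (operation timed out).

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a curl exit code to a short status word.
#   0  -> "ok"    (a response arrived in time)
#   28 -> "hung"  (curl gave up after --max-time)
#   *  -> "error" (any other failure, e.g. connection refused)
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    28) echo "hung" ;;
    *)  echo "error" ;;
  esac
}

# Hypothetical helper: probe the master once and print the status word.
# $1: URL to request (defaults to the endpoint used in this report)
# $2: timeout in seconds (assumed default: 2)
probe_master() {
  local url="${1:-http://localhost:5050/master/slaves}"
  local timeout="${2:-2}"
  local rc=0
  curl -s -k -o /dev/null --max-time "$timeout" "$url" || rc=$?
  classify_curl_exit "$rc"
}

# Usage (not run automatically):
#   while true; do sleep 0.5; echo "$(date +%H:%M:%S) $(probe_master)"; done
```

While the idle telnet connection is open, the loop in the usage comment 
would be expected to print "hung" instead of "ok".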

This problem was initially observed when running Mesos on an AWS cluster 
with internal ELB health checks enabled for the master node. Those 
health checks use long-lasting connections that do not transmit any data 
and are closed after a configurable duration. In our test environment, this 
duration was set to 60 seconds, so we saw our master repeatedly become 
unresponsive for 60 seconds, then get "unstuck" for a brief period until it 
got stuck again.
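
The health-check pattern described above can be approximated without an ELB. 
This is a minimal sketch, assuming bash (it uses bash's /dev/tcp 
redirection); the function name hold_idle_connection is hypothetical. It 
opens a TCP connection, sends nothing, and closes it after the given number 
of seconds.

```shell
#!/usr/bin/env bash

# Hypothetical helper: hold a silent TCP connection open, then close it.
# $1: host, $2: port, $3: seconds to keep the connection open
# Returns non-zero if the connection cannot be established.
hold_idle_connection() {
  local host="$1" port="$2" seconds="$3"
  (
    # Open the connection on file descriptor 3 without writing any data.
    exec 3<>"/dev/tcp/${host}/${port}" || exit 1
    sleep "$seconds"
    # Close the connection.
    exec 3>&-
  )
}

# Usage (not run automatically), mimicking a 60-second idle connection
# that is re-established shortly after it closes:
#   while true; do hold_idle_connection localhost 5050 60; sleep 1; done
```

Running the loop from the usage comment alongside the curl load should 
reproduce the cycle of the master being stuck and briefly "unstuck".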




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
