[
https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279337#comment-15279337
]
Till Toenshoff edited comment on MESOS-5340 at 5/11/16 1:05 AM:
----------------------------------------------------------------
In parallel, [~alexr] and I came up with a different, but also more intrusive
approach: https://reviews.apache.org/r/47207/
was (Author: tillt):
I parallel, [~alexr] and I came up with a different, but also more intrusive
approach: https://reviews.apache.org/r/47207/
> libevent builds may prevent new connections
> -------------------------------------------
>
> Key: MESOS-5340
> URL: https://issues.apache.org/jira/browse/MESOS-5340
> Project: Mesos
> Issue Type: Bug
> Components: security
> Affects Versions: 0.29.0, 0.28.1
> Reporter: Till Toenshoff
> Assignee: Benjamin Mahler
> Priority: Blocker
> Labels: mesosphere, security, ssl
>
> When using an SSL-enabled build of Mesos in combination with SSL-downgrading
> support, any connection that does not actually transmit data will hang the
> process (e.g. the master), which then stops accepting new connections.
> To reproduce the issue (on any platform):
> Spin up a master with SSL downgrading enabled:
> {noformat}
> $ export SSL_ENABLED=true
> $ export SSL_SUPPORT_DOWNGRADE=true
> $ export SSL_KEY_FILE=/path/to/your/foo.key
> $ export SSL_CERT_FILE=/path/to/your/foo.crt
> $ export SSL_CA_FILE=/path/to/your/ca.crt
> $ ./bin/mesos-master.sh --work_dir=/tmp/foo
> {noformat}
> Create some artificial HTTP request load so that the problem is quickly
> visible in both the master logs and the output of curl itself:
> {noformat}
> $ while true; do sleep 0.1; echo $( date +">%H:%M:%S.%3N"; curl -s -k -A "SSL Debug" http://localhost:5050/master/slaves; echo ;date +"<%H:%M:%S.%3N"; echo); done
> {noformat}
> Now create a connection to the master that does not transmit any data:
> {noformat}
> $ telnet localhost 5050
> {noformat}
> You should now see the curl requests hang: the master stops responding to
> new connections. This persists until either some data is transmitted over
> the above telnet connection or that connection is closed.
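A hung master makes a plain curl call block indefinitely, so it can help to bound each request with a deadline. The following is a minimal sketch, not part of the original report: `probe` is a hypothetical helper name and the 2-second default deadline is an assumption. It relies only on curl's standard `--max-time` option.

```shell
# Hypothetical helper: report whether an HTTP endpoint answers within a deadline.
# $1 = URL, $2 = deadline in seconds (assumed default: 2).
probe() {
  # curl exits non-zero on a timeout or a failed connect; the body is discarded.
  if curl -s -k -o /dev/null --max-time "${2:-2}" "$1"; then
    echo "responsive"
  else
    echo "unresponsive"
  fi
}

# Usage against the master from the steps above:
# while true; do date +"%H:%M:%S"; probe http://localhost:5050/master/slaves; sleep 0.1; done
```

While the idle telnet connection is open, the loop would be expected to switch from "responsive" to "unresponsive" instead of stalling on a single request.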
> This problem was initially observed when running Mesos on an AWS cluster
> with a load balancer enabled for the master node; the load balancer keeps an
> idle, persistent connection open. Such a connection naturally does not
> transmit any data as long as no external requests are routed via the load
> balancer. AWS allows setting a timeout for those connections. In our test
> environment that timeout was set to 60 seconds, so we saw our master
> repeatedly become unresponsive for 60 seconds, then get "unstuck" for a
> brief period until it got stuck again.
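The load-balancer pattern described above (an idle connection that is dropped after a timeout) can be approximated locally without AWS. This is a rough sketch under stated assumptions: `idle_connect` is a hypothetical helper name, and it uses bash's `/dev/tcp` redirection, so it needs bash rather than plain sh.

```shell
# Hypothetical helper: open a TCP connection, send nothing, close it after N seconds.
# $1 = host, $2 = port, $3 = seconds to hold the connection open.
# The function body is a subshell, so fd 3 never leaks into the calling shell.
idle_connect() (
  exec 3<>"/dev/tcp/$1/$2" || exit 1   # open the socket on fd 3 (bash feature)
  sleep "$3"                           # hold it open without transmitting data
  exec 3>&-                            # close the connection
)

# Usage, mimicking the 60-second timeout from the report:
# while true; do idle_connect localhost 5050 60; done
```

Run alongside the curl loop from the reproduction steps, this should show the same cycle of 60 seconds of unresponsiveness followed by a brief recovery, assuming the master is affected.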