[ 
https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275888#comment-15275888
 ] 

haosdent commented on MESOS-5340:
---------------------------------

In my tests, this does not seem to be related to SSL downgrading: it also
happens when we {{export SSL_SUPPORT_DOWNGRADE=false}}.

{code}
# Console 1
$ telnet localhost 5050
{code}

{code}
# Console 2
$ curl https://www.haosdent.me:5050/master/slaves
# stuck
{code}

This happens because the accept handling logic in {{process.cpp}} is serial:
{code}
void on_accept(const Future<Socket>& socket)
{
  LOG(INFO) << "Start accept socket";
  if (socket.isReady()) {
    // Inform the socket manager for proper bookkeeping.
    socket_manager->accepted(socket.get());

    const size_t size = 80 * 1024;
    char* data = new char[size];

    DataDecoder* decoder = new DataDecoder(socket.get());

    socket.get().recv(data, size)
      .onAny(lambda::bind(
          &internal::decode_recv,
          lambda::_1,
          data,
          size,
          new Socket(socket.get()),
          decoder));
  }

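  // The next accept() is only issued here, once the previous Future<Socket>
  // has resolved, so incoming connections are handled one at a time.
  __s__->accept()
    .onAny(lambda::bind(&on_accept, lambda::_1));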
}
{code}
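{{process}} only moves on to the next {{Future<Socket>}} in
{{LibeventSSLSocketImpl::accept_queue}} after the current one has succeeded or
failed. A client that connects but never sends any data, like the telnet
session above, therefore keeps the head of the queue pending and blocks every
connection behind it.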
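The head-of-line blocking can be illustrated with a small, hypothetical simulation. This is not libprocess code: {{Connection}}, {{serial_accept}} and {{completion_order_accept}} are made-up names, and each handshake is reduced to the tick at which it completes, or {{std::nullopt}} for a client (like the idle telnet session) that never completes it. The completion-order consumer is only there for contrast; it is not necessarily how this issue should be fixed.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of one incoming connection. `handshake_done_at` is the
// tick at which its handshake completes, or std::nullopt if it never does
// (e.g. an idle telnet client that never sends a byte).
struct Connection {
  std::string name;
  std::optional<int> handshake_done_at;
};

// True if this connection's handshake has completed by `deadline`.
bool ready_by(const Connection& c, int deadline) {
  return c.handshake_done_at.has_value() && *c.handshake_done_at <= deadline;
}

// Serial consumer, mirroring the on_accept() loop: the next connection in the
// queue is only handled after the current one has resolved. Returns the names
// of the connections handled by `deadline`.
std::vector<std::string> serial_accept(const std::vector<Connection>& queue,
                                       int deadline) {
  assert(deadline >= 0);
  std::vector<std::string> handled;
  for (const Connection& c : queue) {
    if (!ready_by(c, deadline)) {
      break;  // Head of the queue still pending: everything behind it waits.
    }
    handled.push_back(c.name);
  }
  return handled;
}

// Comparison consumer: each connection is handled once its own handshake has
// completed, regardless of its position. Names are returned in queue order.
std::vector<std::string> completion_order_accept(
    const std::vector<Connection>& queue, int deadline) {
  assert(deadline >= 0);
  std::vector<std::string> handled;
  for (const Connection& c : queue) {
    if (ready_by(c, deadline)) {
      handled.push_back(c.name);
    }
  }
  return handled;
}
```

For a queue of {{curl-1}} (ready at tick 1), {{telnet}} (never ready), {{curl-2}} and {{curl-3}} (ready at ticks 2 and 3), {{serial_accept(queue, 10)}} handles only {{curl-1}}, while {{completion_order_accept(queue, 10)}} handles all three curl connections.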

> SSL-downgrading support may prevent new connections
> ---------------------------------------------------
>
>                 Key: MESOS-5340
>                 URL: https://issues.apache.org/jira/browse/MESOS-5340
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.29.0, 0.28.1
>            Reporter: Till Toenshoff
>            Priority: Blocker
>              Labels: ssl
>
> When using an SSL-enabled build of Mesos in combination with SSL-downgrading 
> support, any connection that does not actually transmit data will hang the 
> runnable (e.g. master).
> To reproduce the issue (on any platform), spin up a master with
> SSL-downgrading enabled:
> {noformat}
> $ export SSL_ENABLED=true
> $ export SSL_SUPPORT_DOWNGRADE=true
> $ export SSL_KEY_FILE=/path/to/your/foo.key
> $ export SSL_CERT_FILE=/path/to/your/foo.crt
> $ export SSL_CA_FILE=/path/to/your/ca.crt
> $ ./bin/mesos-master.sh --work_dir=/tmp/foo
> {noformat}
> Create some artificial HTTP request load to spot the problem quickly in
> both the master logs and the output of curl itself:
> {noformat}
> $ while true; do sleep 0.1; echo $(date +">%H:%M:%S.%3N"; curl -s -k -A "SSL Debug" http://localhost:5050/master/slaves; echo; date +"<%H:%M:%S.%3N"; echo); done
> {noformat}
> Now create a connection to the master that does not transmit any data:
> {noformat}
> $ telnet localhost 5050
> {noformat}
> You should now see the curl requests hang: the master stops responding to
> new connections. This persists until either some data is transmitted over
> the telnet connection above or that connection is closed.
> This problem was initially observed when running Mesos on an AWS cluster
> with internal ELB health checks enabled for the master node. Those
> health checks use long-lasting connections that do not transmit any
> data and are closed after a configurable duration. In our test environment,
> this duration was set to 60 seconds, so we saw our master repeatedly become
> unresponsive for 60 seconds, get "unstuck" for a brief period, and then get
> stuck again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
