[ 
https://issues.apache.org/jira/browse/MESOS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530848#comment-15530848
 ] 

Joseph Wu commented on MESOS-6233:
----------------------------------

{code}
commit 79c72d7623037e04d96265dedf537a6c4aad3219
Author: Joseph Wu <jos...@mesosphere.io>
Date:   Wed Sep 28 12:33:45 2016 -0700

    Prevented a race when relinking with SSL downgrade enabled.
    
    If there are two or more actors, on the same OS process, that try to
    relink at the same time, there is a potential CHECK failure when SSL
    downgrade is also enabled.  The following interleaving is problematic:
    |             Actor A             |             Actor B             |
    |---------------------------------+---------------------------------|
    |        Starts to relink.        |        Starts to relink.        |
    |        Creates socket A.        |                                 |
    |   Replaces link with Socket A.  |                                 |
    | Tries to connect with Socket A. |                                 |
    |                                 |        Creates socket B.        |
    |  Connection fails due to SSL.   |   Replaces link with Socket B.  |
    |     Attempts to downgrade.      |                                 |
    |     Tries to replace link.      |                                 |
    The last step in the interleaving fails because we assert that the
    socket we are swapping out exists in the `SocketManager`'s state.
    
    Review: https://reviews.apache.org/r/52182/
{code}
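
The interleaving in the commit message can be replayed in a few lines. The following is a minimal sketch, not the libprocess code: `LinkTable`, `relink`, `try_downgrade`, `replay_interleaving`, the peer address, and the fd numbers are all hypothetical names chosen for illustration. It assumes the link table is a map from fd to socket, as the failed check `sockets.count(from_fd) > 0` suggests. Returning `false` when the socket is already gone is only one possible guard and is not necessarily what the patch in the review does.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the per-process link state.
// This is NOT the real process::SocketManager.
struct LinkTable {
  std::map<int, std::string> sockets;  // fd -> kind of socket ("ssl"/"plain")
  std::map<std::string, int> links;    // peer address -> fd backing the link

  // Replaces whatever socket currently backs the link to `peer` with `fd`.
  void relink(const std::string& peer, int fd, const std::string& kind) {
    assert(fd >= 0);
    auto it = links.find(peer);
    if (it != links.end()) {
      sockets.erase(it->second);  // The previous socket is dropped.
    }
    sockets[fd] = kind;
    links[peer] = fd;
  }

  // Swaps `from_fd` for `to_fd` when downgrading from SSL.
  // An unconditional assertion that `from_fd` is present would abort
  // here if another actor already replaced the link; this sketch
  // reports failure instead (an assumed guard, see the text above).
  bool try_downgrade(const std::string& peer, int from_fd, int to_fd) {
    if (sockets.count(from_fd) == 0) {
      return false;  // Someone else relinked; nothing to swap.
    }
    sockets.erase(from_fd);
    sockets[to_fd] = "plain";
    links[peer] = to_fd;
    return true;
  }
};

// Replays the problematic interleaving sequentially.
// Returns {whether the downgrade swapped, fd backing the link afterwards}.
std::pair<bool, int> replay_interleaving() {
  LinkTable table;
  const std::string peer = "master@10.0.0.2:5050";  // Hypothetical peer.

  table.relink(peer, 3, "ssl");  // Actor A: replaces link with socket A.
  table.relink(peer, 4, "ssl");  // Actor B: replaces link with socket B.

  // Actor A: the connection failed due to SSL, so it tries to downgrade
  // by swapping socket A (fd 3), which is no longer in the table.
  bool swapped = table.try_downgrade(peer, 3, 5);

  return {swapped, table.links[peer]};
}
```

Calling `replay_interleaving()` shows the downgrade attempt finding no socket A to swap, while the link still points at socket B. That missing entry is the condition the assertion in the stack trace below trips on.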

> Master CHECK fails during recovery while relinking to other masters
> -------------------------------------------------------------------
>
>                 Key: MESOS-6233
>                 URL: https://issues.apache.org/jira/browse/MESOS-6233
>             Project: Mesos
>          Issue Type: Bug
>          Components: general, master
>    Affects Versions: 0.28.3, 1.0.1
>            Reporter: Alex Kaplan
>            Assignee: Joseph Wu
>            Priority: Blocker
>              Labels: mesosphere
>             Fix For: 0.28.3, 1.1.0, 1.0.2
>
>
> Mesos Version: 1.0.1
> OS: CoreOS 1068
> {code}
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: I0922 20:05:17.948004 
> 104495 manager.cpp:795] overlay-master in `RECOVERING` state . Hence, not 
> sending an update to agentoverlay-agent@10.4.4.1:5051
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: F0922 20:05:17.948120 
> 104529 process.cpp:2243] Check failed: sockets.count(from_fd) > 0
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: *** Check failure 
> stack trace: ***
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc1908829fd  google::LogMessage::Fail()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc19088482d  google::LogMessage::SendToLog()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc1908825ec  google::LogMessage::Flush()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc190885129  google::LogMessageFatal::~LogMessageFatal()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc1908171dd  process::SocketManager::swap_implementing_socket()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc19081aa90  process::SocketManager::link_connect()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc1908227f9  
> _ZNSt17_Function_handlerIFvRKN7process6FutureI7NothingEEEZNKS3_5onAnyISt5_BindIFSt7_Mem_fnIMNS0_13SocketManagerEFvS5_NS0_7network6SocketERKNS0_4UPIDEEEPSA_St12_PlaceholderILi1EESC_SD_EEvEES5_OT_NS3_6PreferEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @           
> 0x41eb26  
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureI7NothingEEEEJRS5_EEEvRKSt6vectorIT_SaISC_EEDpOT0_
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @           
> 0x42a36f  process::Future<>::fail()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc19085283c  process::network::LibeventSSLSocketImpl::event_callback()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc190852f17  process::network::LibeventSSLSocketImpl::event_callback()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc18d616631  bufferevent_run_deferred_callbacks_locked
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc18d60cc5d  event_base_loop
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc190865a1d  process::EventLoop::run()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc18eeabd73  (unknown)
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc18e6a852c  (unknown)
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]:     @     
> 0x7fc18e3e61dd  (unknown)
> Sep 22 20:05:18 node-44a84215535c systemd[1]: 
> dcos-mesos-master.service: Main process exited, code=killed, 
> status=6/ABRT
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
