[ https://issues.apache.org/jira/browse/MESOS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530848#comment-15530848 ]
Joseph Wu commented on MESOS-6233:
----------------------------------

{code}
commit 79c72d7623037e04d96265dedf537a6c4aad3219
Author: Joseph Wu <jos...@mesosphere.io>
Date:   Wed Sep 28 12:33:45 2016 -0700

    Prevented a race when relinking with SSL downgrade enabled.

    If there are two or more actors, on the same OS process, that try to
    relink at the same time, there is a potential CHECK failure when SSL
    downgrade is also enabled. The following interleaving is problematic:

    | Actor A                         | Actor B                         |
    |---------------------------------+---------------------------------|
    | Starts to relink.               | Starts to relink.               |
    | Creates socket A.               |                                 |
    | Replaces link with Socket A.    |                                 |
    | Tries to connect with Socket A. |                                 |
    |                                 | Creates socket B.               |
    | Connection fails due to SSL.    | Replaces link with Socket B.    |
    | Attempts to downgrade.          |                                 |
    | Tries to replace link.          |                                 |

    The last step in the interleaving fails because we assert that the
    socket we are swapping out exists in the `SocketManager`s state.

    Review: https://reviews.apache.org/r/52182/
{code}

> Master CHECK fails during recovery while relinking to other masters
> -------------------------------------------------------------------
>
>                 Key: MESOS-6233
>                 URL: https://issues.apache.org/jira/browse/MESOS-6233
>             Project: Mesos
>          Issue Type: Bug
>          Components: general, master
>    Affects Versions: 0.28.3, 1.0.1
>            Reporter: Alex Kaplan
>            Assignee: Joseph Wu
>            Priority: Blocker
>              Labels: mesosphere
>             Fix For: 0.28.3, 1.1.0, 1.0.2
>
>
> Mesos Version: 1.0.1
> OS: CoreOS 1068
> {code}
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: I0922 20:05:17.948004 104495 manager.cpp:795] overlay-master in `RECOVERING` state . Hence, not sending an update to agentoverlay-agent@10.4.4.1:5051
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: F0922 20:05:17.948120 104529 process.cpp:2243] Check failed: sockets.count(from_fd) > 0
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: *** Check failure stack trace: ***
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc1908829fd google::LogMessage::Fail()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc19088482d google::LogMessage::SendToLog()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc1908825ec google::LogMessage::Flush()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc190885129 google::LogMessageFatal::~LogMessageFatal()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc1908171dd process::SocketManager::swap_implementing_socket()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc19081aa90 process::SocketManager::link_connect()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc1908227f9 _ZNSt17_Function_handlerIFvRKN7process6FutureI7NothingEEEZNKS3_5onAnyISt5_BindIFSt7_Mem_fnIMNS0_13SocketManagerEFvS5_NS0_7network6SocketERKNS0_4UPIDEEEPSA_St12_PlaceholderILi1EESC_SD_EEvEES5_OT_NS3_6PreferEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x41eb26 _ZN7process8internal3runISt8functionIFvRKNS_6FutureI7NothingEEEEJRS5_EEEvRKSt6vectorIT_SaISC_EEDpOT0_
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x42a36f process::Future<>::fail()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc19085283c process::network::LibeventSSLSocketImpl::event_callback()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc190852f17 process::network::LibeventSSLSocketImpl::event_callback()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc18d616631 bufferevent_run_deferred_callbacks_locked
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc18d60cc5d event_base_loop
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc190865a1d process::EventLoop::run()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc18eeabd73 (unknown)
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc18e6a852c (unknown)
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 0x7fc18e3e61dd (unknown)
> Sep 22 20:05:18 node-44a84215535c systemd[1]: dcos-mesos-master.service: Main process exited, code=killed, status=6/ABRT
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
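To make the failure mode in the commit message above concrete, here is a single-threaded C++ model of the problematic interleaving. It is only a sketch under stated assumptions: `ToySocketManager`, `swapChecked`, `replayInterleaving`, the integer "fds" and the peer string are all hypothetical stand-ins, not libprocess types or functions. `swapChecked` mirrors the `sockets.count(from_fd) > 0` check from the log and returns `false` where the real master aborts. The `guarded` path shows one plausible way to make a stale downgrade harmless (skip it if the socket has already been replaced); it is not taken from the patch in review 52182.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of the SocketManager bookkeeping involved in MESOS-6233.
// Hypothetical: integer "fds" stand in for sockets and a string stands in
// for the peer's UPID. This is NOT the libprocess API.
struct ToySocketManager {
  std::map<std::string, int> links;    // peer -> fd currently backing the link
  std::map<int, std::string> sockets;  // tracked fd -> peer

  // "Replaces link with Socket X": the previously linked socket, if any,
  // is no longer tracked once the link points at the new one.
  void relink(const std::string& peer, int fd) {
    assert(sockets.count(fd) == 0);  // new sockets must be fresh
    auto it = links.find(peer);
    if (it != links.end()) {
      sockets.erase(it->second);
    }
    links[peer] = fd;
    sockets[fd] = peer;
  }

  // Mirrors `CHECK(sockets.count(from_fd) > 0)` from the log. Returns false
  // where the real master aborts; otherwise moves the link to `to_fd`.
  bool swapChecked(int from_fd, int to_fd) {
    auto it = sockets.find(from_fd);
    if (it == sockets.end()) {
      return false;  // the real code hits a CHECK failure here
    }
    const std::string peer = it->second;  // copy before erasing the entry
    sockets.erase(it);
    sockets[to_fd] = peer;
    links[peer] = to_fd;
    return true;
  }
};

// Replays the interleaving from the commit message on one thread.
// Returns true if the process would survive (no CHECK failure).
bool replayInterleaving(bool guarded) {
  ToySocketManager manager;
  const std::string peer = "peer";  // hypothetical peer identifier
  const int socketA = 1;
  const int socketB = 2;
  const int downgradedA = 3;  // non-SSL replacement for Socket A

  manager.relink(peer, socketA);  // Actor A: replaces link with Socket A.
  manager.relink(peer, socketB);  // Actor B: replaces link with Socket B.

  // Actor A: its SSL connect failed, so it attempts to downgrade by
  // swapping Socket A out. By now Socket A is no longer tracked.
  if (guarded && manager.sockets.count(socketA) == 0) {
    return true;  // hypothetical guard: stale downgrade, skip the swap
  }
  return manager.swapChecked(socketA, downgradedA);
}
```

With `guarded == false`, the replay reaches the failing check, matching the abort in the log; with `guarded == true`, Actor A's downgrade is dropped and the link keeps pointing at Socket B. The point the model illustrates is that a downgrade running after an asynchronous connect has to re-validate shared state, because another actor may have relinked in the meantime.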