[ https://issues.apache.org/jira/browse/MESOS-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kone reassigned MESOS-7766:
---------------------------------
Assignee: Alexander Rukletsov
[~alexr] Can you assign this to the appropriate person? Maybe you or [~abudnik]?
> Segfault when trying to accept inverse offer with unknown offerId
> -----------------------------------------------------------------
>
> Key: MESOS-7766
> URL: https://issues.apache.org/jira/browse/MESOS-7766
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.0.4, 1.1.1
> Reporter: Benjamin Bannier
> Assignee: Alexander Rukletsov
> Labels: mesosphere
>
> We just saw the following in a test cluster:
> {noformat}
> W0707 06:30:10.172188 9413 master.cpp:3939] Ignoring accept of inverse offer abd00119-7353-4990-9cc5-0d6bd69a91e7-O737973 since it is no longer valid
> F0707 06:30:10.172236 9413 master.cpp:3943] CHECK_SOME(slaveId): is NONE
> *** Check failure stack trace: ***
>     @ 0x7f425b1521ed google::LogMessage::Fail()
>     @ 0x7f425b15401d google::LogMessage::SendToLog()
>     @ 0x7f425b151ddc google::LogMessage::Flush()
>     @ 0x7f425b154919 google::LogMessageFatal::~LogMessageFatal()
>     @ 0x7f425a564ce9 _CheckFatal::~_CheckFatal()
>     @ 0x7f425a76a69d mesos::internal::master::Master::acceptInverseOffers()
>     @ 0x7f425a6e360e mesos::internal::master::Master::Http::scheduler()
>     @ 0x7f425a737347 _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestERK6OptionISsEEZN5mesos8internal6master6Master10initializeEvEUlS7_SB_E1_E9_M_invokeERKSt9_Any_dataS7_SB_
>     @ 0x7f425b0d7413 _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultEEEEE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
>     @ 0x7f425b0e1091 process::ProcessManager::resume()
>     @ 0x7f425b0e1397 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>     @ 0x7f4259770d73 (unknown)
>     @ 0x7f4258f6d52c (unknown)
>     @ 0x7f4258cab1dd (unknown)
> {noformat}
> This seems to happen when we try to accept an invalid inverse offer: the
> code incorrectly assumes that it can always extract an agent ID:
> {code}
> Option<SlaveID> slaveId;
>
> // Update each inverse offer in the allocator with the accept and
> // filter.
> foreach (const OfferID& offerId, accept.inverse_offer_ids()) {
>   InverseOffer* inverseOffer = getInverseOffer(offerId);
>   if (inverseOffer != nullptr) {
>     CHECK(inverseOffer->has_slave_id());
>     slaveId = inverseOffer->slave_id();
>
>     mesos::allocator::InverseOfferStatus status;
>     status.set_status(mesos::allocator::InverseOfferStatus::ACCEPT);
>     status.mutable_framework_id()->CopyFrom(inverseOffer->framework_id());
>     status.mutable_timestamp()->CopyFrom(protobuf::getCurrentTime());
>
>     allocator->updateInverseOffer(
>         inverseOffer->slave_id(),
>         inverseOffer->framework_id(),
>         UnavailableResources{
>             inverseOffer->resources(),
>             inverseOffer->unavailability()},
>         status,
>         accept.filters());
>
>     removeInverseOffer(inverseOffer);
>     continue;
>   }
>
>   // If the offer was not in our inverse offer set, then this
>   // offer is no longer valid.
>   LOG(WARNING) << "Ignoring accept of inverse offer " << offerId
>                << " since it is no longer valid";
> }
>
> CHECK_SOME(slaveId);
> {code}
> If none of the IDs in {{accept.inverse_offer_ids()}} refers to a known inverse
> offer (e.g. the single {{offerId}} passed is unknown), {{slaveId}} is never
> set, and the {{CHECK_SOME}} fails.
> I see this issue in 1.0.4 and 1.1.1; the problematic code seems to be gone in
> 1.1.2.
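To make the failure mode concrete, here is a minimal standalone C++17 model of the quoted loop. It is a sketch under stated assumptions, not Mesos code: `std::string` stands in for the `OfferID`/`SlaveID` protobufs, `std::optional` for stout's `Option`, a hypothetical `InverseOfferTable` map for the master's inverse offer bookkeeping, and `assert` for `CHECK`; the allocator update is omitted. The hypothetical free function `acceptInverseOffers` returns the optional agent ID instead of running `CHECK_SOME(slaveId)`, which is one possible way to avoid the crash and not necessarily what the later Mesos code does.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins: the real master uses the OfferID/SlaveID
// protobuf messages and stout's Option<T>.
using OfferID = std::string;
using SlaveID = std::string;

struct InverseOffer {
  SlaveID slaveId;
};

// Hypothetical model of the master's set of outstanding inverse offers.
using InverseOfferTable = std::map<OfferID, InverseOffer>;

// Hypothetical free-function model of the quoted loop. It returns the
// agent ID as an optional instead of CHECK-failing when no ID was valid.
std::optional<SlaveID> acceptInverseOffers(
    InverseOfferTable& offers,
    const std::vector<OfferID>& offerIds) {
  std::optional<SlaveID> slaveId;

  for (const OfferID& offerId : offerIds) {
    auto it = offers.find(offerId);
    if (it != offers.end()) {
      // Counterpart of CHECK(inverseOffer->has_slave_id()).
      assert(!it->second.slaveId.empty());
      slaveId = it->second.slaveId;

      // The allocator update is omitted; erasing the entry plays the
      // role of removeInverseOffer().
      offers.erase(it);
      continue;
    }

    // Unknown ID: warn and move on; slaveId is left untouched.
    std::cerr << "Ignoring accept of inverse offer " << offerId
              << " since it is no longer valid" << std::endl;
  }

  // No CHECK_SOME here: an empty optional means no ID was valid, and the
  // caller decides how to respond instead of aborting the process.
  return slaveId;
}
```

For example, calling `acceptInverseOffers(offers, {"O737973"})` on a table that does not contain `O737973` prints the warning and returns an empty optional, at the point where the quoted code would abort on `CHECK_SOME(slaveId)`. Pushing the NONE case to the caller lets it reject the call or do nothing, rather than taking down the master.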
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)