[
https://issues.apache.org/jira/browse/MESOS-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Rukletsov updated MESOS-5650:
---------------------------------------
Shepherd: Alexander Rukletsov
> UNRESERVE operation causes master to crash.
> -------------------------------------------
>
> Key: MESOS-5650
> URL: https://issues.apache.org/jira/browse/MESOS-5650
> Project: Mesos
> Issue Type: Bug
> Components: allocation
> Affects Versions: 0.28.1
> Reporter: Alexander Rukletsov
> Assignee: Neil Conway
> Priority: Blocker
> Labels: mesosphere
> Fix For: 1.0.0
>
>
> An {{UNRESERVE}} operation may cause a master failure:
> {noformat}
> I0619 05:02:02.298602 11194 http.cpp:312] HTTP GET for /master/slaves from
> 172.17.0.4:49617 with User-Agent='python-requests/2.9.1'
> I0619 05:02:02.305542 11193 http.cpp:312] HTTP POST for
> /master/destroy-volumes from 172.17.0.4:49618 with
> User-Agent='python-requests/2.9.1'
> I0619 05:02:02.306731 11191 master.cpp:6560] Sending checkpointed resources
> mem(kafkatest-role, kafkatest-principal, {resource_id:
> 7408cc53-183c-48c2-a07f-7087806219f3}):256; cpus(kafkatest-role,
> kafkatest-principal, {resource_id:
> d7888099-db8f-4018-9109-f70fb1174f53}):1.5; mem(kafkatest-role,
> kafkatest-principal, {resource_id:
> b5dd90fc-2c12-4199-9fc4-cf9f918e332b}):2304; ports(kafkatest-role,
> kafkatest-principal, {resource_id:
> a0ee4e01-803f-4b71-950d-483caeb01a57}):[9305-9305, 11596-11596];
> cpus(kafkatest-role, kafkatest-principal, {resource_id:
> 8cd72abb-7089-4220-bb90-46b70c9953ab}):0.5; disk(kafkatest-role,
> kafkatest-principal, {resource_id:
> ed06ec6e-2d15-4d0e-bbc4-95a942e58596})[]:11204 to slave
> a80ff9dd-e046-43ab-b763-28365b136f6b-S0 at slave(1)@10.0.0.5:5051 (10.0.0.5)
> I0619 05:02:02.311069 11189 http.cpp:312] HTTP POST for
> /master/destroy-volumes from 172.17.0.4:49619 with
> User-Agent='python-requests/2.9.1'
> I0619 05:02:02.312191 11187 master.cpp:6560] Sending checkpointed resources
> cpus(kafkatest-role, kafkatest-principal, {resource_id:
> f1ff4806-0c24-4d60-ad2b-b06462ee4081}):1.5; mem(kafkatest-role,
> kafkatest-principal, {resource_id:
> cb8dc92d-64f0-4007-8520-1f63625b98c0}):2304; ports(kafkatest-role,
> kafkatest-principal, {resource_id:
> 225b4172-be77-453a-a94f-8845edc3f09a}):[9692-9692, 11824-11824];
> cpus(kafkatest-role, kafkatest-principal, {resource_id:
> 942e102a-ca63-480d-9853-9a39e2695ec9}):0.5; mem(kafkatest-role,
> kafkatest-principal, {resource_id:
> cad57f8c-27f5-484c-a3fb-e80da74f0813}):256; disk(kafkatest-role,
> kafkatest-principal, {resource_id:
> e6563e09-e284-4aaf-8d53-72056695de41})[]:11204 to slave
> 489aa72f-ae07-4383-a56f-6fe9346ace37-S7 at slave(1)@10.0.0.7:5051 (10.0.0.7)
> I0619 05:02:02.316118 11189 http.cpp:312] HTTP GET for /master/slaves from
> 172.17.0.4:49620 with User-Agent='python-requests/2.9.1'
> I0619 05:02:02.321527 11189 http.cpp:312] HTTP POST for /master/unreserve
> from 172.17.0.4:49621 with User-Agent='python-requests/2.9.1'
> I0619 05:02:02.323523 11193 master.cpp:6560] Sending checkpointed resources
> to slave a80ff9dd-e046-43ab-b763-28365b136f6b-S0 at slave(1)@10.0.0.5:5051
> (10.0.0.5)
> I0619 05:02:02.327658 11191 http.cpp:312] HTTP POST for /master/unreserve
> from 172.17.0.4:49622 with User-Agent='python-requests/2.9.1'
> F0619 05:02:02.329208 11190 sorter.cpp:284] Check failed:
> total_.scalarQuantities.contains(oldSlaveQuantity)
> {noformat}
> Possible reasons:
> * Recent improvements in allocator (b4d746f)
> * Bug in bookkeeping during the previous {{UNRESERVE}}
> * Network partition that happened after {{RESERVE}} and before {{UNRESERVE}}
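
To make the failed check easier to reason about, here is a minimal sketch of the invariant it appears to guard: before a per-agent quantity is swapped out of a running total, the total must still contain the old quantity. This is a toy model and not the Mesos sorter code. The class `TotalResources` and its methods are hypothetical names invented for illustration, and the scenario (a stale "old" quantity after the total has already shrunk) is only one assumed way such a check could trip.

```python
from collections import Counter


class TotalResources:
    """Toy stand-in (hypothetical) for a running total of scalar quantities."""

    def __init__(self):
        self.scalar_quantities = Counter()

    def add(self, quantities):
        # Register an agent's scalar quantities in the total.
        self.scalar_quantities.update(quantities)

    def contains(self, quantities):
        # The total contains `quantities` if every named amount is covered.
        return all(self.scalar_quantities[name] >= amount
                   for name, amount in quantities.items())

    def update(self, old, new):
        # Same shape as the failed check in the log: the old quantity must be
        # contained in the total before it is replaced by the new one.
        if not self.contains(old):
            raise AssertionError("Check failed: total.contains(old)")
        self.scalar_quantities.subtract(old)
        self.scalar_quantities.update(new)


if __name__ == "__main__":
    total = TotalResources()
    total.add({"cpus": 2, "mem": 2560})

    # First update: the total shrinks (assumed bookkeeping drift).
    total.update({"cpus": 2, "mem": 2560}, {"cpus": 2, "mem": 2304})

    # Second update still passes the stale, larger "old" quantity.
    try:
        total.update({"cpus": 2, "mem": 2560}, {"cpus": 2, "mem": 2560})
    except AssertionError as error:
        print(error)
```

In this model the second update fails because the caller's view of the old quantity no longer matches what the total holds, which is consistent with the "bug in bookkeeping" hypothesis above but does not confirm it.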