Alexander Rukletsov created MESOS-5650:
------------------------------------------
Summary: UNRESERVE operation causes master to crash.
Key: MESOS-5650
URL: https://issues.apache.org/jira/browse/MESOS-5650
Project: Mesos
Issue Type: Bug
Components: allocation
Affects Versions: 0.28.1
Reporter: Alexander Rukletsov
Priority: Blocker
Fix For: 1.0.0
An {{UNRESERVE}} operation may cause a master failure:
{noformat}
I0619 05:02:02.298602 11194 http.cpp:312] HTTP GET for /master/slaves from
172.17.0.4:49617 with User-Agent='python-requests/2.9.1'
I0619 05:02:02.305542 11193 http.cpp:312] HTTP POST for /master/destroy-volumes
from 172.17.0.4:49618 with User-Agent='python-requests/2.9.1'
I0619 05:02:02.306731 11191 master.cpp:6560] Sending checkpointed resources
mem(kafkatest-role, kafkatest-principal, {resource_id:
7408cc53-183c-48c2-a07f-7087806219f3}):256; cpus(kafkatest-role,
kafkatest-principal, {resource_id: d7888099-db8f-4018-9109-f70fb1174f53}):1.5;
mem(kafkatest-role, kafkatest-principal, {resource_id:
b5dd90fc-2c12-4199-9fc4-cf9f918e332b}):2304; ports(kafkatest-role,
kafkatest-principal, {resource_id:
a0ee4e01-803f-4b71-950d-483caeb01a57}):[9305-9305, 11596-11596];
cpus(kafkatest-role, kafkatest-principal, {resource_id:
8cd72abb-7089-4220-bb90-46b70c9953ab}):0.5; disk(kafkatest-role,
kafkatest-principal, {resource_id:
ed06ec6e-2d15-4d0e-bbc4-95a942e58596})[]:11204 to slave
a80ff9dd-e046-43ab-b763-28365b136f6b-S0 at slave(1)@10.0.0.5:5051 (10.0.0.5)
I0619 05:02:02.311069 11189 http.cpp:312] HTTP POST for /master/destroy-volumes
from 172.17.0.4:49619 with User-Agent='python-requests/2.9.1'
I0619 05:02:02.312191 11187 master.cpp:6560] Sending checkpointed resources
cpus(kafkatest-role, kafkatest-principal, {resource_id:
f1ff4806-0c24-4d60-ad2b-b06462ee4081}):1.5; mem(kafkatest-role,
kafkatest-principal, {resource_id: cb8dc92d-64f0-4007-8520-1f63625b98c0}):2304;
ports(kafkatest-role, kafkatest-principal, {resource_id:
225b4172-be77-453a-a94f-8845edc3f09a}):[9692-9692, 11824-11824];
cpus(kafkatest-role, kafkatest-principal, {resource_id:
942e102a-ca63-480d-9853-9a39e2695ec9}):0.5; mem(kafkatest-role,
kafkatest-principal, {resource_id: cad57f8c-27f5-484c-a3fb-e80da74f0813}):256;
disk(kafkatest-role, kafkatest-principal, {resource_id:
e6563e09-e284-4aaf-8d53-72056695de41})[]:11204 to slave
489aa72f-ae07-4383-a56f-6fe9346ace37-S7 at slave(1)@10.0.0.7:5051 (10.0.0.7)
I0619 05:02:02.316118 11189 http.cpp:312] HTTP GET for /master/slaves from
172.17.0.4:49620 with User-Agent='python-requests/2.9.1'
I0619 05:02:02.321527 11189 http.cpp:312] HTTP POST for /master/unreserve from
172.17.0.4:49621 with User-Agent='python-requests/2.9.1'
I0619 05:02:02.323523 11193 master.cpp:6560] Sending checkpointed resources to
slave a80ff9dd-e046-43ab-b763-28365b136f6b-S0 at slave(1)@10.0.0.5:5051
(10.0.0.5)
I0619 05:02:02.327658 11191 http.cpp:312] HTTP POST for /master/unreserve from
172.17.0.4:49622 with User-Agent='python-requests/2.9.1'
F0619 05:02:02.329208 11190 sorter.cpp:284] Check failed:
total_.scalarQuantities.contains(oldSlaveQuantity)
{noformat}
Possible reasons:
* Recent improvements in the allocator (b4d746f)
* Bug in bookkeeping during the previous {{UNRESERVE}}
* Network partition that happened after {{RESERVE}} and before {{UNRESERVE}}
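
To make the bookkeeping hypothesis concrete, here is a minimal sketch of how a check like {{total_.scalarQuantities.contains(oldSlaveQuantity)}} could fail. This is not Mesos code. {{TotalTracker}}, {{contains}} and {{InvariantError}} are hypothetical names, and the model assumes the sorter keeps a per-slave view plus an aggregate of scalar quantities that must stay in sync.

```python
class InvariantError(Exception):
    """Stands in for the fatal CHECK failure seen in the master log."""


def contains(total, quantity):
    """Return True if `total` holds at least every amount in `quantity`."""
    return all(total.get(name, 0.0) >= amount
               for name, amount in quantity.items())


class TotalTracker:
    """Hypothetical model: per-slave quantities plus a cluster-wide sum."""

    def __init__(self):
        self.per_slave = {}          # slave id -> {resource name: amount}
        self.scalar_quantities = {}  # resource name -> summed amount

    def add(self, slave_id, quantity):
        """Record a slave's quantities and add them to the aggregate."""
        self.per_slave[slave_id] = dict(quantity)
        for name, amount in quantity.items():
            self.scalar_quantities[name] = (
                self.scalar_quantities.get(name, 0.0) + amount)

    def update(self, slave_id, new_quantity):
        """Swap a slave's old quantities for new ones in the aggregate."""
        old = self.per_slave[slave_id]
        # The aggregate must still contain what was recorded for this
        # slave. If an earlier operation changed one view but not the
        # other, this check fails.
        if not contains(self.scalar_quantities, old):
            raise InvariantError("aggregate does not contain old quantity")
        for name, amount in old.items():
            self.scalar_quantities[name] -= amount
        self.add(slave_id, new_quantity)
```

In this model, an earlier operation that adjusts {{scalar_quantities}} without the matching change to {{per_slave}} (or the reverse) leaves the two views out of sync, and the next {{update}} for that slave raises. Whether the real crash follows this path is unconfirmed.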
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)