[
https://issues.apache.org/jira/browse/MESOS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
cliff updated MESOS-5653:
-------------------------
Description:
When attempting to create a persistent volume via the /create-volumes operator
endpoint, I get an HTTP 200 from the master, and in the logs on the master I see:
{noformat}
http.cpp:312] HTTP POST for /master/create-volumes from "172.16.10.11:40686
with User-Agent='curl/7.29.0' "
{noformat}
The next line I see on the master is:
{noformat}
"master.cpp:6560] Sending checkpointed resources to slave
0ef7d2e1-8b0d-44d4-8db0-cc58ac2058af-S0 at slave(1)@172.16.10.4:5051"
{noformat}
Now if I look in the logs on the slave that was specified in the request to
create a persistent volume, I see:
{noformat}
"1572 slave.cpp:2327] Updated checkpointed resources from to "
{noformat}
Notice that the "from" and "to" values are both missing. Specifically, they
should be the values of checkpointedResources and newCheckpointedResources,
from here:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2582
I am currently running only one slave for troubleshooting purposes. The
resource file on the slave with the disk resource looks like the following:
#resources=file:///etc/default/mesos.resources.json
{noformat}
[
  {
    "name": "disk",
    "type": "SCALAR",
    "scalar": {
      "value": 50000
    }
  },
  {
    "name":"disk",
    "type":"SCALAR",
    "scalar":{
      "value":1000000
    },
    "role":"testing",
    "disk":{
      "source":{
        "type":"MOUNT",
        "mount":{
          "root":"/data"
        }
      }
    }
  },
  {
    "name":"cpus",
    "type":"SCALAR",
    "scalar":{
      "value":16
    },
    "role":"testing"
  },
  {
    "name":"mem",
    "type":"SCALAR",
    "scalar":{
      "value":128000
    },
    "role":"testing"
  },
  {
    "name":"ports",
    "type":"RANGES",
    "ranges":{
      "range":[
        {
          "begin":31000,
          "end":32000
        }
      ]
    },
    "role":"testing"
  }
]
{noformat}
When I {{curl master:5050/slaves | jq '.'}} and look under the key
{{reserved_resources_full}}, I see the above resources on that slave.
Here is my request to the operator endpoint {{/create-volumes}}. I am trying
to create a persistent volume on the disk of type MOUNT above, which is in
{{/proc/mounts}} as {{/data}}:
{noformat}
curl -i -d slaveId=0ee7d2e7-8b0d-44d4-8d80-cc58ac2058ae-S4 \
  -d volumes='[
    {
      "name": "testvol",
      "type": "SCALAR",
      "scalar": { "value": 10000 },
      "role": "testing",
      "disk": {
        "source": {
          "type" : "MOUNT",
          "path" : { "root" : "/data" }
        },
        "persistence": {
          "id" : "cliff"
        },
        "volume": {
          "mode": "RW",
          "container_path": "/data"
        }
      }
    }
  ]' -X POST http://master:5050/master/create-volumes
{noformat}
{noformat}
HTTP/1.1 200 OK
Date: Sun, 19 Jun 2016 04:38:45 GMT
{noformat}
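Comparing the request above with the static resource file shows one visible difference: the file pairs source type MOUNT with a {{mount}} object, while the request pairs MOUNT with a {{path}} object. The following is a minimal Python sketch of a check for that kind of mismatch. It assumes the convention seen in the static file (the populated key is the lower-cased type name); the helper name {{source_key_mismatches}} is hypothetical, and nothing here establishes that this difference is the cause of the behavior reported.

```python
import json


def source_key_mismatches(resources):
    """Return (name, type, keys) for each disk source whose populated key
    does not match its type, assuming the key is the lower-cased type name
    (an assumption taken from the static resource file, not from Mesos docs)."""
    problems = []
    for res in resources:
        source = res.get("disk", {}).get("source")
        if not source:
            continue  # resource has no disk source, nothing to compare
        expected = source.get("type", "").lower()  # "MOUNT" -> "mount"
        present = sorted(key for key in source if key != "type")
        if present != [expected]:
            problems.append((res.get("name"), source.get("type"), present))
    return problems


# Disk entry copied from the static resource file.
static_entry = json.loads("""
{"name": "disk", "type": "SCALAR", "scalar": {"value": 1000000},
 "role": "testing",
 "disk": {"source": {"type": "MOUNT", "mount": {"root": "/data"}}}}
""")

# Volume entry copied from the /create-volumes request.
request_entry = json.loads("""
{"name": "testvol", "type": "SCALAR", "scalar": {"value": 10000},
 "role": "testing",
 "disk": {"source": {"type": "MOUNT", "path": {"root": "/data"}},
          "persistence": {"id": "cliff"},
          "volume": {"mode": "RW", "container_path": "/data"}}}
""")

print(source_key_mismatches([static_entry]))   # []
print(source_key_mismatches([request_entry]))  # [('testvol', 'MOUNT', ['path'])]
```

The same function can be run over the {{resources}} arrays sent to /reserve, since they use the same source layout.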
If I look at the slave specified by slaveId above via:
{noformat}
curl - http://slave1:5051/state
{noformat}
I do not see the volume created. Also, there are no errors in the INFO logs on
either the master or slave relating to this request. The only log entries are
those that I have provided.
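Checking the state output by eye is error-prone, so here is a small Python sketch that searches a parsed JSON document for any persistence id. It deliberately makes no assumption about where in the /state response a volume would appear and simply walks the whole structure; the function name {{find_persistence_ids}} and the sample data are illustrative only.

```python
def find_persistence_ids(node):
    """Recursively collect every persistence id found anywhere in a
    parsed JSON value (dicts and lists are walked, scalars are ignored)."""
    found = []
    if isinstance(node, dict):
        persistence = node.get("persistence")
        if isinstance(persistence, dict) and "id" in persistence:
            found.append(persistence["id"])
        for value in node.values():
            found.extend(find_persistence_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_persistence_ids(item))
    return found


# Made-up sample documents, not real output from a Mesos slave.
with_volume = {
    "resources": [
        {"name": "disk", "disk": {"persistence": {"id": "cliff"}}}
    ]
}
without_volume = {"resources": [{"name": "disk", "disk": {}}]}

print(find_persistence_ids(with_volume))     # ['cliff']
print(find_persistence_ids(without_volume))  # []

# Against a live slave the input could be loaded with, for example:
#   import json, urllib.request
#   state = json.load(urllib.request.urlopen("http://slave1:5051/state"))
```

An empty result for the id used in the request ("cliff" above) would match what is described here: the volume does not show up on the slave.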
The same problem/behavior seems to exist when trying to create persistent
volumes on dynamically reserved resources as well.
My steps were:
systemctl stop mesos-slave
cd /var/mesos
rm -rf meta
systemctl start mesos-slave
Then I issued the following to the /reserve operator endpoint:
{noformat}
curl -i \
  -d slaveId=0ee7d2b7-7b0d-44d4-8d80-cc51ac2058ae-S0 \
  -d resources='[
    {
      "name": "disk",
      "type": "SCALAR",
      "scalar": { "value": 10000 },
      "disk": {
        "source": {
          "type" : "MOUNT",
          "path" : { "root" : "/data" }
        },
        "persistence": {
          "id" : "testing"
        },
        "volume": {
          "mode": "RW",
          "container_path": "/data"
        }
      }
    }
  ]' \
  -X POST http://master:5050/master/reserve
{noformat}
The volume never gets created and no error is logged anywhere on the master or
slave. I only see the following on the slave, the same as when attempting to
create a persistent volume on statically defined resources:
{noformat}
5558 slave.cpp:2327] Updated checkpointed resources from to
{noformat}
I also tried enabling auth to rule out that possibly being a factor. Steps
taken:
{noformat}
/etc/default/mesos-master:
export authenticate_http=true
export credentials="/etc/default/credentials.json"
/etc/default/credentials.json:
{
"credentials" : [
{
"principal": "test",
"secret": "test"
}
]
}
{noformat}
Then I restarted the masters with "systemctl restart mesos-master":
{noformat}
# curl -i \
> -u test:test \
> -d slaveId=af6e2f17-3d53-4656-a6ce-49658b6b4db3-S0 \
> -d resources='[
> {
> "name": "disk",
> "type": "SCALAR",
> "scalar": { "value": 1024 },
> "reservation": {
> "principal": "test"
> }
> }
> ]' \
> -X POST http://master:5050/master/reserve
HTTP/1.1 200 OK
{noformat}
The result is the same. If I look at the output of:
{noformat}
http://master:5050/slaves
{noformat}
I won't see anything reserved:
{noformat}
"reserved_resources_full": {},
{noformat}
and again in the logs on the one slave that is currently active I will see:
{noformat}
slave.cpp:2327] Updated checkpointed resources from to
{noformat}
and no further information either on the slave agent or the master.
Whether or not I specify a role doesn't have any effect:
{noformat}
curl -i \
  -u test:test \
  -d slaveId=af6e2f17-3d53-4656-a6ce-49658b6b4db3-S0 \
  -d resources='[
    {
      "name": "disk",
      "type": "SCALAR",
      "scalar": { "value": 10000 },
      "role": "test",
      "disk": {
        "source": {
          "type" : "MOUNT",
          "path" : { "root" : "/data" }
        },
        "persistence": {
          "id" : "testing"
        },
        "volume": {
          "mode": "RW",
          "container_path": "/data"
        }
      }
    }
  ]' \
  -X POST http://master:5050/master/reserve
HTTP/1.1 200 OK
Date: Mon, 20 Jun 2016 21:32:17 GMT
Content-Length: 0
curl http://master:5050/slaves | jq '.' | grep full
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 590 100 590 0 0 158k 0 --:--:-- --:--:-- --:--:-- 192k
"reserved_resources_full": {},
"used_resources_full": [],
"offered_resources_full": []
{noformat}
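The {{grep full}} check above can also be done programmatically. This is a rough Python sketch that lists slaves whose {{reserved_resources_full}} is missing or empty. It assumes the /slaves response is a JSON object with a top-level {{slaves}} array whose entries carry {{id}} and {{reserved_resources_full}}; the function name and the sample ids are made up for illustration.

```python
def slaves_without_reservations(slaves_response):
    """Return the ids of slaves whose reserved_resources_full is missing
    or empty in a parsed /slaves response (assumed layout: a top-level
    "slaves" list of objects with "id" and "reserved_resources_full")."""
    return [
        slave.get("id")
        for slave in slaves_response.get("slaves", [])
        if not slave.get("reserved_resources_full")
    ]


# Made-up sample shaped like the fields shown in the output above.
sample = {
    "slaves": [
        {
            "id": "example-S0",
            "reserved_resources_full": {},
            "used_resources_full": [],
            "offered_resources_full": [],
        },
        {
            "id": "example-S1",
            "reserved_resources_full": {"testing": [{"name": "cpus"}]},
        },
    ]
}

print(slaves_without_reservations(sample))  # ['example-S0']
```

A slave that still appears in this list after a /reserve request returned 200 would reproduce the symptom described in this issue.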
> Creating a persistent volume through the operator endpoints fail and doesn't
> produce meaningful logs.
> -----------------------------------------------------------------------------------------------------
>
> Key: MESOS-5653
> URL: https://issues.apache.org/jira/browse/MESOS-5653
> Project: Mesos
> Issue Type: Bug
> Components: master, volumes
> Affects Versions: 0.28.2
> Environment: Centos 7 - 3.10.0-327.13.1.el7.x86_64, Mesos 0.28.2
> Reporter: cliff
> Assignee: Greg Mann
> Labels: persistent-volumes
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)