Omar AitMous created MESOS-9476:
-----------------------------------

             Summary: XFS project IDs aren't released upon task completion
                 Key: MESOS-9476
                 URL: https://issues.apache.org/jira/browse/MESOS-9476
             Project: Mesos
          Issue Type: Bug
          Components: agent
    Affects Versions: 1.7.0
            Reporter: Omar AitMous
         Attachments: Vagrantfile, build.sh

The XFS disk isolator doesn't release project IDs when a task finishes on
Mesos 1.7 (branch 1.7.x). Once all project IDs in the configured range are
taken, launching new tasks fails with:

`Failed to assign project ID, range exhausted`

 

Attached is a Vagrant configuration that sets up a VM with an XFS disk (mounted
on /var/opt/mesos), ZooKeeper 3.4.12, Mesos 1.7 and Marathon 1.6.

Once the box is ready, start ZooKeeper, mesos-master, mesos-agent (using the
XFS disk) and Marathon:
 * sudo bin/zkServer.sh start
 * sudo /home/vagrant/mesos/build/bin/mesos-master.sh --ip=192.168.33.10 --work_dir=/mnt/mesos
 * sudo /home/vagrant/mesos/build/bin/mesos-agent.sh --master=192.168.33.10:5050 --work_dir=/var/opt/mesos --enforce_container_disk_quota --isolation=disk/xfs --xfs_project_range=[5000-5009]
 * sudo MESOS_NATIVE_JAVA_LIBRARY="/home/vagrant/mesos/build/src/.libs/libmesos.so" sbt 'run --master 192.168.33.10:5050 --zk zk://localhost:2181/marathon'

 

Create an app on Marathon, for example:

{"id": "/test", "cmd": "sleep 3600", "cpus": 0.01, "mem": 32, "disk": 1, 
"instances": 5}
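
For anyone who prefers to script this step, here is a minimal sketch that builds the Marathon REST requests for creating the app and for scaling it later. It assumes Marathon's HTTP API is reachable on its default port 8080 at the VM's address (the port is not stated in this report), and the helper names `create_app_request` and `scale_request` are made up for illustration. The requests are only built, not sent, so the sketch can be checked without the box running.

```python
import json
import urllib.request

# Assumption: Marathon listens on its default HTTP port 8080 on the VM.
MARATHON_URL = "http://192.168.33.10:8080"

# The app definition from this report.
APP = {"id": "/test", "cmd": "sleep 3600", "cpus": 0.01, "mem": 32,
       "disk": 1, "instances": 5}


def create_app_request(app, base_url=MARATHON_URL):
    """Build (but do not send) a POST /v2/apps request for the app."""
    return urllib.request.Request(
        base_url + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scale_request(app_id, instances, base_url=MARATHON_URL):
    """Build (but do not send) a PUT /v2/apps/<id> request that sets the
    instance count."""
    return urllib.request.Request(
        base_url + "/v2/apps/" + app_id.lstrip("/"),
        data=json.dumps({"instances": instances}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Sending needs the Vagrant box to be up, so it is left commented out:
    # urllib.request.urlopen(create_app_request(APP))
    print(create_app_request(APP).get_method(), create_app_request(APP).full_url)
```

Passing `scale_request("/test", 0)` and then `scale_request("/test", 8)` to `urllib.request.urlopen` would correspond to the scale-down and scale-up steps described further down.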

You should see 5 project IDs being used:

$ sudo xfs_quota -x -c "report -a -n -L 5000 -U 5009" | grep '^#[1-9][0-9]*'
#5000 4 1024 1024 00 [--------]
#5001 4 1024 1024 00 [--------]
#5002 4 1024 1024 00 [--------]
#5003 4 1024 1024 00 [--------]
#5004 4 1024 1024 00 [--------]
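
To make the check repeatable, the report output can also be counted programmatically. The following is a rough sketch, not part of the attached files: `project_ids_in_use` is a hypothetical helper that extracts the project IDs from text in the format shown above and keeps only those inside the configured range.

```python
import re

# Matches the leading "#<id>" of report lines such as
# "#5000 4 1024 1024 00 [--------]".
PROJECT_LINE = re.compile(r"^#(\d+)\s")


def project_ids_in_use(report, low=5000, high=5009):
    """Return the sorted project IDs within [low, high] found in the report."""
    ids = []
    for line in report.splitlines():
        match = PROJECT_LINE.match(line.strip())
        if match and low <= int(match.group(1)) <= high:
            ids.append(int(match.group(1)))
    return sorted(ids)


# The output quoted in this report.
SAMPLE = """\
#5000 4 1024 1024 00 [--------]
#5001 4 1024 1024 00 [--------]
#5002 4 1024 1024 00 [--------]
#5003 4 1024 1024 00 [--------]
#5004 4 1024 1024 00 [--------]
"""

if __name__ == "__main__":
    print(len(project_ids_in_use(SAMPLE)), "project IDs in use")
```

Feeding it the output of the `xfs_quota` command before and after scaling the app makes it easy to compare how many IDs are held at each step.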

 

If you scale down to 0 instances, the project IDs aren't released.

If you then scale back up to 8 instances, only 5 of them start: the range
[5000-5009] holds 10 IDs, and 5 are still held by the finished tasks. The
remaining 3 fail with errors like this:

E1213 14:38:36.190430 20813 slave.cpp:6204] Container '064b8a6b-c42d-4905-b2a7-632318aa2b83' for executor 'test.c5e88a67-fee4-11e8-9cc6-0800278a1a98' of framework 0473e272-04f7-4b1d-ae1d-f7177940e295-0000 failed to start: Failed to assign project ID, range exhausted
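
The numbers above (5 start, 3 fail) follow directly from the size of the range. Here is a toy model of the arithmetic, assuming a simple free-list allocator; `ProjectIdPool`, `launch` and `simulate` are illustrative names, and this is not the Mesos isolator code.

```python
class ProjectIdPool:
    """Toy project ID allocator over the inclusive range [low, high]."""

    def __init__(self, low, high):
        self.free = list(range(low, high + 1))
        self.assigned = {}

    def assign(self, container):
        if not self.free:
            raise RuntimeError("Failed to assign project ID, range exhausted")
        self.assigned[container] = self.free.pop(0)
        return self.assigned[container]

    def release(self, container):
        self.free.append(self.assigned.pop(container))


def launch(pool, names):
    """Try to assign an ID to each container; return (started, failed)."""
    started, failed = [], []
    for name in names:
        try:
            pool.assign(name)
            started.append(name)
        except RuntimeError:
            failed.append(name)
    return started, failed


def simulate(release_on_exit):
    """Run 5 tasks, stop them, then launch 8 more on a 10-ID range."""
    pool = ProjectIdPool(5000, 5009)
    first = ["a%d" % i for i in range(5)]
    launch(pool, first)
    if release_on_exit:  # expected behavior: IDs return to the pool
        for name in first:
            pool.release(name)
    return launch(pool, ["b%d" % i for i in range(8)])


if __name__ == "__main__":
    for release in (False, True):
        started, failed = simulate(release)
        print("release_on_exit=%s: %d started, %d failed"
              % (release, len(started), len(failed)))
```

With `release_on_exit=False` the model reproduces the 5 started and 3 failed instances seen here; with `release_on_exit=True` all 8 fit in the 10-ID range.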

 

On Mesos 1.4, the project IDs are properly released when a task finishes.

(I haven't tested other versions.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
