[
https://issues.apache.org/jira/browse/MESOS-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720643#comment-16720643
]
Ilya Pronin commented on MESOS-9476:
------------------------------------
This change was introduced in 1.7. The isolator now periodically (every
{{--disk_watch_interval}}) checks which container sandboxes and persistent
volumes were removed (e.g. by disk GC) and reclaims their project IDs. The main
reason for doing so is that project IDs cannot be removed from symlinks, which
may lead to weird accounting. Also, isolators currently don't get notified when
a persistent volume is removed, so {{disk/xfs}} can only do periodic scans to
reclaim volume project IDs. See MESOS-5158 and MESOS-9007 for more information.
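If reclamation feels too slow for your workload, the scan period can be tuned. A sketch, assuming the agent accepts a standard duration value for {{--disk_watch_interval}} (the interval and paths shown are illustrative, not recommendations):
{code:java}
# Reclaim project IDs of removed sandboxes/volumes more frequently
# (illustrative interval; shorter scans trade CPU for faster reclamation)
sudo /home/vagrant/mesos/build/bin/mesos-agent.sh --master=192.168.33.10:5050 \
  --work_dir=/var/opt/mesos --enforce_container_disk_quota \
  --isolation=disk/xfs \
  --disk_watch_interval=30secs
{code}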
XFS project IDs are 16- or 32-bit integers, so usually there should be plenty
of them available. Can you give your Mesos agents a larger ID range?
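For comparison, the reproduction below deliberately uses a tiny range ({{[5000-5009]}}, only 10 IDs). A sketch of an agent invocation with a much larger range (the exact bounds here are illustrative):
{code:java}
# ~100k project IDs instead of 10; pick bounds that don't collide with
# other users of XFS project quotas on the same filesystem
sudo /home/vagrant/mesos/build/bin/mesos-agent.sh --master=192.168.33.10:5050 \
  --work_dir=/var/opt/mesos --enforce_container_disk_quota \
  --isolation=disk/xfs \
  --xfs_project_range=[5000-104999]
{code}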
> XFS project IDs aren't released upon task completion
> ----------------------------------------------------
>
> Key: MESOS-9476
> URL: https://issues.apache.org/jira/browse/MESOS-9476
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Affects Versions: 1.7.0
> Environment: Centos 7.1
> Mesos 1.7
> Reporter: Omar AitMous
> Priority: Major
> Attachments: Vagrantfile, build.sh
>
>
> The XFS isolator doesn't release project IDs when a task finishes on Mesos
> 1.7 (branch 1.7.x); once all project IDs are taken, scheduling new tasks
> fails with:
> {code:java}
> Failed to assign project ID, range exhausted
> {code}
>
> Attached is a vagrant configuration that sets up a VM with an XFS disk
> (mounted on /var/opt/mesos), zookeeper 3.4.12, mesos 1.7 and marathon 1.6.
> Once the box is ready, start zookeeper, mesos-master, mesos-agent (using the
> XFS disk) and marathon:
> {code:java}
> sudo bin/zkServer.sh start
> sudo /home/vagrant/mesos/build/bin/mesos-master.sh --ip=192.168.33.10 \
>   --work_dir=/mnt/mesos
> sudo /home/vagrant/mesos/build/bin/mesos-agent.sh --master=192.168.33.10:5050 \
>   --work_dir=/var/opt/mesos --enforce_container_disk_quota --isolation=disk/xfs \
>   --xfs_project_range=[5000-5009]
> sudo MESOS_NATIVE_JAVA_LIBRARY="/home/vagrant/mesos/build/src/.libs/libmesos.so" \
>   sbt 'run --master 192.168.33.10:5050 --zk zk://localhost:2181/marathon'
> {code}
>
> Create an app on marathon, for example:
> {code:java}
> {"id": "/test", "cmd": "sleep 3600", "cpus": 0.01, "mem": 32, "disk": 1,
> "instances": 5}
> {code}
>
> You should see 5 project IDs being used:
> {code:java}
> $ sudo xfs_quota -x -c "report -a -n -L 5000 -U 5009" | grep '^#[1-9][0-9]*'
> #5000 4 1024 1024 00 [--------]
> #5001 4 1024 1024 00 [--------]
> #5002 4 1024 1024 00 [--------]
> #5003 4 1024 1024 00 [--------]
> #5004 4 1024 1024 00 [--------]
> {code}
>
> If you scale down to 0 instances, the project IDs aren't released.
> If you scale back up to 8 instances, only 5 of them will start; the remaining
> 3 will fail with errors like this:
> {code:java}
> E1213 14:38:36.190430 20813 slave.cpp:6204] Container
> '064b8a6b-c42d-4905-b2a7-632318aa2b83' for executor
> 'test.c5e88a67-fee4-11e8-9cc6-0800278a1a98' of framework
> 0473e272-04f7-4b1d-ae1d-f7177940e295-0000 failed to start: Failed to assign
> project ID, range exhausted
> {code}
>
> I've tested on Mesos 1.4; there, the project IDs are properly released when
> the task finishes.
> (I haven't tested other versions.)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)