[ 
https://issues.apache.org/jira/browse/MESOS-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877190#comment-16877190
 ] 

Greg Mann commented on MESOS-9875:
----------------------------------

Perhaps we can fix this in the short term by simply moving the 
{{updateOperation()}} call after the call to {{checkpointResourceState()}}. 
With current agent behavior, though, a checkpointing failure would crash the 
agent, which would then reconcile with the master, and the scheduler would 
receive an {{OPERATION_DROPPED}} update for that operation. That isn't 
accurate, but I would say it is better than {{FINISHED}}.

I think our current code isn’t going to handle this type of operation failure 
well; rather than crashing when checkpointing fails, I think we could simply 
send an {{OPERATION_FAILED}} update and allow the agent to continue running.
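
A minimal sketch of the ordering and failure handling suggested above, under 
stated assumptions: the {{Operation}} and {{OperationState}} types, the 
{{applyOperation()}} helper, and the callback signatures are hypothetical 
stand-ins, not the actual Mesos agent code. Only the names 
{{updateOperation()}} and {{checkpointResourceState()}} come from the 
discussion; here they are modeled as callbacks.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins for illustration; not the real Mesos types.
enum class OperationState { PENDING, FINISHED, FAILED };

struct Operation {
  std::string id;
  OperationState state;
};

// Checkpoints first, then reports a terminal state based on the outcome.
// `checkpoint` models checkpointResourceState() and returns false on failure.
// `update` models updateOperation() and records the terminal state.
OperationState applyOperation(
    Operation& operation,
    const std::function<bool()>& checkpoint,
    const std::function<void(Operation&, OperationState)>& update)
{
  // Only a pending operation may be moved to a terminal state.
  assert(operation.state == OperationState::PENDING);

  // The checkpoint runs before any update is sent, so a failed checkpoint
  // can never be reported as FINISHED. Instead of crashing, the failure is
  // mapped to FAILED and the caller keeps running.
  const OperationState result =
    checkpoint() ? OperationState::FINISHED : OperationState::FAILED;

  update(operation, result);
  return result;
}
```

Calling {{applyOperation()}} with a checkpoint callback that returns false 
leaves the operation in the {{FAILED}} state, while a successful checkpoint 
yields {{FINISHED}}. How a real checkpoint failure would be detected and 
surfaced inside the agent is left open here.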

> Mesos did not respond correctly when operations should fail
> -----------------------------------------------------------
>
>                 Key: MESOS-9875
>                 URL: https://issues.apache.org/jira/browse/MESOS-9875
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yifan Xing
>            Priority: Major
>
> To test persistent volumes with {{OPERATION_FAILED/ERROR}} feedback, we 
> SSHed into the Mesos agent and made it unable to create subdirectories in 
> {{/srv/mesos/work/volumes}}. However, Mesos did not send any operation-failed 
> response. Instead, we received {{OPERATION_FINISHED}} feedback.
> Steps to recreate the issue:
> 1. SSH into a Mesos agent.
> 2. Make it impossible to create a persistent volume (we expect the agent to 
> crash and reregister, and the master to realize that the operation is 
> {{OPERATION_DROPPED}}):
> * cd /srv/mesos/work (if volumes doesn't exist, mkdir /srv/mesos/work/volumes)
> * chattr -RV +i volumes (then no subdirectories can be created)
> 3. Launch a service with persistent volumes, constrained to run only on the 
> agent modified above.



