Re: Review Request 28809: Started to maintain and checkpoint persisted resource in slave.

Ben Mahler Wed, 07 Jan 2015 16:29:10 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28809/#review67131
-----------------------------------------------------------




src/slave/slave.cpp
<https://reviews.apache.org/r/28809/#comment110970>

    Just so I understand, we won't be deleting anything, right? We'll leave 
volumes "dangling" only when the master fails over during the issue you 
described here. In that case, the master also thinks that there are no 
persisted resources on the slave.
    
    And we won't delete the unknown volumes, we'll leave them dangling on the
filesystem, right? Can you please file a ticket to capture this issue and link
it here and in the epic? I don't want to leave it unfixed.
    
    If the master is up, it should re-send the persistent resources, at which
point the slave gets them back during re-registration, right? During
registration, however, the problem still exists, right?
    
    Let's document the details here and/or in a ticket!



src/slave/slave.cpp
<https://reviews.apache.org/r/28809/#comment110972>

    What about adding a method in slave/state.hpp for checkpointing Resources?
    
    ```
    Try<Nothing> checkpoint(
        const std::string& path,
        const Resources& resources);
    ```
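
For illustration, here is a minimal sketch of what such a helper could do,
assuming a write-then-rename approach. `Result` and `SerializedResources` are
hypothetical stand-ins for `Try<Nothing>` and `Resources` so that the sketch
compiles on its own, and `recover` is a hypothetical counterpart added to show
the round trip; none of this is the code under review.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical stand-in for stout's Try<Nothing>: ok == true means success,
// otherwise `error` describes the failure.
struct Result {
  bool ok;
  std::string error;
};

// Hypothetical stand-in for Resources: the caller passes already-serialized
// bytes, since the real type is a protobuf-backed collection.
typedef std::string SerializedResources;

// Writes `resources` to a temporary file and then renames it over `path`.
// On POSIX the rename replaces the target in one step, so a reader sees
// either the old checkpoint or the new one. This sketch does not fsync, so
// it makes no durability promise across power loss.
Result checkpoint(const std::string& path, const SerializedResources& resources)
{
  assert(!path.empty());

  const std::string temp = path + ".tmp";
  bool written = false;
  {
    std::ofstream out(temp.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) {
      return Result{false, "Failed to open '" + temp + "'"};
    }
    out << resources;
    out.flush();
    written = out.good();
  }  // The stream is closed here, before the rename.

  if (!written) {
    std::remove(temp.c_str());
    return Result{false, "Failed to write '" + temp + "'"};
  }

  if (std::rename(temp.c_str(), path.c_str()) != 0) {
    std::remove(temp.c_str());
    return Result{false, "Failed to rename '" + temp + "' to '" + path + "'"};
  }

  return Result{true, ""};
}

// Hypothetical counterpart used on recovery: reads the checkpoint back.
Result recover(const std::string& path, SerializedResources* resources)
{
  std::ifstream in(path.c_str(), std::ios::binary);
  if (!in) {
    return Result{false, "Failed to open '" + path + "'"};
  }
  resources->assign(
      std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
  return Result{true, ""};
}
```

Keeping the write and the rename inside one helper means every caller in
slave.cpp gets the same failure handling, which is the main reason to move it
into slave/state.hpp.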



src/slave/slave.cpp
<https://reviews.apache.org/r/28809/#comment110973>

    Maybe add a short NOTE here that we assume messages are ordered for the
releasing to be correct, and that ordering is technically not guaranteed,
perhaps pointing to the relevant tickets?
    
    Just want to make sure we know what needs to be done to prevent this from 
ever biting us.
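
To make the ordering concern concrete, here is a small sketch of how a
reordered pair of updates could be detected. It assumes each update carries a
sender-assigned sequence number; that field, `UpdateResourcesMessage`, and
`PersistentResources` are all hypothetical and are not part of this patch or of
the existing messages. It only illustrates what goes wrong without an ordering
guarantee, and one possible guard.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical message: a full snapshot of the persistent resources plus a
// sequence number assigned by the sender. The sequence number is an
// assumption made for this sketch.
struct UpdateResourcesMessage {
  std::uint64_t sequence;
  std::string resources;
};

// Hypothetical holder for the slave's view of its persistent resources.
class PersistentResources {
public:
  PersistentResources() : lastSequence(0) {}

  // Applies the snapshot unless a newer (or the same) one was already
  // applied. Returns true if the snapshot was applied.
  bool apply(const UpdateResourcesMessage& message)
  {
    if (message.sequence <= lastSequence) {
      return false;  // Stale: delivered after a newer update.
    }
    lastSequence = message.sequence;
    resources = message.resources;
    return true;
  }

  const std::string& current() const { return resources; }

private:
  std::uint64_t lastSequence;
  std::string resources;
};

// Usage: the release (sequence 2) arrives before the create (sequence 1).
// Without the guard, the late create would overwrite the release and the
// slave would keep a volume that should be gone.
void outOfOrderExample()
{
  PersistentResources state;
  UpdateResourcesMessage create = {1, "disk:1024"};
  UpdateResourcesMessage release = {2, ""};

  bool appliedRelease = state.apply(release);
  bool appliedCreate = state.apply(create);

  assert(appliedRelease);
  assert(!appliedCreate);
  assert(state.current().empty());
}
```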



src/slave/slave.cpp
<https://reviews.apache.org/r/28809/#comment110974>

    "Updated persistent resources to Y"?
    
    Or
    
    "Updated persistent resources from X to Y"?



src/slave/slave.cpp
<https://reviews.apache.org/r/28809/#comment110977>

    Shouldn't this CHECK exist in updateResources instead of here? Or in both 
places?
    
    Otherwise the slave can put itself in a state where it fails CHECKs when
it next recovers. Is there something else I'm missing?
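
A rough sketch of the "both places" option, under stated assumptions: the
review does not spell out which invariant the CHECK asserts, so
`satisfiesInvariant` below is a placeholder (it requires every entry to be
marked persistent), and `Resource` and `Slave` are simplified hypothetical
stand-ins. The update path returns false instead of aborting, purely to keep
the sketch testable; a real CHECK would abort.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a single resource entry.
struct Resource {
  std::string name;
  bool persistent;
};

// Placeholder invariant, assumed for this sketch only.
bool satisfiesInvariant(const std::vector<Resource>& resources)
{
  for (const Resource& resource : resources) {
    if (!resource.persistent) {
      return false;
    }
  }
  return true;
}

// Hypothetical, heavily simplified slave.
struct Slave {
  // Stands in for the on-disk checkpoint.
  std::vector<Resource> checkpointed;

  // Update path: validate BEFORE checkpointing, so nothing that would trip
  // the recovery-time check is ever written.
  bool updateResources(const std::vector<Resource>& resources)
  {
    if (!satisfiesInvariant(resources)) {
      return false;
    }
    checkpointed = resources;
    return true;
  }

  // Recovery path: the same invariant, now guaranteed by the update path.
  std::vector<Resource> recover() const
  {
    assert(satisfiesInvariant(checkpointed));
    return checkpointed;
  }
};
```

With the check only on the recovery path, a bad update would be checkpointed
successfully and the slave would then fail on every subsequent restart.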


- Ben Mahler


On Dec. 8, 2014, 10:19 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28809/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2014, 10:19 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-2031
>     https://issues.apache.org/jira/browse/MESOS-2031
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Started to maintain and checkpoint persisted resource in slave. That includes:
> 1) respond to update resources messages
> 2) checkpoint resources
> 3) recover checkpointed resources
> 4) send checkpointed resources during register/reregister
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 70bd8c1fde4ea09fa54c76aa93424a1adb0309f6 
>   src/slave/slave.cpp 9ac64589c353b2f17f538db7de01faa55b2369b9 
> 
> Diff: https://reviews.apache.org/r/28809/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>
