Hi Alex,

Good point. I added some docs for this behavior a few weeks ago:

https://github.com/apache/mesos/commit/6d0619e2e1fbf78411f881f431269539c7d24565

But that appears on a different doc page. You're probably right that it is
worth mentioning here as well -- I'll send a review shortly.

Neil

On Tue, Jan 19, 2016 at 12:48 AM, Alex R <ruklet...@gmail.com> wrote:
> One more caveat here is when there are multiple frameworks in the role: one
> framework may successfully reserve certain resources, but they will be
> offered to another framework in the role. Do you think it's worth
> mentioning this use case in the doc?
>
> On 18 January 2016 at 23:30, <jo...@apache.org> wrote:
>
>> Repository: mesos
>> Updated Branches:
>>   refs/heads/master 12455d0d0 -> e2963966a
>>
>> Added recommendations for programming with persistent volumes.
>>
>> Review: https://reviews.apache.org/r/41952/
>>
>> Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
>> Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/e2963966
>> Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/e2963966
>> Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/e2963966
>>
>> Branch: refs/heads/master
>> Commit: e2963966acc5c2263849ef183c9ee57251102d0e
>> Parents: 12455d0
>> Author: Neil Conway <neil.con...@gmail.com>
>> Authored: Mon Jan 18 17:30:19 2016 -0500
>> Committer: Joris Van Remoortere <joris.van.remoort...@gmail.com>
>> Committed: Mon Jan 18 17:30:19 2016 -0500
>>
>> ----------------------------------------------------------------------
>>  docs/persistent-volume.md | 66 ++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 66 insertions(+)
>> ----------------------------------------------------------------------
>>
>> http://git-wip-us.apache.org/repos/asf/mesos/blob/e2963966/docs/persistent-volume.md
>> ----------------------------------------------------------------------
>> diff --git a/docs/persistent-volume.md b/docs/persistent-volume.md
>> index f969975..4af7d6e 100644
>> --- a/docs/persistent-volume.md
>> +++ b/docs/persistent-volume.md
>> @@ -334,3 +334,69 @@ The user receives one of the following HTTP responses:
>>
>>  Note that a single `/destroy-volumes` request can destroy multiple persistent
>>  volumes, but all of the volumes must be on the same slave.
>> +
>> +### Programming with Persistent Volumes
>> +
>> +Some suggestions to keep in mind when building applications that use persistent
>> +volumes:
>> +
>> +* A single `acceptOffers` call can be used to both create a new dynamic
>> +  reservation (via `Offer::Operation::Reserve`) and create a new persistent
>> +  volume on those newly reserved resources (via `Offer::Operation::Create`).
>> +
>> +* Attempts to dynamically reserve resources or create persistent volumes might
>> +  fail---for example, because the network message containing the operation did
>> +  not reach the master or because the master rejected the operation.
>> +  Applications should be prepared to detect failures and correct for them
>> +  (e.g., by retrying the operation).
>> +
>> +* When using HTTP endpoints to reserve resources or create persistent volumes,
>> +  _some_ failures can be detected by examining the HTTP response code returned
>> +  to the client. However, it is still possible for a `200` response code to be
>> +  returned to the client but for the associated operation to fail.
>> +
>> +* When using the scheduler API, detecting that a dynamic reservation has failed
>> +  is a little tricky: reservations do not have unique identifiers, and the Mesos
>> +  master does not provide explicit feedback on whether a reservation request has
>> +  succeeded or failed. Hence, framework schedulers typically use a combination
>> +  of two techniques:
>> +
>> +  1. They use timeouts to detect that a reservation request may have failed
>> +     (because they don't receive a resource offer containing the expected
>> +     resources after a given period of time).
>> +
>> +  2. To check whether a resource offer includes the effect of a dynamic
>> +     reservation, applications _cannot_ check for the presence of a
>> +     "reservation ID" or similar value (because reservations do not have IDs).
>> +     Instead, applications should examine the resource offer and check whether
>> +     it contains sufficient reserved resources for the application's role. If
>> +     it does not, the application should make additional reservation requests
>> +     as necessary.
>> +
>> +* When a scheduler issues a dynamic reservation request, the reserved resources
>> +  might _not_ be present in the next resource offer the scheduler receives.
>> +  There are two reasons for this: first, the reservation request might fail or
>> +  be dropped by the network, as discussed above. Second, the reservation request
>> +  might simply be delayed, so that the next resource offer from the master will
>> +  be issued before the reservation request is received by the master. This is
>> +  why the text above suggests that applications wait for a timeout before
>> +  assuming that a reservation request should be retried.
>> +
>> +* A consequence of using timeouts to detect failures is that an application
>> +  might submit more reservation requests than intended (e.g., a timeout fires
>> +  and an application makes another reservation request; meanwhile, the original
>> +  reservation request is also processed). Recall that two reservations for the
>> +  same role at the same agent are "merged": for example, if role `foo` makes two
>> +  requests to reserve 2 CPUs at a single agent and both reservation requests
>> +  succeed, the result will be a single reservation of 4 CPUs. To handle this
>> +  situation, applications should be prepared for resource offers that contain
>> +  more resources than expected. Some applications may also want to detect this
>> +  situation and unreserve any additional reserved resources that will not be
>> +  required.
>> +
>> +* It often makes sense to structure application logic as a "state machine",
>> +  where the application moves from its initial state (no reserved resources and
>> +  no persistent volumes) and eventually transitions toward a single terminal
>> +  state (necessary resources reserved and persistent volume created). As new
>> +  events (such as timeouts and resource offers) are received, the application
>> +  compares the event with its current state and decides what action to take
>> +  next.
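
For what it's worth, here is a rough sketch of the first recommendation above --
combining `Offer::Operation::Reserve` and `Offer::Operation::Create` in one
accept. This is schematic only: the field names mirror the Mesos
`Offer::Operation` protobufs, but the function builds plain Python dicts rather
than real protobuf messages, and the helper name and parameters are invented
for illustration.

```python
# Schematic sketch, not the Mesos bindings: dict shapes loosely mirror the
# Offer::Operation protobufs (RESERVE with reserve.resources, CREATE with
# create.volumes), flattened for readability.
def reserve_and_create(cpus, disk_mb, role, volume_id, container_path):
    """Build the two operations for a single acceptOffers call:
    reserve the resources, then create a volume on the reserved disk."""
    reserve = {
        "type": "RESERVE",
        "reserve": {"resources": [
            {"name": "cpus", "scalar": cpus, "role": role},
            {"name": "disk", "scalar": disk_mb, "role": role},
        ]},
    }
    create = {
        "type": "CREATE",
        "create": {"volumes": [
            {"name": "disk", "scalar": disk_mb, "role": role,
             "disk": {"persistence": {"id": volume_id},
                      "volume": {"container_path": container_path,
                                 "mode": "RW"}}},
        ]},
    }
    # Order matters: Reserve must precede Create, because the volume is
    # created on the newly reserved disk resources.
    return [reserve, create]

operations = reserve_and_create(1.0, 1024, "my-role", "vol-1", "data")
```

The point of combining the two operations is that the application does not need
to wait for an offer containing the reserved resources before creating the
volume -- both steps happen against the same offer.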
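
And a sketch of the timeout/state-machine pattern from the last few bullets. It
is framework-agnostic pseudocode in Python: offers arrive as plain dicts, the
role/CPU/timeout constants are made up, and the caller is responsible for
actually sending the Reserve and Create operations when the machine asks for
them.

```python
import time

# Illustrative constants (not from the Mesos API): a real scheduler would
# take the role from its FrameworkInfo and tune the timeout empirically.
ROLE = "my-role"
NEEDED_CPUS = 2.0
RESERVE_TIMEOUT = 30.0  # seconds to wait before assuming a Reserve was lost


class ReservationStateMachine:
    """Tracks progress toward the terminal state "resources reserved".

    States: WANT_RESERVATION -> WAIT_FOR_RESERVED -> RESERVED.
    Because reservations have no IDs, the only signal of success is an
    offer containing enough resources reserved for our role; the only
    signal of failure is a timeout.
    """

    def __init__(self, clock=time.monotonic):
        self.state = "WANT_RESERVATION"
        self.requested_at = None
        self.clock = clock  # injectable for testing

    def handle_offer(self, offer):
        """Return the action to take for this offer, or None to decline."""
        reserved = sum(r["scalar"] for r in offer["resources"]
                       if r["name"] == "cpus" and r.get("role") == ROLE)
        if reserved >= NEEDED_CPUS:
            # The offer reflects our reservation. Note ">=": a retried
            # request may have been merged with the original, so the offer
            # can contain *more* than we asked for; an application could
            # unreserve the excess here.
            self.state = "RESERVED"
            return "CREATE_VOLUME"
        if self.state == "WANT_RESERVATION" or self._timed_out():
            # First attempt, or the previous Reserve may have been dropped
            # (or is merely delayed): (re)issue the reservation request.
            self.state = "WAIT_FOR_RESERVED"
            self.requested_at = self.clock()
            return "RESERVE"
        return None  # still within the timeout window; keep waiting

    def _timed_out(self):
        return (self.state == "WAIT_FOR_RESERVED"
                and self.clock() - self.requested_at > RESERVE_TIMEOUT)
```

Note that a delayed-but-successful Reserve plus one retry is harmless under
this scheme precisely because merged reservations only make the offer larger,
which the `>=` check tolerates.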