Added: mesos/site/source/documentation/latest/scheduler-http-api.md
URL: 
http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/scheduler-http-api.md?rev=1707725&view=auto
==============================================================================
--- mesos/site/source/documentation/latest/scheduler-http-api.md (added)
+++ mesos/site/source/documentation/latest/scheduler-http-api.md Fri Oct  9 
13:33:18 2015
@@ -0,0 +1,519 @@
+---
+layout: documentation
+---
+
+# Scheduler HTTP API
+
+Mesos 0.24.0 added **experimental** support for the v1 Scheduler HTTP API.
+
+
+## Overview
+
+The scheduler interacts with Mesos via the “/api/v1/scheduler” endpoint hosted by the Mesos master. The fully qualified URL of the endpoint might look like:
+
+       http://masterhost:5050/api/v1/scheduler
+
+Note that we refer to this endpoint by its suffix “/scheduler” in the rest of this document. This endpoint accepts HTTP POST requests with data encoded as JSON (Content-Type: application/json) or binary Protobuf (Content-Type: application/x-protobuf). The first request that a scheduler sends to the “/scheduler” endpoint is called SUBSCRIBE and results in a streaming response (“200 OK” status code with Transfer-Encoding: chunked). **Schedulers are expected to keep the subscription connection open as long as possible (barring errors in network, software, hardware, etc.) and incrementally process the response** (NOTE: HTTP client libraries that can only parse the response after the connection is closed cannot be used). For the encoding used, please refer to the **Events** section below.
+
+All subsequent (non-subscribe) requests to the “/scheduler” endpoint (see details below in the **Calls** section) must be sent using a different connection (or connections) than the one being used for subscription. The master responds to these HTTP POST requests with “202 Accepted” status codes (or, for unsuccessful requests, with 4xx or 5xx status codes; details in later sections). The “202 Accepted” response means that a request has been accepted for processing, not that the processing of the request has been completed. The request might or might not be acted upon by Mesos (e.g., the master fails during the processing of the request). Any asynchronous responses from these requests will be streamed on the long-lived subscription connection.
+
+
+## Calls
+
+The following calls are currently accepted by the master. The canonical source 
of this information is 
[scheduler.proto](https://github.com/apache/mesos/blob/master/include/mesos/v1/scheduler/scheduler.proto)
 (NOTE: The protobuf definitions are subject to change before the beta API is 
finalized). Note that when sending JSON encoded Calls, schedulers should encode 
raw bytes in Base64 and strings in UTF-8.
+
+### SUBSCRIBE
+
+This is the first step in the communication process between the scheduler and the master. It also serves as the subscription to the “/scheduler” events stream.
+
+To subscribe with the master, the scheduler sends an HTTP POST request with an encoded `SUBSCRIBE` message that includes the required FrameworkInfo. Note that if “subscribe.framework_info.id” is not set, the master considers the scheduler a new one and subscribes it by assigning it a FrameworkID. The HTTP response is a stream with RecordIO encoding, with the first event being the `SUBSCRIBED` event (see details in the **Events** section).
+
+```
+SUBSCRIBE Request (JSON):
+
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+Accept: application/json
+Connection: close
+
+{
+  "type"      : "SUBSCRIBE",
+  "subscribe" : {
+    "framework_info" : {
+      "user" : "foo",
+      "name" : "Example HTTP Framework"
+    },
+    "force" : true
+  }
+}
+
+SUBSCRIBE Response Event (JSON):
+
+HTTP/1.1 200 OK
+Content-Type: application/json
+Transfer-Encoding: chunked
+
+<event-length>
+{
+  "type"       : "SUBSCRIBED",
+  "subscribed" : {
+    "framework_id"               : {"value" : "12220-3440-12532-2345"},
+    "heartbeat_interval_seconds" : 15
+  }
+}
+<more events>
+```
+
+Alternatively, if “subscribe.framework_info.id” is set, the master considers this a request from an already subscribed scheduler reconnecting after a disconnection (e.g., due to failover or network disconnection) and responds with a `SUBSCRIBED` event containing the same FrameworkID. The “subscribe.force” field describes how the master reacts when multiple scheduler instances (with the same framework id) attempt to subscribe with the master at the same time (e.g., due to network partition). See the semantics in the **Disconnections** section below.
+
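A minimal Python sketch of how a scheduler might assemble the SUBSCRIBE call described above. The function name `build_subscribe_request` and its parameters are hypothetical; only the field names (`type`, `subscribe`, `framework_info`, `id`, `force`) come from this document. It builds the headers and body only and does not open a connection.

```python
import json


def build_subscribe_request(user, name, framework_id=None, force=True):
    """Return (headers, body) for a JSON-encoded SUBSCRIBE call."""
    framework_info = {"user": user, "name": name}
    if framework_id is not None:
        # A set id tells the master this is a known scheduler reconnecting.
        framework_info["id"] = {"value": framework_id}

    call = {
        "type": "SUBSCRIBE",
        "subscribe": {"framework_info": framework_info, "force": force},
    }
    headers = {
        "Content-Type": "application/json",
        # The Accept header selects the encoding of the streamed events.
        "Accept": "application/json",
    }
    return headers, json.dumps(call).encode("utf-8")
```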
+NOTE: In the old version of the API, (re-)registered callbacks also included MasterInfo, which contained information about the master the driver was currently connected to. With the new API, since schedulers explicitly subscribe with the leading master (see details below in the **Master detection** section), it’s not relevant anymore.
+
+If subscription fails for whatever reason (e.g., invalid request), an HTTP 4xx response is returned with the error message as part of the body and the connection is closed.
+
+The scheduler must make additional HTTP requests to the “/scheduler” endpoint only after it has opened a persistent connection to it by sending a `SUBSCRIBE` request and received a `SUBSCRIBED` response. Calls made without a subscription will result in a “403 Forbidden” instead of a “202 Accepted” response. A scheduler might also receive a “400 Bad Request” response if the HTTP request is malformed (e.g., malformed HTTP headers).
+
+### TEARDOWN
+Sent by the scheduler when it wants to tear itself down. When Mesos receives this request, it shuts down all executors (and consequently kills tasks) and removes persistent volumes (if requested). It then removes the framework and closes all open connections from this scheduler to the master.
+
+```
+TEARDOWN Request (JSON):
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type"         : "TEARDOWN"
+}
+
+TEARDOWN Response:
+HTTP/1.1 202 Accepted
+```
+
+### ACCEPT
+Sent by the scheduler when it accepts offer(s) sent by the master. The 
`ACCEPT` request includes the type of operations (e.g., launch task, reserve 
resources, create volumes) that the scheduler wants to perform on the offers. 
Note that until the scheduler replies (accepts or declines) to an offer, its 
resources are considered allocated to the framework. Also, any of the offer’s 
resources not used in the `ACCEPT` call (e.g., to launch a task) are considered 
declined and might be reoffered to other frameworks. In other words, the same 
`OfferID` cannot be used in more than one `ACCEPT` call. These semantics might 
change when we add new features to Mesos (e.g., persistence, reservations, 
optimistic offers, resizeTask, etc.).
+
+```
+ACCEPT Request (JSON):
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type"         : "ACCEPT",
+  "accept"       : {
+    "offer_ids"  : [
+      {"value" : "12220-3440-12532-O12"},
+      {"value" : "12220-3440-12532-O13"}
+    ],
+    "operations" : [ {"type" : "LAUNCH", "launch" : {...}} ],
+    "filters"    : {...}
+  }
+}
+
+ACCEPT Response:
+HTTP/1.1 202 Accepted
+
+```
+
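The ACCEPT rules above can be sketched in Python as follows. `OfferTracker` is a hypothetical helper, not part of Mesos; it assembles the call body and refuses to reuse an offer ID that was already answered. The contents of `operations` and `filters` are passed through untouched, since their exact shape is defined in scheduler.proto rather than here.

```python
class OfferTracker:
    """Builds ACCEPT calls and remembers which offers were already used."""

    def __init__(self):
        self._answered = set()

    def build_accept(self, framework_id, offer_ids, operations, filters=None):
        # The same OfferID cannot be used in more than one ACCEPT call.
        reused = self._answered.intersection(offer_ids)
        if reused:
            raise ValueError("offers already answered: %s" % sorted(reused))
        self._answered.update(offer_ids)

        accept = {
            "offer_ids": [{"value": offer_id} for offer_id in offer_ids],
            "operations": list(operations),
        }
        if filters is not None:
            accept["filters"] = filters
        return {
            "framework_id": {"value": framework_id},
            "type": "ACCEPT",
            "accept": accept,
        }
```

Tracking answered offers on the scheduler side is a design choice for this sketch; the master remains the authority on whether an offer is still valid.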
+### DECLINE
+Sent by the scheduler to explicitly decline offer(s) received. Note that this is the same as sending an `ACCEPT` call with no operations.
+
+```
+DECLINE Request (JSON):
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type"         : "DECLINE",
+  "decline"      : {
+    "offer_ids" : [
+      {"value" : "12220-3440-12532-O12"},
+      {"value" : "12220-3440-12532-O13"}
+    ],
+    "filters"   : {...}
+  }
+}
+
+DECLINE Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### REVIVE
+Sent by the scheduler to remove any/all filters that it has previously set via 
`ACCEPT` or `DECLINE` calls.
+
+```
+REVIVE Request (JSON):
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type"         : "REVIVE"
+}
+
+REVIVE Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### KILL
+Sent by the scheduler to kill a specific task. If the scheduler has a custom 
executor, the kill is forwarded to the executor and it is up to the executor to 
kill the task and send a `TASK_KILLED` (or `TASK_FAILED`) update. Mesos 
releases the resources for a task once it receives a terminal update for the 
task. If the task is unknown to the master, a `TASK_LOST` will be generated.
+
+```
+KILL Request (JSON):
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type"         : "KILL",
+  "kill"         : {
+    "task_id"  : {"value" : "12220-3440-12532-my-task"},
+    "agent_id" : {"value" : "12220-3440-12532-S1233"}
+  }
+}
+
+KILL Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### SHUTDOWN
+Sent by the scheduler to shut down a specific custom executor (NOTE: This is a new call that was not present in the old API). When an executor gets a shutdown event, it is expected to kill all its tasks (and send `TASK_KILLED` updates) and terminate. If an executor doesn’t terminate within a certain timeout (configurable via the “--executor_shutdown_grace_period” agent flag), the agent will forcefully destroy the container (executor and its tasks) and transition its active tasks to `TASK_LOST`.
+
+```
+SHUTDOWN Request (JSON):
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type"         : "SHUTDOWN",
+  "shutdown"     : {
+    "executor_id" : {"value" : "123450-2340-1232-my-executor"},
+    "agent_id"    : {"value" : "12220-3440-12532-S1233"}
+  }
+}
+
+SHUTDOWN Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### ACKNOWLEDGE
+Sent by the scheduler to acknowledge a status update. Note that with the new 
API, schedulers are responsible for explicitly acknowledging the receipt of 
status updates that have “status.uuid()” set. These status updates are 
reliably retried until they are acknowledged by the scheduler. The scheduler 
must not acknowledge status updates that do not have “status.uuid()” set as 
they are not retried. "uuid" is raw bytes encoded in Base64.
+
+```
+ACKNOWLEDGE Request (JSON):
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type"         : "ACKNOWLEDGE",
+  "acknowledge"  : {
+    "agent_id" : {"value" : "12220-3440-12532-S1233"},
+    "task_id"  : {"value" : "12220-3440-12532-my-task"},
+    "uuid"     : "jhadf73jhakdlfha723adf"
+  }
+}
+
+ACKNOWLEDGE Response:
+HTTP/1.1 202 Accepted
+
+```
+
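A small Python sketch of the acknowledgement rule described above: acknowledge only status updates that carry a `uuid`. The helper names are hypothetical, and the sketch assumes the JSON `UPDATE` event's status includes an `agent_id` field, which the examples in this document do not show explicitly.

```python
import base64


def encode_bytes_field(raw):
    """Base64-encode raw bytes for use in a JSON-encoded call."""
    return base64.b64encode(raw).decode("ascii")


def build_acknowledge(framework_id, update_event):
    """Return an ACKNOWLEDGE call for a JSON UPDATE event, or None.

    Updates without a uuid are not retried and must not be acknowledged.
    """
    status = update_event["update"]["status"]
    uuid = status.get("uuid")
    if not uuid:
        return None
    return {
        "framework_id": {"value": framework_id},
        "type": "ACKNOWLEDGE",
        "acknowledge": {
            "agent_id": status["agent_id"],  # assumed present in the status
            "task_id": status["task_id"],
            # Already Base64 in a JSON event, so it is passed back unchanged.
            "uuid": uuid,
        },
    }
```

`encode_bytes_field` is only needed when the scheduler holds the raw bytes itself, for example after decoding a Protobuf event.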
+### RECONCILE
+Sent by the scheduler to query the status of non-terminal tasks. This causes 
the master to send back `UPDATE` events for each task in the list. Tasks that 
are no longer known to Mesos will result in `TASK_LOST` updates. If the list of 
tasks is empty, master will send `UPDATE` events for all currently known tasks 
of the framework.
+
+```
+RECONCILE Request (JSON):
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type"         : "RECONCILE",
+  "reconcile"    : {
+    "tasks" : [
+      {
+        "task_id"  : {"value" : "312325"},
+        "agent_id" : {"value" : "123535"}
+      }
+    ]
+  }
+}
+
+RECONCILE Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### MESSAGE
+Sent by the scheduler to send arbitrary binary data to the executor. Note that 
Mesos neither interprets this data nor makes any guarantees about the delivery 
of this message to the executor. "data" is raw bytes encoded in Base64.
+
+```
+MESSAGE Request (JSON):
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type"         : "MESSAGE",
+  "message"      : {
+    "agent_id"    : {"value" : "12220-3440-12532-S1233"},
+    "executor_id" : {"value" : "my-framework-executor"},
+    "data"        : "adaf838jahd748jnaldf"
+  }
+}
+
+MESSAGE Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### REQUEST
+Sent by the scheduler to request resources from the master/allocator. The 
built-in hierarchical allocator simply ignores this request but other 
allocators (modules) can interpret this in a customizable fashion.
+
+```
+REQUEST Request (JSON):
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type"         : "REQUEST",
+  "requests"     : [
+    {
+      "agent_id"  : {"value" : "12220-3440-12532-S1233"},
+      "resources" : {}
+    }
+  ]
+}
+
+REQUEST Response:
+HTTP/1.1 202 Accepted
+
+```
+
+## Events
+
+The scheduler is expected to keep a **persistent** connection open to the “/scheduler” endpoint even after getting a SUBSCRIBED HTTP response event. This is indicated by the “Connection: keep-alive” and “Transfer-Encoding: chunked” headers with *no* “Content-Length” header set. All subsequent events that Mesos generates for this framework are streamed on this connection. The master encodes each event in RecordIO format, i.e., the string representation of the length of the event in bytes, followed by the JSON or binary Protobuf (possibly compressed) encoded event. Note that the length will never be ‘0’ and that its value fits in an unsigned 64-bit integer. Also, note that the RecordIO encoding should be decoded by the scheduler, whereas the underlying HTTP chunked encoding is typically invisible at the application (scheduler) layer. The content encoding used for the events is determined by the Accept header of the POST request (e.g., Accept: application/json).
+
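The RecordIO framing described above can be decoded incrementally, roughly as in this Python sketch. `RecordIODecoder` is a hypothetical name. The sketch assumes the length string is terminated by a newline, as the `<event-length>` lines in the examples suggest, and it leaves decoding of each record (JSON or Protobuf) to the caller.

```python
class RecordIODecoder:
    """Incrementally splits a byte stream into length-prefixed records."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add bytes from the response stream; return all complete records."""
        self._buffer += chunk
        records = []
        while True:
            newline = self._buffer.find(b"\n")
            if newline < 0:
                break  # the length prefix has not fully arrived yet
            length = int(self._buffer[:newline].decode("ascii"))
            start = newline + 1
            end = start + length
            if len(self._buffer) < end:
                break  # the record body has not fully arrived yet
            records.append(self._buffer[start:end])
            self._buffer = self._buffer[end:]
        return records
```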
+The following events are currently sent by the master. The canonical source of this information is [scheduler.proto](https://github.com/apache/mesos/blob/master/include/mesos/v1/scheduler/scheduler.proto). Note that when sending JSON encoded events, the master encodes raw bytes in Base64 and strings in UTF-8.
+
+### SUBSCRIBED
+The first event sent by the master when the scheduler sends a `SUBSCRIBE` 
request on the persistent connection. See `SUBSCRIBE` in Calls section for the 
format.
+
+
+### OFFERS
+Sent by the master whenever there are new resources that can be offered to the framework. Each offer corresponds to a set of resources on an agent. Until the scheduler accepts or declines an offer (via `ACCEPT` or `DECLINE`), the resources are considered allocated to the scheduler, unless the offer is otherwise rescinded, e.g., due to a lost agent or `--offer_timeout`.
+
+```
+OFFERS Event (JSON)
+<event-length>
+{
+  "type"   : "OFFERS",
+  "offers" : [
+    {
+      "offer_id"     : {"value" : "12214-23523-O235235"},
+      "framework_id" : {"value" : "12124-235325-32425"},
+      "agent_id"     : {"value" : "12325-23523-S23523"},
+      "hostname"     : "agent.host",
+      "resources"    : [...],
+      "attributes"   : [...],
+      "executor_ids" : []
+    }
+  ]
+}
+```
+
+### RESCIND
+Sent by the master when a particular offer is no longer valid (e.g., the agent 
corresponding to the offer has been removed) and hence needs to be rescinded. 
Any future calls (`ACCEPT` / `DECLINE`) made by the scheduler regarding this 
offer will be invalid.
+
+```
+RESCIND Event (JSON)
+<event-length>
+{
+  "type"    : "RESCIND",
+  "rescind" : {
+    "offer_id" : {"value" : "12214-23523-O235235"}
+  }
+}
+```
+
+### UPDATE
+Sent by the master whenever there is a status update that is generated by the 
executor, agent or master. Status updates should be used by executors to 
reliably communicate the status of the tasks that they manage. It is crucial 
that a terminal update (e.g., `TASK_FINISHED`, `TASK_KILLED`, `TASK_FAILED`) is 
sent by the executor as soon as the task terminates, in order for Mesos to 
release the resources allocated to the task. It is also the responsibility of 
the scheduler to explicitly acknowledge the receipt of status updates that are 
reliably retried. See `ACKNOWLEDGE` in the Calls section above for the 
semantics. Note that `uuid` and `data` are raw bytes encoded in Base64.
+
+
+```
+UPDATE Event (JSON)
+
+<event-length>
+{
+  "type"   : "UPDATE",
+  "update" : {
+    "status" : {
+      "task_id" : {"value" : "12344-my-task"},
+      "state"   : "TASK_RUNNING",
+      "source"  : "SOURCE_EXECUTOR",
+      "uuid"    : "adfadfadbhgvjayd23r2uahj",
+      "data"    : "uhdjfhuagdj63d7hadkf"
+    }
+  }
+}
+```
+
+### MESSAGE
+A custom message generated by the executor that is forwarded to the scheduler 
by the master. Note that this message is not interpreted by Mesos and is only 
forwarded (without reliability guarantees) to the scheduler. It is up to the 
executor to retry if the message is dropped  for any reason. Note that `data` 
is raw bytes encoded as Base64.
+
+```
+MESSAGE Event (JSON)
+
+<event-length>
+{
+  "type"    : "MESSAGE",
+  "message" : {
+    "agent_id"    : {"value" : "12214-23523-S235235"},
+    "executor_id" : {"value" : "12214-23523-my-executor"},
+    "data"        : "adfadf3t2wa3353dfadf"
+  }
+}
+```
+
+
+### FAILURE
+Sent by the master when an agent is removed from the cluster (e.g., failed health checks) or when an executor is terminated. Note that this event coincides with the receipt of terminal `UPDATE` events for any active tasks belonging to the agent or executor and the receipt of `RESCIND` events for any outstanding offers belonging to the agent. Note that there is no guaranteed order between the `FAILURE`, `UPDATE` and `RESCIND` events.
+
+```
+FAILURE Event (JSON)
+
+<event-length>
+{
+  "type"    : "FAILURE",
+  "failure" : {
+    "agent_id"    : {"value" : "12214-23523-S235235"},
+    "executor_id" : {"value" : "12214-23523-my-executor"},
+    "status"      : 1
+  }
+}
+```
+
+### ERROR
+Sent by the master when an asynchronous error event is generated (e.g., a 
framework is not authorized to subscribe with the given role). It is 
recommended that the framework abort when it receives an error and retry 
subscription as necessary.
+
+```
+ERROR Event (JSON)
+
+<event-length>
+{
+  "type"    : "ERROR",
+  "message" : "Framework is not authorized"
+}
+```
+
+### HEARTBEAT
+This event is periodically sent by the master to inform the scheduler that a 
connection is alive. This also helps ensure that network intermediates do not 
close the persistent subscription connection due to lack of data flow. See the 
next section on how a scheduler can use this event to deal with network 
partitions.
+
+```
+HEARTBEAT Event (JSON)
+
+<event-length>
+{
+  "type" : "HEARTBEAT"
+}
+```
+
+## Disconnections
+
+The master considers a scheduler disconnected if the persistent subscription connection (opened via a `SUBSCRIBE` request) to “/scheduler” breaks. The connection could break for several reasons, e.g., scheduler restart, scheduler failover, or network error. Note that the master doesn’t keep track of non-subscription connections to “/scheduler” because they are not expected to be persistent.
+
+If the master realizes that the subscription connection is broken, it marks the scheduler as “disconnected” and starts a failover timeout (the failover timeout is part of FrameworkInfo). It also drops any pending events in its queue. Additionally, it rejects subsequent non-subscribe HTTP requests to “/scheduler” with “403 Forbidden” until the scheduler subscribes again with “/scheduler”. If the scheduler *does not* re-subscribe within the failover timeout, the master considers the scheduler gone forever and shuts down all its executors, thus killing all its tasks. Therefore, it is recommended that all production schedulers use a high value (e.g., 4 weeks) for the failover timeout.
+
+NOTE: To force the shutdown of a framework before the failover timeout elapses (e.g., during framework development and testing), either the framework can send a `TEARDOWN` call (part of the Scheduler API) or an operator can use the “/master/teardown” endpoint (part of the Operator API).
+
+If the scheduler realizes that its subscription connection to “/scheduler” is broken, it should attempt to open a new persistent connection to “/scheduler” (possibly on a new master, based on the result of master detection) and resubscribe. It should not send new non-subscribe HTTP requests to “/scheduler” until it gets a `SUBSCRIBED` event; such requests will result in “403 Forbidden”.
+
+If the master does not realize that the subscription connection is broken, but the scheduler realizes it, the scheduler might open a new persistent connection to “/scheduler” via `SUBSCRIBE`. In this case, the semantics depend on the value of `subscribe.force`. If set to true, the master closes the existing subscription connection and allows subscription on the new connection. If set to false, the new connection attempt is disallowed in favor of the existing connection. The invariant here is that only one persistent subscription connection for a given FrameworkID is allowed on the master. For HA schedulers, it is recommended that a scheduler instance set `subscribe.force` to true only when it has just been elected and set it to false for all subsequent reconnection attempts (e.g., due to disconnection or master failover).
+
+### Network partitions
+
+In the case of a network partition, the subscription connection between the scheduler and master might not necessarily break. To be able to detect this scenario, the master periodically (e.g., every 15s) sends `HEARTBEAT` events (similar in vein to Twitter’s Streaming API). If a scheduler misses a number (e.g., 5) of these heartbeats within a time window, it should immediately disconnect and try to re-subscribe. It is highly recommended that schedulers use an exponential backoff strategy (e.g., up to a maximum of 15s) to avoid overwhelming the master while reconnecting. Schedulers can use a similar timeout (e.g., 75s) for receiving responses to any HTTP requests.
+
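One way to sketch the heartbeat check and reconnect backoff described above in Python. `HeartbeatWatchdog` and `backoff_delays` are hypothetical names, and the defaults (15s interval, 5 missed heartbeats, 15s backoff cap) simply mirror the example values in the text. Timestamps are passed in explicitly so the logic does not depend on a particular clock.

```python
def backoff_delays(base=1.0, cap=15.0):
    """Yield reconnect delays in seconds that double until they reach cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


class HeartbeatWatchdog:
    """Flags a likely partition when heartbeats stop arriving."""

    def __init__(self, interval_seconds=15, missed_allowed=5):
        # With the example values this gives the 75s window from the text.
        self.timeout = interval_seconds * missed_allowed
        self.last_seen = None

    def record_heartbeat(self, now):
        self.last_seen = now

    def is_partitioned(self, now):
        """True once no heartbeat has been seen for longer than the window."""
        if self.last_seen is None:
            return False
        return now - self.last_seen > self.timeout
```

A scheduler loop could call `record_heartbeat(time.monotonic())` for each `HEARTBEAT` event and, once `is_partitioned` returns true, close the connection and sleep for successive values from `backoff_delays()` between re-subscription attempts.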
+## Master detection
+
+Mesos has a high-availability mode that uses multiple Mesos masters: one active master (called the leader or leading master) and several standbys in case it fails. The masters elect the leader, with ZooKeeper coordinating the election. For more details please refer to the [documentation](/documentation/latest/high-availability/).
+
+Schedulers are expected to make HTTP requests to the leading master. If requests are made to a non-leading master, an HTTP “307 Temporary Redirect” response will be received with the “Location” header pointing to the leading master.
+
+Example subscription workflow with redirection when the scheduler hits a non-leading master:
+
+```
+Scheduler → Master
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost1:5050
+Content-Type: application/json
+Accept: application/json
+Connection: keep-alive
+
+{
+  "type"      : "SUBSCRIBE",
+  "subscribe" : {
+    "framework_info" : {
+      "user" : "foo",
+      "name" : "Example HTTP Framework"
+    }
+  }
+}
+
+Master → Scheduler
+HTTP/1.1 307 Temporary Redirect
+Location: masterhost2:5050
+
+
+Scheduler → Master
+POST /api/v1/scheduler HTTP/1.1
+Host: masterhost2:5050
+Content-Type: application/json
+Accept: application/json
+Connection: keep-alive
+
+{
+  "type"      : "SUBSCRIBE",
+  "subscribe" : {
+    "framework_info" : {
+      "user" : "foo",
+      "name" : "Example HTTP Framework"
+    }
+  }
+}
+```
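A Python sketch of how a scheduler might turn the redirect above into the next URL to try. The function names are hypothetical. The workflow above only shows a bare `host:port` value in the “Location” header; accepting `//host:port/...` and full URLs as well is an assumption made here for robustness.

```python
from urllib.parse import urlsplit


def leader_from_location(location):
    """Extract host:port of the leading master from a Location header value."""
    if "//" in location:
        # Scheme-relative or absolute URL: the authority part is the leader.
        return urlsplit(location).netloc
    # Bare "host:port", optionally followed by a path.
    return location.split("/", 1)[0]


def next_subscribe_url(status, headers):
    """Return the URL to retry against, or None if there was no redirect."""
    if status != 307:
        return None
    leader = leader_from_location(headers["Location"])
    return "http://%s/api/v1/scheduler" % leader
```

Building the retry URL by hand, instead of relying on an HTTP client's automatic redirect handling, keeps the SUBSCRIBE body and headers under the scheduler's control when the request is replayed against the leader.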
+
+If the scheduler knows the list of master hostnames for a cluster, it could use this mechanism to find the leading master to subscribe with. Alternatively, the scheduler could use a pure language library that detects the leading master given a ZooKeeper (or etcd) URL. For a `C++` library that does ZooKeeper-based master detection, please look at `src/scheduler/scheduler.cpp`.

Added: mesos/site/source/documentation/latest/ssl.md
URL: 
http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/ssl.md?rev=1707725&view=auto
==============================================================================
--- mesos/site/source/documentation/latest/ssl.md (added)
+++ mesos/site/source/documentation/latest/ssl.md Fri Oct  9 13:33:18 2015
@@ -0,0 +1,92 @@
+---
+layout: documentation
+---
+
+# Configuration
+There is currently only one implementation of the [libprocess socket 
interface](https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/socket.hpp)
 that supports SSL. This implementation uses 
[libevent](https://github.com/libevent/libevent). Specifically it relies on the 
`libevent-openssl` library that wraps `openssl`.
+
+After building Mesos 0.23.0 from source, assuming you have installed the 
required [Dependencies](#Dependencies), you can modify your configure line to 
enable SSL as follows:
+
+~~~
+../configure --enable-libevent --enable-ssl
+~~~
+
+# Running
+Once you have successfully built and installed your new binaries, here are the 
environment variables that are applicable to the `Master`, `Slave`, `Framework 
Scheduler/Executor`, or any `libprocess process`:
+
+#### SSL_ENABLED=(false|0,true|1) [default=false|0]
+Turn on or off SSL. When it is turned off it is the equivalent of default 
mesos with libevent as the backing for events. All sockets default to the 
non-SSL implementation. When it is turned on, the default configuration for 
sockets is SSL. This means outgoing connections will use SSL, and incoming 
connections will be expected to speak SSL as well. None of the below flags are 
relevant if SSL is not enabled.
+
+#### SSL_SUPPORT_DOWNGRADE=(false|0,true|1) [default=false|0]
+Control whether or not non-SSL connections can be established. If this is enabled __on the accepting side__, then the accepting side will downgrade to a non-SSL socket if the connecting side is attempting to communicate via non-SSL (e.g. HTTP). See [Upgrading Your Cluster](#Upgrading) for more details.
+
+#### SSL_CERT_FILE=(path to certificate)
+The location of the certificate that will be presented.
+
+#### SSL_KEY_FILE=(path to key)
+The location of the private key used by OpenSSL.
+
+#### SSL_VERIFY_CERT=(false|0,true|1) [default=false|0]
+Control whether certificates are verified when presented. If this is false, 
even when a certificate is presented, it will not be verified. When 
`SSL_REQUIRE_CERT` is true, `SSL_VERIFY_CERT` is overridden and all 
certificates will be verified _and_ required.
+
+#### SSL_REQUIRE_CERT=(false|0,true|1) [default=false|0]
+Enforce that certificates must be presented by connecting clients. This means 
all connections (including tools hitting endpoints) must present valid 
certificates in order to establish a connection.
+
+#### SSL_VERIFY_DEPTH=(N) [default=4]
+The maximum depth used to verify certificates. The default is 4. See the 
OpenSSL documentation or contact your system administrator to learn why you may 
want to change this.
+
+#### SSL_CA_DIR=(path to CA directory)
+The directory used to find the certificate authority / authorities. You can 
specify `SSL_CA_DIR` or `SSL_CA_FILE` depending on how you want to restrict 
your certificate authorization.
+
+#### SSL_CA_FILE=(path to CA file)
+The file used to find the certificate authority. You can specify `SSL_CA_DIR` 
or `SSL_CA_FILE` depending on how you want to restrict your certificate 
authorization.
+
+#### SSL_CIPHERS=(accepted ciphers separated by ':') 
[default=AES128-SHA:AES256-SHA:RC4-SHA:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA:DHE-RSA-AES256-SHA:DHE-DSS-AES256-SHA]
+A list of `:`-separated ciphers. Use these if you want to restrict or open up 
the accepted ciphers for OpenSSL. Read the OpenSSL documentation or contact 
your system administrators to see whether you want to override the default 
values.
+
+#### SSL_ENABLE_SSL_V3=(false|0,true|1) [default=false|0]
+#### SSL_ENABLE_TLS_V1_0=(false|0,true|1) [default=false|0]
+#### SSL_ENABLE_TLS_V1_1=(false|0,true|1) [default=false|0]
+#### SSL_ENABLE_TLS_V1_2=(false|0,true|1) [default=true|1]
+The above switches enable / disable the specified protocols. By default only TLS V1.2 is enabled. SSL V2 is always disabled; there is no switch to enable it. The approach is to be restrictive by default and to require users to open things up explicitly. Many older versions of the protocols have known vulnerabilities, so only enable these if you fully understand the risks.
+_SSLv2 is disabled completely because modern versions of OpenSSL disable it using multiple compile time configuration options._
+
+# <a name="Dependencies"></a>Dependencies
+
+### libevent
+We require the OpenSSL support from libevent. The suggested version of 
libevent is 
[`2.0.22-stable`](https://github.com/libevent/libevent/releases/tag/release-2.0.22-stable).
 As new releases come out we will try to maintain compatibility.
+
+~~~
+# For example, on OSX:
+brew install libevent
+~~~
+
+### OpenSSL
+We require [OpenSSL](https://github.com/openssl/openssl). There are multiple 
branches of OpenSSL that are being maintained by the community. Since security 
requires being vigilant, we recommend reading the release notes for the current 
releases of OpenSSL and deciding on a version within your organization based on 
your security needs. Mesos is not too deeply dependent on specific OpenSSL 
versions, so there is room for you to make security decisions as an 
organization.
+Please ensure the `event2` and `openssl` headers are available for building 
mesos.
+
+~~~
+# For example, on OSX:
+brew install openssl
+~~~
+
+# <a name="Upgrading"></a>Upgrading Your Cluster
+_There is no SSL specific requirement for upgrading different components in a 
specific order._
+
+The recommended strategy is to restart all your components to enable SSL with downgrade support enabled. Once all components have SSL enabled, do a second restart of all your components to disable downgrades. This strategy will allow each component to be restarted independently at your own convenience with no time restrictions. It will also allow you to try SSL in a subset of your cluster. __NOTE:__ While different components in your cluster are serving SSL vs non-SSL traffic, any relative links in the WebUI may be broken. Please see the [WebUI](#WebUI) section for details. Here are sample commands for upgrading your cluster:
+
+~~~
+# Restart each component with downgrade support (master, slave, framework):
+SSL_ENABLED=true SSL_SUPPORT_DOWNGRADE=true SSL_KEY_FILE=<path-to-your-private-key> SSL_CERT_FILE=<path-to-your-certificate> <Any other SSL_* environment variables you may choose> <your-component (e.g. bin/master.sh)> <your-flags>
+
+# Restart each component WITHOUT downgrade support (master, slave, framework):
+SSL_ENABLED=true SSL_SUPPORT_DOWNGRADE=false SSL_KEY_FILE=<path-to-your-private-key> SSL_CERT_FILE=<path-to-your-certificate> <Any other SSL_* environment variables you may choose> <your-component (e.g. bin/master.sh)> <your-flags>
+~~~
+The end state is a cluster that is only communicating with SSL.
+
+__NOTE:__ Any tools you may use that communicate with your components must be 
able to speak SSL, or they will be denied. You may choose to maintain 
`SSL_SUPPORT_DOWNGRADE=true` for some time as you upgrade your internal 
tooling. The advantage of `SSL_SUPPORT_DOWNGRADE=true` is that all components 
that speak SSL will do so, while other components may still communicate over 
insecure channels.
+
+# <a name="WebUI"></a>WebUI
+The default Mesos WebUI uses relative links. Some of these links transition 
between endpoints served by the master and slaves. The WebUI currently does not 
have enough information to change the 'http' vs 'https' links based on whether 
the target endpoint is currently being served by an SSL-enabled binary. This 
may cause certain links in the WebUI to be broken when a cluster is in a 
transition state between SSL and non-SSL. Any tools that hit these endpoints 
will still be able to access them as long as they hit the endpoint using the 
right protocol, or the `SSL_SUPPORT_DOWNGRADE` option is set to true.
+
+### Certificates
+Most browsers have built-in protections that guard transitions between pages 
served using different certificates. For this reason you may choose to serve 
both the master and slave endpoints using a common certificate that covers 
multiple hostnames. If you do not do this, certain links, such as those to 
slave sandboxes, may seem broken as the browser treats the transition between 
differing certificates as unsafe.

Modified: mesos/site/source/documentation/latest/submitting-a-patch.md
URL: 
http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/submitting-a-patch.md?rev=1707725&r1=1707724&r2=1707725&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/submitting-a-patch.md (original)
+++ mesos/site/source/documentation/latest/submitting-a-patch.md Fri Oct  9 
13:33:18 2015
@@ -30,10 +30,10 @@ layout: documentation
 
 ### Create your patch
 1. Create one or more test cases to exercise the bug or the feature (the Mesos 
team uses [test-driven 
development](http://en.wikipedia.org/wiki/Test-driven_development)). Before you 
start coding, make sure these test cases all fail.
-    1. The [testing patterns](/documentation/latest/mesos-testing-patterns/) 
page has some suggestions for writing test cases.
+    1. The [testing patterns](/documentation/latest/testing-patterns/) page 
has some suggestions for writing test cases.
 
 2. Make your changes to the code (using whatever IDE/editor you choose) to 
actually fix the bug or implement the feature.
-    1. Before beginning, please read the [Mesos C++ Style 
Guide](/documentation/latest/mesos-c++-style-guide/). It is recommended to use 
the git pre-commit hook (`support/hooks/pre-commit`) to automatically check for 
style errors. See the hook script for instructions to enable it.
+    1. Before beginning, please read the [Mesos C++ Style 
Guide](/documentation/latest/c++-style-guide/). It is recommended to use the 
git pre-commit hook (`support/hooks/pre-commit`) to automatically check for 
style errors. See the hook script for instructions to enable it.
     2. Most of your changes will probably be to files inside of 
`BASE_MESOS_DIR`
     3. From inside of the root Mesos directory: `./bootstrap` (Only required 
if building from git repository).
     4. To build, we recommend that you don't build inside of the src 
directory. We recommend you do the following:
@@ -77,4 +77,4 @@ layout: documentation
 4. The last step is to ensure that the necessary documentation gets created or 
updated so the whole world knows about your new feature or bug fix.
 
 ## Style Guides
-* For patches to the core, we ask that you follow the [Mesos C++ Style 
Guide](/documentation/latest/mesos-c++-style-guide/).
+* For patches to the core, we ask that you follow the [Mesos C++ Style 
Guide](/documentation/latest/c++-style-guide/).

Added: mesos/site/source/documentation/latest/testing-patterns.md
URL: 
http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/testing-patterns.md?rev=1707725&view=auto
==============================================================================
--- mesos/site/source/documentation/latest/testing-patterns.md (added)
+++ mesos/site/source/documentation/latest/testing-patterns.md Fri Oct  9 
13:33:18 2015
@@ -0,0 +1,81 @@
+---
+layout: documentation
+---
+
+# Mesos Testing Patterns
+
+A collection of common testing patterns used in Mesos tests. If you have found 
a good way to test a certain condition that you think may be useful for other 
cases, please document it here together with motivation and background.
+
+## Using `Clock` magic to ensure an event is processed
+Scheduling a sequence of events in an asynchronous environment is not easy: a 
function call usually initiates an action and returns immediately, while the 
action runs in the background. A simple, obvious, and bad solution is to use 
`os::sleep()` to wait for action completion. The time the action needs to 
finish may vary on different machines, while increasing sleep duration 
increases the test execution time and slows down `make check`. One of the right 
ways to do it is to wait for an action to finish and proceed right after. This 
is possible using libprocess' `Clock` routines.
+
+
+Every message enqueued in a libprocess process' (or actor's, to avoid 
ambiguity with OS processes) mailbox is processed by `ProcessManager` (right 
now there is a single instance of `ProcessManager` per OS process, but this may 
change in the future). `ProcessManager` fetches actors from the runnable actors 
list and services all events from the actor's mailbox. Using the `Clock::settle()` 
call, we can block the calling thread until `ProcessManager` empties the mailboxes 
of all actors. Here is an example of this pattern:
+
+~~~{.cpp}
+// As Master::killTask isn't doing anything, we shouldn't get a status update.
+EXPECT_CALL(sched, statusUpdate(&driver, _))
+  .Times(0);
+
+// Set expectation that Master receives killTask message.
+Future<KillTaskMessage> killTaskMessage =
+  FUTURE_PROTOBUF(KillTaskMessage(), _, master.get());
+
+// Attempt to kill unknown task while slave is transitioning.
+TaskID unknownTaskId;
+unknownTaskId.set_value("2");
+
+// Stop the clock.
+Clock::pause();
+
+// Initiate an action.
+driver.killTask(unknownTaskId);
+
+// Make sure the event associated with the action has been queued.
+AWAIT_READY(killTaskMessage);
+
+// Wait for all messages to be dispatched and processed completely to satisfy
+// the expectation that we didn't receive a status update.
+Clock::settle();
+
+Clock::resume();
+~~~
+
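To make the contrast with `os::sleep()` concrete, here is a minimal, self-contained sketch of the "settle" idea using only the C++ standard library. It is not the libprocess implementation: the `Mailbox` class and its `post()` and `settle()` methods are hypothetical names invented for this illustration. The point is only that the caller blocks until the queue is drained and the worker is idle, rather than guessing a sleep duration.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for an actor mailbox; NOT the libprocess API.
class Mailbox {
public:
  Mailbox() : worker_([this] { run(); }) {}

  ~Mailbox() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    wake_.notify_all();
    worker_.join();
  }

  // Enqueue an event and return immediately, like an asynchronous call.
  void post(std::function<void()> event) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      events_.push_back(std::move(event));
    }
    wake_.notify_all();
  }

  // Block until every queued event has been processed and the worker is
  // idle (the role Clock::settle() plays in the test above).
  void settle() {
    std::unique_lock<std::mutex> lock(mutex_);
    idle_.wait(lock, [this] { return events_.empty() && !busy_; });
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      wake_.wait(lock, [this] { return stopping_ || !events_.empty(); });
      if (events_.empty()) return;  // Only reached when stopping.
      std::function<void()> event = std::move(events_.front());
      events_.pop_front();
      busy_ = true;
      lock.unlock();
      event();  // Run the event without holding the lock.
      lock.lock();
      busy_ = false;
      idle_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable wake_;
  std::condition_variable idle_;
  std::deque<std::function<void()>> events_;
  bool busy_ = false;
  bool stopping_ = false;
  std::thread worker_;  // Declared last so the other members exist first.
};
```

A test written against this sketch would call `post()` to initiate the action and then `settle()` before checking its expectations, so it proceeds as soon as the work is done regardless of how fast the machine is.
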
+## Intercepting a message sent to a different OS process
+Intercepting messages sent between libprocess processes (let's call them 
actors to avoid ambiguity with OS processes) that live in the same OS process 
is easy, e.g.:
+
+~~~{.cpp}
+Future<SlaveReregisteredMessage> slaveReregisteredMessage =
+  FUTURE_PROTOBUF(SlaveReregisteredMessage(), _, _);
+...
+AWAIT_READY(slaveReregisteredMessage);
+~~~
+
+However, this won't work if we want to intercept a message sent to an actor 
(technically a `UPID`) that lives in another OS process. For example, 
`CommandExecutor` spawned by a slave will live in a separate OS process, though 
master and slave instances live in the same OS process together with our test 
(see `mesos/src/tests/cluster.hpp`). The wait in this code will fail:
+
+~~~{.cpp}
+Future<ExecutorRegisteredMessage> executorRegisteredMessage =
+  FUTURE_PROTOBUF(ExecutorRegisteredMessage(), _, _);
+...
+AWAIT_READY(executorRegisteredMessage);
+~~~
+
+### Why are messages sent outside the OS process not intercepted?
+Libprocess events may be filtered (see 
`libprocess/include/process/filter.hpp`). `FUTURE_PROTOBUF` uses this ability 
and sets an expectation on the `filter` method of the `TestsFilter` class with a 
`MessageMatcher` that matches the message we want to intercept. The actual 
filtering happens in `ProcessManager::resume()`, which fetches messages from 
the queue of the received events.
+
+*No* filtering happens when sending, encoding, or transporting the message 
(see e.g. `ProcessManager::deliver()` or `SocketManager::send()`). Therefore, in 
the aforementioned example, `ExecutorRegisteredMessage` leaves the slave 
undetected by the filter, reaches another OS process where the executor lives, 
gets enqueued into the `CommandExecutorProcess`' mailbox and can be filtered 
there, but remember our expectation is set in another OS process!
+
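The receive-side-only filtering described above can be modelled with a toy sketch. Everything in it is an assumption made for illustration: `OsProcess`, `send()`, `resume()`, and `observedByTest()` are hypothetical names, and messages are plain strings rather than protobufs. It only shows that a filter installed in one OS process sees messages dequeued there, and never the messages it sends elsewhere.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of one OS process; all names here are hypothetical.
struct OsProcess {
  // Test filter installed in this OS process only.
  std::function<void(const std::string&)> filter;
  std::deque<std::string> mailbox;

  // Sending does not consult the sender's filter.
  void send(OsProcess& receiver, std::string message) {
    receiver.mailbox.push_back(std::move(message));
  }

  // Filtering happens here, when the receiving side dequeues events.
  void resume() {
    while (!mailbox.empty()) {
      if (filter) filter(mailbox.front());
      mailbox.pop_front();
    }
  }
};

// Returns the messages observed by a filter installed in the test process.
std::vector<std::string> observedByTest() {
  OsProcess test;      // Hosts the test, master, and slave actors.
  OsProcess executor;  // Separate OS process hosting the executor actor.

  std::vector<std::string> seen;
  test.filter = [&seen](const std::string& m) { seen.push_back(m); };

  executor.send(test, "RegisterExecutorMessage");    // Incoming: observed.
  test.send(executor, "ExecutorRegisteredMessage");  // Outgoing: missed.

  test.resume();
  executor.resume();
  return seen;
}
```

In this model only the incoming `RegisterExecutorMessage` reaches the test's filter, which is why setting the expectation on the incoming message is the more reliable choice.
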
+### How to work around it
+Consider setting expectations on the corresponding incoming messages, ensuring 
they are processed and therefore that the ACK message is sent.
+
+For the aforementioned example, instead of intercepting 
`ExecutorRegisteredMessage`, we can intercept `RegisterExecutorMessage` and 
wait until it is processed, which includes sending `ExecutorRegisteredMessage` 
(see `Slave::registerExecutor()`):
+
+~~~{.cpp}
+Future<RegisterExecutorMessage> registerExecutorMessage =
+  FUTURE_PROTOBUF(RegisterExecutorMessage(), _, _);
+...
+AWAIT_READY(registerExecutorMessage);
+Clock::pause();
+Clock::settle();
+Clock::resume();
+~~~

Modified: mesos/site/source/documentation/latest/tools.md
URL: 
http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/tools.md?rev=1707725&r1=1707724&r2=1707725&view=diff
==============================================================================
--- mesos/site/source/documentation/latest/tools.md (original)
+++ mesos/site/source/documentation/latest/tools.md Fri Oct  9 13:33:18 2015
@@ -20,10 +20,10 @@ These tools make it easy to set up and r
 
 If you want to hack on Mesos or write a new framework, these tools will help.
 
-* [clang-format](/documentation/latest/clang-format/) to automatically apply 
some of the style rules dictated by the [Mesos C++ Style 
Guide](/documentation/latest/mesos-c++-style-guide/).
+* [clang-format](/documentation/latest/clang-format/) to automatically apply 
some of the style rules dictated by the [Mesos C++ Style 
Guide](/documentation/latest/c++-style-guide/).
 * [Go Bindings and Examples](https://github.com/mesosphere/mesos-go) Write a 
Mesos framework in Go! Comes with an example scheduler and executor.
 * [Mesos Framework giter8 
Template](https://github.com/mesosphere/scala-sbt-mesos-framework.g8) This is a 
giter8 template. The result of applying this template is a bare-bones Apache 
Mesos framework in Scala using SBT for builds and Vagrant for testing on a 
singleton cluster.
 * [Scala Hello World](https://gist.github.com/guenter/7471695) A simple Mesos 
"Hello World": downloads and starts a web server on every node in the cluster.
 * [Xcode Workspace](https://github.com/tillt/xcode-mesos) Hack on Mesos in 
Xcode.
 
-Can't find yours in the list? Please submit a patch, or email 
[email protected] and we'll add you!
\ No newline at end of file
+Can't find yours in the list? Please submit a patch, or email 
[email protected] and we'll add you!


