Modified: mesos/site/publish/documentation/network-monitoring/index.html
URL: 
http://svn.apache.org/viewvc/mesos/site/publish/documentation/network-monitoring/index.html?rev=1690830&r1=1690829&r2=1690830&view=diff
==============================================================================
--- mesos/site/publish/documentation/network-monitoring/index.html (original)
+++ mesos/site/publish/documentation/network-monitoring/index.html Mon Jul 13 22:09:25 2015
@@ -81,19 +81,28 @@
                <p>See our <a href="/community/">community</a> page for more 
details.</p>
        </div>
        <div class="col-md-8">
-               <h1>Network Monitoring</h1>
+               <h1>Per-container Network Monitoring and Isolation</h1>
 
-<p>Mesos 0.20.0 adds the support for per container network monitoring. Network 
statistics for each active container can be retrieved through the 
<code>/monitor/statistics.json</code> endpoint on the slave.</p>
-
-<p>The current solution is completely transparent to the tasks running on the 
slave. In other words, tasks will not notice any difference as if they were 
running on a slave without network monitoring turned on and were sharing the 
network of the slave.</p>
-
-<h2>How to setup?</h2>
-
-<p>To turn on network monitoring on your mesos cluster, you need to follow the 
following procedures.</p>
+<p>Mesos on Linux provides support for per-container network monitoring and
+isolation. Network isolation prevents a single container from exhausting the
+available network ports, consuming an unfair share of the network bandwidth, or
+significantly delaying packet transmission for other containers. Network
+statistics for each active container are published through the
+<code>/monitor/statistics.json</code> endpoint on the slave. Network isolation
+is transparent to the majority of tasks running on a slave (those that bind to
+port 0 and let the kernel allocate their port).</p>
+
+<h2>Installation</h2>
+
+<p>Per-container network monitoring and isolation is <strong>not</strong> built by default.
+To enable it, you need to install additional dependencies and enable it when
+configuring the build.</p>
 
 <h3>Prerequisites</h3>
 
-<p>Currently, network monitoring is only supported on Linux. Make sure your 
kernel is at least 3.6. Also, check your kernel to make sure that the following 
upstream patches are merged in (Mesos will automatically check for those kernel 
functionalities and will abort if they are not supported):</p>
+<p>Per-container network monitoring and isolation is only supported on Linux
+kernel versions 3.6 and above. Additionally, the kernel must include the
+following patches (merged in kernel version 3.15):</p>
 
 <ul>
 <li><a href="https://github.com/torvalds/linux/commit/6a662719c9868b3d6c7d26b3a085f0cd3cc15e64">6a662719c9868b3d6c7d26b3a085f0cd3cc15e64</a></li>
@@ -103,124 +112,354 @@
 </ul>
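+<p>As a rough pre-flight check, the kernel release string can be classified against the requirements above. The sketch below uses hypothetical helpers that are not part of Mesos. It looks only at the version number, so a distribution kernel older than 3.15 may still carry backported versions of these patches and has to be checked by hand.</p>

```python
import platform
import re


def parse_kernel_version(release):
    """Return (major, minor) from a release string such as '3.10.0-229.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def check_kernel(release):
    """Hypothetical helper: classify a release against the documented requirements."""
    version = parse_kernel_version(release)
    if version < (3, 6):
        return "unsupported"    # older than the 3.6 minimum
    if version < (3, 15):
        return "check-patches"  # the patches may or may not have been backported
    return "ok"                 # 3.15 and later include the patches upstream


def check_running_kernel():
    """Classify the kernel this script is running on (Linux only)."""
    return check_kernel(platform.release())
```

+<p>On a prospective slave, <code>check_running_kernel()</code> returns <code>"ok"</code>, <code>"check-patches"</code> or <code>"unsupported"</code>.</p>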
 
 
-<p>Make sure the following packages are installed on the slave:</p>
+<p>The following packages are required on the slave:</p>
 
 <ul>
 <li><a href="http://www.infradead.org/~tgr/libnl/">libnl3</a> >= 3.2.26</li>
-<li><a 
href="http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2";>iproute</a>
 (>= 2.6.39 is advised but not required for debugging purpose)</li>
+<li><a href="http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2">iproute</a> >= 2.6.39 is advised for debugging purposes but not required.</li>
 </ul>
 
 
-<p>On the build machine, you need to install the following packages:</p>
+<p>Additionally, if you are building from source, you will also need the
+libnl3 development package to compile Mesos:</p>
 
 <ul>
-<li><a href="http://www.infradead.org/~tgr/libnl/";>libnl3-devel</a> >= 
3.2.26</li>
+<li><a href="http://www.infradead.org/~tgr/libnl/">libnl3-devel / libnl3-dev</a> >= 3.2.26</li>
 </ul>
 
 
-<h3>Configure and build</h3>
+<h3>Build</h3>
 
-<p>Network monitoring will NOT be built in by default. To build Mesos with 
network monitoring support, you need to add a configure option:</p>
+<p>To build Mesos with per-container network monitoring and isolation support,
+you need to add the <code>--with-network-isolator</code> configure option:</p>
 
 <pre><code>$ ./configure --with-network-isolator
 $ make
 </code></pre>
 
-<h3>Host ephemeral ports squeeze</h3>
+<h2>Configuration</h2>
+
+<p>Per-container network monitoring and isolation is enabled on the slave by
+adding <code>network/port_mapping</code> to the slave's <code>--isolation</code>
+command-line flag.</p>
+
+<pre><code>--isolation="network/port_mapping"
+</code></pre>
+
+<p>If the slave has not been compiled with per-container network monitoring and
+isolation support, it will refuse to start and print an error:</p>
+
+<pre><code>I0708 00:17:08.080271 44267 containerizer.cpp:111] Using isolation: network/port_mapping
+Failed to create a containerizer: Could not create MesosContainerizer: Unknown or unsupported
+    isolator: network/port_mapping
+</code></pre>
+
+<h2>Configuring network ports</h2>
+
+<p>Without network isolation, all the containers on a host share the public IP
+address of the slave and can bind to any port allowed by the OS.</p>
 
-<p>With network monitoring being turned on, each container on the slave will 
have a separate network stack (via Linux <a 
href="http://lwn.net/Articles/580893/";>network namespaces</a>). All containers 
share the same public IP of the slave (so that service discovery mechanism does 
not need to be changed). Each container will be assigned a subset of the ports 
from the host, and is only allowed to use those ports to make connections with 
other hosts.</p>
+<p>When network isolation is enabled, each container on the slave has a separate
+network stack (via Linux <a href="http://lwn.net/Articles/580893/">network namespaces</a>).
+All containers still share the same public IP of the slave (so that the service
+discovery mechanism does not need to be changed). The slave assigns each
+container a non-overlapping range of ports, and only packets to/from these
+assigned port ranges will be delivered. Applications that ask the kernel to
+assign a port (by binding to port 0) will be given ports from the container's
+assigned range. Applications can bind to ports outside the container's assigned
+ranges, but packets to/from these ports will be silently dropped by the
+host.</p>
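+<p>The delivery rule described above can be modelled in a few lines. This is a simplified, hypothetical illustration of the behavior, not the slave's packet-filtering code, and the container names and port ranges are made up. Each container owns non-overlapping inclusive port ranges, and a packet whose port falls outside every range is dropped.</p>

```python
def route_packet(port, assignments):
    """Hypothetical model: return the container owning `port`, or None if the
    host would silently drop the packet.

    `assignments` maps a container name to a list of inclusive (first, last)
    port ranges. Ranges of different containers are assumed not to overlap."""
    for container, ranges in assignments.items():
        if any(first <= port <= last for first, last in ranges):
            return container
    return None


# Made-up assignment: one ephemeral slice plus one non-ephemeral port each.
assignments = {
    "container-a": [(32768, 33791), (31000, 31000)],
    "container-b": [(33792, 34815), (31001, 31001)],
}

print(route_packet(33000, assignments))  # container-a (its ephemeral slice)
print(route_packet(31001, assignments))  # container-b (its allocated port)
print(route_packet(40000, assignments))  # None: outside every range, dropped
```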
 
-<p>For non-ephemeral ports (e.g, listening ports), Mesos already exposes that 
to the scheduler (resource: &lsquo;ports&rsquo;). The scheduler is responsible 
for allocating those ports to executors/tasks.</p>
+<p>Mesos provides two ranges of ports to containers:</p>
+
+<ul>
+<li><p>OS-allocated &ldquo;<a href="https://en.wikipedia.org/wiki/Ephemeral_port">ephemeral</a>&rdquo; ports
+are assigned by the OS from a range that Mesos specifies for each container.</p></li>
+<li><p>Mesos-allocated &ldquo;non-ephemeral&rdquo; ports are acquired by a framework through the
+same resource offer mechanism used for CPU, memory, etc., and allocated to
+executors/tasks as required.</p></li>
+</ul>
 
-<p>For ephemeral ports, without network monitoring, all executors/tasks 
running on the slave share the same ephemeral port range of the host. The 
default ephemeral port range on most Linux distributions is [32768, 61000]. 
With network monitoring, for each container, we need to reserve a range for 
ports on the host which will be used as the ephemeral port range for the 
container network stack (these ports are directly mapped into the container). 
We need to ensure none of the host processes are using those ports. Because of 
that, you may want to squeeze the host ephemeral port range in order to support 
more containers on each slave. To do that, you can use the following command 
(need root permission). A host reboot is required to ensure there are no 
connections using ports outside the new ephemeral range.</p>
 
-<pre><code># This sets the host ephemeral port range to [57345, 61000].
-$ echo "57345 61000" &gt; /proc/sys/net/ipv4/ip_local_port_range
+<p>Additionally, the host itself requires ephemeral ports for network
+communication. You therefore need to configure three <strong>non-overlapping</strong>
+port ranges on the host: the host ephemeral ports, the container ephemeral ports,
+and the container non-ephemeral ports.</p>
+
+<h3>Host ephemeral port range</h3>
+
+<p>The currently configured host ephemeral port range can be displayed at any time
+with the command <code>sysctl net.ipv4.ip_local_port_range</code>. If ports need to
+be set aside for slave containers, the host ephemeral port range can be updated in
+<code>/etc/sysctl.conf</code>. Rebooting after the update applies the change and
+eliminates the possibility that the ports set aside for containers are already in
+use by other processes. For example, add the following:</p>
+
+<pre><code># net.ipv4.ip_local_port_range defines the host ephemeral port range, by
+# default 32768-61000.  We reduce this range to allow the Mesos slave to
+# allocate ports 32768-57344
+# net.ipv4.ip_local_port_range = 32768 61000
+net.ipv4.ip_local_port_range = 57345 61000
 </code></pre>
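+<p>Provisioning scripts may want to read the current range programmatically rather than parse <code>sysctl</code> output. The sketch below reads the standard Linux <code>/proc/sys</code> file behind <code>net.ipv4.ip_local_port_range</code>. The helper names are hypothetical.</p>

```python
PORT_RANGE_PATH = "/proc/sys/net/ipv4/ip_local_port_range"


def parse_port_range(text):
    """Parse the '<low> <high>' contents (tab- or space-separated)."""
    low, high = (int(field) for field in text.split())
    if low > high:
        raise ValueError("invalid port range: %r" % text)
    return low, high


def read_host_ephemeral_range(path=PORT_RANGE_PATH):
    """Hypothetical helper: the host ephemeral range as an inclusive (low, high)."""
    with open(path) as f:
        return parse_port_range(f.read())
```

+<p>After the change above and a reboot, <code>read_host_ephemeral_range()</code> should return <code>(57345, 61000)</code>.</p>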
 
-<h3>Turn on network monitoring</h3>
+<h3>Container port ranges</h3>
+
+<p>The container ephemeral and non-ephemeral port ranges are configured using the
+slave <code>--resources</code> flag. The non-ephemeral port range is advertised to
+the master, which then offers it to frameworks for allocation.</p>
+
+<p>The slave sub-divides the ephemeral port range, giving each container
+<code>ephemeral_ports_per_container</code> ports (default 1024). The maximum
+number of containers on the slave is therefore limited to approximately:</p>
+
+<pre><code>number of ephemeral_ports / ephemeral_ports_per_container
+</code></pre>
+
+<p>The master <code>--max_executors_per_slave</code> flag can be used to prevent
+the master from allocating more executors to a slave than its ephemeral port
+range can accommodate.</p>
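+<p>The arithmetic above can be made concrete with the ranges used in the examples on this page (container ephemeral ports 32768-57344, 1024 per container). The sketch below is a hypothetical helper, not the slave's allocator. It cuts the range into consecutive full slices and ignores leftover ports, so the real count may differ slightly, which is why the formula is only approximate.</p>

```python
def split_ephemeral_range(begin, end, per_container):
    """Hypothetical helper: split the inclusive range [begin, end] into
    consecutive, non-overlapping slices of `per_container` ports. Leftover
    ports that cannot form a full slice are ignored."""
    slices = []
    first = begin
    while first + per_container - 1 <= end:
        slices.append((first, first + per_container - 1))
        first += per_container
    return slices


slices = split_ephemeral_range(32768, 57344, 1024)
print(len(slices), slices[0], slices[-1])  # 24 (32768, 33791) (56320, 57343)
print("--max_executors_per_slave=%d" % len(slices))
```

+<p>If every slave uses the same configuration, the printed count is one candidate value for the master's <code>--max_executors_per_slave</code> flag.</p>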
+
+<p>It is recommended (but not required) that <code>ephemeral_ports_per_container</code>
+be set to a power of 2 (e.g., 512, 1024) and that the lower bound of the ephemeral
+port range be a multiple of <code>ephemeral_ports_per_container</code>, to minimize
+CPU overhead in packet processing. For example:</p>
+
+<pre><code>--resources="ports:[31000-32000];ephemeral_ports:[32768-57344]" \
+--ephemeral_ports_per_container=512
+</code></pre>
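+<p>One way to see why these recommendations might help: a power-of-2 slice that starts on a multiple of its size can be matched with a single value/mask comparison on the port number, while an unaligned slice needs several comparisons. The sketch below illustrates that general idea and checks a configuration against both recommendations. It is not taken from the Mesos source, and it is an assumption that the slave's packet filters work exactly this way.</p>

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and n & (n - 1) == 0


def alignment_warnings(begin, per_container):
    """Hypothetical helper: list which of the two recommendations are missed."""
    warnings = []
    if not is_power_of_two(per_container):
        warnings.append("ephemeral_ports_per_container is not a power of 2")
    if begin % per_container != 0:
        warnings.append("range lower bound is not a multiple of "
                        "ephemeral_ports_per_container")
    return warnings


def slice_as_value_and_mask(first, size):
    """Illustration only: describe the aligned slice [first, first + size - 1]
    as one (value, mask) pair, so a 16-bit port p is in the slice exactly
    when p & mask == value."""
    if alignment_warnings(first, size):
        raise ValueError("slice is not aligned")
    return first, 0xFFFF & ~(size - 1)


print(alignment_warnings(32768, 512))        # []
print(len(alignment_warnings(32001, 1000)))  # 2
value, mask = slice_as_value_and_mask(33280, 512)
print(hex(value), hex(mask))                 # 0x8200 0xfe00
```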
 
-<p>After the host ephemeral ports squeeze and reboot, you can turn on network 
monitoring by appending <code>network/port_mapping</code> to the isolation 
flag. Notice that you need specify the <code>ephemeral_ports</code> resource 
(via &ndash;resources flag). It tells the slave which ports on the host are 
reserved for containers. It must NOT overlap with the host ephemeral port 
range. You can also specify how many ephemeral ports you want to allocate to 
each container. It is recommended but not required that this number is power of 
2 aligned (e.g., 512, 1024). If not, there will be some performance impact for 
classifying packets. The maximum number of containers on the slave will be 
limited by approximately |ephemeral_ports|/ephemeral_ports_per_container, 
subject to alignment etc.</p>
+<h3>Rate limiting container traffic</h3>
+
+<p>Outbound traffic from a container to the network can be rate limited to prevent
+a single container from consuming all available network resources, to the
+detriment of the other containers on the host. The
+<code>--egress_rate_limit_per_container</code> flag limits each container launched
+on the host to the specified bandwidth (in bytes per second). Network traffic that
+would exceed this limit is delayed for later transmission. TCP adjusts to the
+increased latency by reducing its transmission rate, which reduces the need to
+drop packets.</p>
+
+<pre><code>--egress_rate_limit_per_container=100MB
+</code></pre>
+
+<p>Inbound traffic is not rate limited, because network flows can only be modified
+after they have been received by the host, at which point any congestion has
+already occurred.</p>
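+<p>The flag takes bytes per second, but link speeds are usually quoted in bits per second. The sketch below converts between the two using hypothetical helpers. It assumes that Mesos parses byte sizes with binary multipliers (1KB = 1024 bytes). Under that assumption, a value such as 37500KB, used in the complete command line on this page, works out to slightly more than 300 Mbit/s.</p>

```python
# Assumption: byte-size flags are parsed with binary multipliers,
# i.e. 1KB = 1024 bytes and 1MB = 1024 * 1024 bytes.
MULTIPLIERS = (("GB", 1024 ** 3), ("MB", 1024 ** 2), ("KB", 1024), ("B", 1))


def flag_to_bytes(value):
    """Convert a flag value such as '37500KB' or '100MB' to bytes."""
    for suffix, multiplier in MULTIPLIERS:
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * multiplier
    raise ValueError("missing unit in %r" % value)


def flag_to_mbits(value):
    """Megabits per second (1 Mbit = 10**6 bits) for a bytes-per-second flag."""
    return flag_to_bytes(value) * 8 / 10 ** 6


def mbits_to_flag(mbits):
    """Largest whole-KB flag value that does not exceed `mbits` Mbit/s."""
    return "%dKB" % (mbits * 10 ** 6 // 8 // 1024)


print(flag_to_mbits("37500KB"))  # 307.2
print(mbits_to_flag(300))        # 36621KB
```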
+
+<h3>Egress traffic isolation</h3>
+
+<p>Delaying network data for later transmission can increase latency and jitter
+(variability) for all traffic on the interface. Mesos can reduce the impact on
+other containers on the same host by classifying traffic by each container's port
+ranges, so that each container has its own unique flow, and by sending traffic
+from these flows fairly (using the
+<a href="https://tools.ietf.org/html/draft-hoeiland-joergensen-aqm-fq-codel-00">FQ_Codel</a>
+algorithm). Use the <code>--egress_unique_flow_per_container</code> flag to enable
+this behavior.</p>
+
+<pre><code>--egress_unique_flow_per_container
+</code></pre>
+
+<h3>Putting it all together</h3>
+
+<p>Assuming the host ephemeral port range has been set to 57345-61000 as shown
+above, a complete slave command line that enables network isolation, reserves
+ports 32768-57344 for container ephemeral ports and 31000-32000 for non-ephemeral
+ports allocated by frameworks, limits container transmit bandwidth to
+approximately 300 Mbits/second (37.5 MBytes/second), and enables unique flows
+would thus be:</p>
 
 <pre><code>mesos-slave \
-    --checkpoint \
-    --log_dir=/var/log/mesos \
-    --work_dir=/var/lib/mesos \
-    --isolation=cgroups/cpu,cgroups/mem,network/port_mapping \
-    
--resources=cpus:22;mem:62189;ports:[31000-32000];disk:400000;ephemeral_ports:[32768-57344]
 \
-    --ephemeral_ports_per_container=1024
+    --isolation=network/port_mapping \
+    --resources="ports:[31000-32000];ephemeral_ports:[32768-57344]" \
+    --ephemeral_ports_per_container=1024 \
+    --egress_rate_limit_per_container=37500KB \
+    --egress_unique_flow_per_container
 </code></pre>
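+<p>Because the three port ranges must not overlap, a small check before rolling out a command line can catch mistakes. The sketch below is hypothetical. It understands only the simple <code>name:[first-last,...]</code> form of <code>--resources</code> used on this page (not roles or other resource syntax). The host ephemeral range is typed in by hand from the <code>sysctl</code> setting above.</p>

```python
import itertools
import re


def parse_ranges(value):
    """Parse '[31000-32000]' or '[31000-31500, 31600-32000]' into
    a list of inclusive (first, last) tuples."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\s*-\s*(\d+)", value)]


def parse_resources(flag):
    """Split 'ports:[...];ephemeral_ports:[...]' into {name: value}.
    Only the simple name:value form used on this page is handled."""
    resources = {}
    for item in flag.split(";"):
        name, _, value = item.partition(":")
        resources[name.strip()] = value.strip()
    return resources


def find_overlaps(named_ranges):
    """Return (name_a, name_b) for every pair of range sets that overlap."""
    clashes = []
    for (name_a, ranges_a), (name_b, ranges_b) in itertools.combinations(
            named_ranges.items(), 2):
        if any(a0 <= b1 and b0 <= a1
               for (a0, a1), (b0, b1) in itertools.product(ranges_a, ranges_b)):
            clashes.append((name_a, name_b))
    return clashes


resources = parse_resources("ports:[31000-32000];ephemeral_ports:[32768-57344]")
named = {
    "host ephemeral (sysctl)": [(57345, 61000)],
    "ports": parse_ranges(resources["ports"]),
    "ephemeral_ports": parse_ranges(resources["ephemeral_ports"]),
}
print(find_overlaps(named) or "no overlaps")  # no overlaps
```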
 
-<h2>How to get statistics?</h2>
+<h2>Monitoring container network statistics</h2>
 
-<p>Currently, we report the following network statistics:</p>
+<p>Mesos exposes statistics from the Linux network stack for each container network
+on the <code>/monitor/statistics.json</code> slave endpoint.</p>
 
-<ul>
-<li><em>net_rx_bytes</em></li>
-<li><em>net_rx_dropped</em></li>
-<li><em>net_rx_errors</em></li>
-<li><em>net_rx_packets</em></li>
-<li><em>net_tx_bytes</em></li>
-<li><em>net_tx_dropped</em></li>
-<li><em>net_tx_errors</em></li>
-<li><em>net_tx_packets</em></li>
-</ul>
+<p>From the network interface inside the container, we report the following
+counters (since container creation) under the <code>statistics</code> key:</p>
+
+<table>
+<thead>
+<tr><th>Metric</th><th>Description</th><th>Type</th></tr>
+</thead>
+<tr>
+  <td><code>net_rx_bytes</code></td>
+  <td>Received bytes</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>net_rx_dropped</code></td>
+  <td>Packets dropped on receive</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>net_rx_errors</code></td>
+  <td>Errors reported on receive</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>net_rx_packets</code></td>
+  <td>Packets received</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>net_tx_bytes</code></td>
+  <td>Sent bytes</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>net_tx_dropped</code></td>
+  <td>Packets dropped on send</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>net_tx_errors</code></td>
+  <td>Errors reported on send</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>net_tx_packets</code></td>
+  <td>Packets sent</td>
+  <td>Counter</td>
+</tr>
+</table>
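+<p>The counters can also be collected programmatically. This sketch uses only the Python standard library. The slave address <code>http://localhost:5051</code> is taken from the <code>curl</code> example on this page and is an assumption about your deployment. An entry without network counters, for example on a slave without network isolation, yields an empty dictionary.</p>

```python
import json
import urllib.request

NET_COUNTERS = (
    "net_rx_bytes", "net_rx_dropped", "net_rx_errors", "net_rx_packets",
    "net_tx_bytes", "net_tx_dropped", "net_tx_errors", "net_tx_packets",
)


def fetch_statistics(slave="http://localhost:5051"):
    """Fetch the per-container statistics list from a slave (assumed address)."""
    with urllib.request.urlopen(slave + "/monitor/statistics.json") as response:
        return json.loads(response.read().decode("utf-8"))


def network_counters(entries):
    """Map executor_id -> {counter: value} for the network counters above,
    skipping counters that an entry does not report."""
    result = {}
    for entry in entries:
        stats = entry.get("statistics", {})
        result[entry["executor_id"]] = {
            name: stats[name] for name in NET_COUNTERS if name in stats
        }
    return result
```

+<p>Typical use is <code>network_counters(fetch_statistics())</code>.</p>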
+
+
+<p>Additionally, <a href="http://tldp.org/HOWTO/Traffic-Control-HOWTO/intro.html">Linux Traffic Control</a> can report the following
+statistics for the elements which implement bandwidth limiting and bloat
+reduction under the <code>statistics/net_traffic_control_statistics</code> key. The entry
+for each of these elements includes:</p>
+
+<table>
+<thead>
+<tr><th>Metric</th><th>Description</th><th>Type</th></tr>
+</thead>
+<tr>
+  <td><code>backlog</code></td>
+  <td>Bytes queued for transmission [1]</td>
+  <td>Gauge</td>
+</tr>
+<tr>
+  <td><code>bytes</code></td>
+  <td>Sent bytes</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>drops</code></td>
+  <td>Packets dropped on send</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>overlimits</code></td>
+  <td>Count of times the interface was over its transmit limit when it attempted to send a packet. Since the normal action when the network is over its limit is to delay the packet, the overlimit counter can be incremented many times for each packet sent on a heavily congested interface. [2]</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>packets</code></td>
+  <td>Packets sent</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td><code>qlen</code></td>
+  <td>Packets queued for transmission</td>
+  <td>Gauge</td>
+</tr>
+<tr>
+  <td><code>ratebps</code></td>
+  <td>Transmit rate in bytes/second [3]</td>
+  <td>Gauge</td>
+</tr>
+<tr>
+  <td><code>ratepps</code></td>
+  <td>Transmit rate in packets/second [3]</td>
+  <td>Gauge</td>
+</tr>
+<tr>
+  <td><code>requeues</code></td>
+  <td>Packets that failed to send due to resource contention (such as kernel locking) [3]</td>
+  <td>Counter</td>
+</tr>
+</table>
 
 
+<p>[1] Backlog is only reported on the <code>bloat_reduction</code> element.</p>
+
+<p>[2] Overlimits are only reported on the <code>bw_limit</code> element.</p>
+
+<p>[3] Currently always reported as 0 by the underlying Traffic Control element.</p>
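+<p>The Traffic Control entries form a list, so it can be convenient to index them by their <code>id</code> field. The sketch below does that and computes a drop ratio per element. The helpers are hypothetical. They assume, as the table above describes, that <code>packets</code> counts packets sent and <code>drops</code> counts packets dropped, so the two together are the packets the element handled.</p>

```python
def traffic_control_by_id(statistics):
    """Index a container's net_traffic_control_statistics list by 'id'
    (e.g. 'bw_limit', 'bloat_reduction')."""
    return {
        element["id"]: element
        for element in statistics.get("net_traffic_control_statistics", [])
    }


def drop_ratio(element):
    """Dropped packets as a fraction of packets handled (sent + dropped);
    0.0 for an element that has handled nothing."""
    handled = element.get("packets", 0) + element.get("drops", 0)
    return element.get("drops", 0) / handled if handled else 0.0
```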
+
 <p>For example, these are the statistics you will get by hitting the 
<code>/monitor/statistics.json</code> endpoint on a slave with network 
monitoring turned on:</p>
 
-<pre><code>$ curl -s http://localhost:5051/monitor/statistics.json | python2.6
--mjson.tool
+<pre><code>$ curl -s http://localhost:5051/monitor/statistics.json | python -m json.tool
 [
     {
-        "executor_id": 
"sample_executor_id-ebd8fa62-757d-489e-9e23-678a21d078d6",
-        "executor_name": "sample_executor",
-        "framework_id": "201103282247-0000000019-0000",
-        "source": "sample_executor",
+        "executor_id": "job.1436298853",
+        "executor_name": "Command Executor (Task: job.1436298853) (Command: sh -c 'iperf ....')",
+        "framework_id": "20150707-195256-1740121354-5150-29801-0000",
+        "source": "job.1436298853",
         "statistics": {
-            "cpus_limit": 0.35,
-            "cpus_nr_periods": 520883,
-            "cpus_nr_throttled": 2163,
-            "cpus_system_time_secs": 154.42,
-            "cpus_throttled_time_secs": 145.96,
-            "cpus_user_time_secs": 258.74,
-            "mem_anon_bytes": 109137920,
-            "mem_file_bytes": 30613504,
+            "cpus_limit": 1.1,
+            "cpus_nr_periods": 16314,
+            "cpus_nr_throttled": 16313,
+            "cpus_system_time_secs": 2667.06,
+            "cpus_throttled_time_secs": 8036.840845388,
+            "cpus_user_time_secs": 123.49,
+            "mem_anon_bytes": 8388608,
+            "mem_cache_bytes": 16384,
+            "mem_critical_pressure_counter": 0,
+            "mem_file_bytes": 16384,
             "mem_limit_bytes": 167772160,
-            "mem_mapped_file_bytes": 8192,
-            "mem_rss_bytes": 140341248,
-            "net_rx_bytes": 2402099,
+            "mem_low_pressure_counter": 0,
+            "mem_mapped_file_bytes": 0,
+            "mem_medium_pressure_counter": 0,
+            "mem_rss_bytes": 8388608,
+            "mem_total_bytes": 9945088,
+            "net_rx_bytes": 10847,
             "net_rx_dropped": 0,
             "net_rx_errors": 0,
-            "net_rx_packets": 33273,
-            "net_tx_bytes": 1507798,
+            "net_rx_packets": 143,
+            "net_traffic_control_statistics": [
+                {
+                    "backlog": 0,
+                    "bytes": 163206809152,
+                    "drops": 77147,
+                    "id": "bw_limit",
+                    "overlimits": 210693719,
+                    "packets": 107941027,
+                    "qlen": 10236,
+                    "ratebps": 0,
+                    "ratepps": 0,
+                    "requeues": 0
+                },
+                {
+                    "backlog": 15481368,
+                    "bytes": 163206874168,
+                    "drops": 27081494,
+                    "id": "bloat_reduction",
+                    "overlimits": 0,
+                    "packets": 107941070,
+                    "qlen": 10239,
+                    "ratebps": 0,
+                    "ratepps": 0,
+                    "requeues": 0
+                }
+            ],
+            "net_tx_bytes": 163200529816,
             "net_tx_dropped": 0,
             "net_tx_errors": 0,
-            "net_tx_packets": 17726,
-            "timestamp": 1408043826.91626
+            "net_tx_packets": 107936874,
+            "perf": {
+                "duration": 0,
+                "timestamp": 1436298855.82807
+            },
+            "timestamp": 1436300487.41595
         }
     }
 ]
 </code></pre>
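+<p>Because the <code>net_*</code> values are counters, a single snapshot only gives totals since the container was created. A rate needs two snapshots of the same container. The hypothetical helper below divides the counter deltas by the difference between the top-level <code>timestamp</code> fields. Gauges such as <code>qlen</code> and <code>backlog</code> should be read directly rather than differenced. A relaunched container would presumably start its counters from zero again, so the helper rejects negative deltas.</p>

```python
def counter_rates(before, after, counters=("net_rx_bytes", "net_tx_bytes")):
    """Per-second rates between two 'statistics' snapshots of one container."""
    elapsed = after["timestamp"] - before["timestamp"]
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")
    rates = {}
    for name in counters:
        delta = after[name] - before[name]
        if delta < 0:
            raise ValueError("%s went backwards; was the container relaunched?" % name)
        rates[name] = delta / elapsed
    return rates


before = {"timestamp": 100.0, "net_rx_bytes": 1000, "net_tx_bytes": 5000}
after = {"timestamp": 110.0, "net_rx_bytes": 3000, "net_tx_bytes": 45000}
print(counter_rates(before, after))  # {'net_rx_bytes': 200.0, 'net_tx_bytes': 4000.0}
```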
 
-<h1>Network Egress Rate Limit</h1>
-
-<p>Mesos 0.21.0 adds an optional feature to limit the egress network bandwidth 
for each container. With this feature enabled, each container&rsquo;s egress 
traffic is limited to the specified rate. This can prevent a single container 
from dominating the entire network.</p>
-
-<h2>How to enable it?</h2>
-
-<p>Egress Rate Limit requires Network Monitoring. To enable it, please follow 
all the steps in the <a href="#Network_Monitoring">previous section</a> to 
enable the Network Monitoring first, and then use the newly introduced 
<code>egress_rate_limit_per_container</code> flag to specify the rate limit for 
each container. Note that this flag expects a <code>Bytes</code> type like the 
following:</p>
-
-<pre><code>mesos-slave \
-    --checkpoint \
-    --log_dir=/var/log/mesos \
-    --work_dir=/var/lib/mesos \
-    --isolation=cgroups/cpu,cgroups/mem,network/port_mapping \
-    
--resources=cpus:22;mem:62189;ports:[31000-32000];disk:400000;ephemeral_ports:[32768-57344]
 \
-    --ephemeral_ports_per_container=1024 \
-    --egress_rate_limit_per_container=37500KB # Convert to ~300Mbits/s.
-</code></pre>
-
        </div>
 </div>
 

Modified: mesos/site/publish/documentation/slave-recovery/index.html
URL: 
http://svn.apache.org/viewvc/mesos/site/publish/documentation/slave-recovery/index.html?rev=1690830&r1=1690829&r2=1690830&view=diff
==============================================================================
--- mesos/site/publish/documentation/slave-recovery/index.html (original)
+++ mesos/site/publish/documentation/slave-recovery/index.html Mon Jul 13 22:09:25 2015
@@ -164,6 +164,19 @@ Therefore, it is highly recommended to a
 <blockquote><p>NOTE: Frameworks that have enabled checkpointing will only get 
offers from checkpointing slaves. So, before setting 
<code>checkpoint=True</code> on FrameworkInfo, ensure that there are slaves in 
your cluster that have enabled checkpointing.
 Because, if there are no checkpointing slaves, the framework would not get any 
offers and hence cannot launch any tasks/executors!</p></blockquote>
 
+<h2>Known issues with <code>systemd</code> and POSIX isolation</h2>
+
+<p>When <code>systemd</code> is used to launch <code>mesos-slave</code> and only
+<code>posix</code> isolation mechanisms are enabled, there is a known issue that
+prevents tasks from recovering. The problem is that the default
+<a href="http://www.freedesktop.org/software/systemd/man/systemd.kill.html">KillMode</a>
+for systemd units is <code>control-group</code>, so all child processes are killed
+when the slave stops. Explicitly setting <code>KillMode</code> to <code>process</code>
+allows the executors to survive and reconnect.</p>
+
+<p>The following excerpt of a <code>systemd</code> unit configuration file shows how to set <code>KillMode</code>:</p>
+
+<pre><code>[Service]
+ExecStart=/usr/bin/mesos-slave
+KillMode=process
+</code></pre>
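+<p>To audit existing unit files, the hypothetical helper below reads the <code>KillMode</code> setting from a unit's <code>[Service]</code> section using Python's <code>configparser</code>. Unit files are only INI-like, so this is an approximation: it does not handle drop-in overrides or line continuations. When the setting is absent it reports <code>control-group</code>, systemd's documented default.</p>

```python
import configparser


def kill_mode(unit_text):
    """Return the KillMode set in a unit file's [Service] section, or the
    systemd default 'control-group' if it is not set."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case; systemd keys are case-sensitive
    parser.read_string(unit_text)
    return parser.get("Service", "KillMode", fallback="control-group")


unit = """[Service]
ExecStart=/usr/bin/mesos-slave
KillMode=process
"""
print(kill_mode(unit))  # process
```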
+
+<blockquote><p>NOTE: There are also known issues with using <code>systemd</code> and raw <code>cgroups</code>-based isolation. For now, the suggested non-POSIX isolation mechanism is Docker containerization.</p></blockquote>
+
 <h2>Upgrading to 0.14.0</h2>
 
 <p>If you want to upgrade a running Mesos cluster to 0.14.0 to take advantage 
of slave recovery please follow the <a 
href="/documentation/latest/upgrades/">upgrade instructions</a>.</p>

Modified: mesos/site/publish/documentation/tools/index.html
URL: 
http://svn.apache.org/viewvc/mesos/site/publish/documentation/tools/index.html?rev=1690830&r1=1690829&r2=1690830&view=diff
==============================================================================
--- mesos/site/publish/documentation/tools/index.html (original)
+++ mesos/site/publish/documentation/tools/index.html Mon Jul 13 22:09:25 2015
@@ -90,7 +90,6 @@
 <ul>
 <li><a href="https://github.com/rayrod2030/collectd-mesos">collectd plugin</a> to collect Mesos cluster metrics.</li>
 <li><a href="/documentation/latest/deploy-scripts/">Deploy scripts</a> for 
launching a Mesos cluster on a set of machines.</li>
-<li><a href="/documentation/latest/ec2-scripts/">EC2 scripts</a> for launching 
a Mesos cluster on Amazon EC2.</li>
 <li><a href="https://github.com/everpeace/cookbook-mesos">Chef cookbook by Everpeace</a>: installs Mesos and configures the master and slave. This cookbook supports installation from source or the Mesosphere packages.</li>
 <li><a href="https://github.com/mdsol/mesos_cookbook">Chef cookbook by Mdsol</a>: application cookbook for installing the Apache Mesos cluster manager. This cookbook installs Mesos via packages provided by Mesosphere.</li>
 <li><a href="https://github.com/deric/puppet-mesos">Puppet Module by Deric</a>: a Puppet module for managing Mesos nodes in a cluster.</li>

Modified: mesos/site/publish/documentation/upgrades/index.html
URL: 
http://svn.apache.org/viewvc/mesos/site/publish/documentation/upgrades/index.html?rev=1690830&r1=1690829&r2=1690830&view=diff
==============================================================================
--- mesos/site/publish/documentation/upgrades/index.html (original)
+++ mesos/site/publish/documentation/upgrades/index.html Mon Jul 13 22:09:25 2015
@@ -95,6 +95,18 @@
 
 <p><strong>NOTE</strong> The Resource protobuf has been extended to include 
more metadata for supporting persistence (DiskInfo), dynamic reservations 
(ReservationInfo) and oversubscription (RevocableInfo). You must not combine 
two Resource objects if they have different metadata.</p>
 
+<p>In order to upgrade a running cluster:</p>
+
+<ul>
+<li>Rebuild and install any modules so that upgraded masters/slaves can use them.</li>
+<li>Install the new master binaries and restart the masters.</li>
+<li>Install the new slave binaries and restart the slaves.</li>
+<li>Upgrade the schedulers by linking the latest native library / jar / egg (if necessary).</li>
+<li>Restart the schedulers.</li>
+<li>Upgrade the executors by linking the latest native library / jar / egg (if necessary).</li>
+</ul>
+
+
 <h2>Upgrading from 0.21.x to 0.22.x</h2>
 
 <p><strong>NOTE</strong> Slave checkpoint flag has been removed as it will be 
enabled for all

