Repository: mesos-site
Updated Branches:
  refs/heads/asf-site b3759174b -> 5abd8c347


Updated the website built from mesos SHA: 3bb33e7.


Project: http://git-wip-us.apache.org/repos/asf/mesos-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos-site/commit/5abd8c34
Tree: http://git-wip-us.apache.org/repos/asf/mesos-site/tree/5abd8c34
Diff: http://git-wip-us.apache.org/repos/asf/mesos-site/diff/5abd8c34

Branch: refs/heads/asf-site
Commit: 5abd8c34785745edd9dfbc779048dd0610db0993
Parents: b375917
Author: jenkins <[email protected]>
Authored: Mon Dec 11 20:55:06 2017 +0000
Committer: jenkins <[email protected]>
Committed: Mon Dec 11 20:55:06 2017 +0000

----------------------------------------------------------------------
 content/blog/feed.xml                           | 118 ++++++-
 content/blog/index.html                         |   5 +
 .../index.html                                  | 305 +++++++++++++++++++
 content/sitemap.xml                             |   8 +-
 4 files changed, 431 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos-site/blob/5abd8c34/content/blog/feed.xml
----------------------------------------------------------------------
diff --git a/content/blog/feed.xml b/content/blog/feed.xml
index c4ac763..9e40b17 100644
--- a/content/blog/feed.xml
+++ b/content/blog/feed.xml
@@ -4,7 +4,123 @@
   <id>http://mesos.apache.org/blog</id>
  <link href="http://mesos.apache.org/blog" />
  <link href="http://mesos.apache.org/blog/feed.xml" rel="self"/>
-  <updated>2017-11-16T00:00:00Z</updated>
+  <updated>2017-12-07T00:00:00Z</updated>
+  
+  <entry>
+    
<id>http://mesos.apache.org/blog/performance-working-group-progress-report/</id>
+    <link href="/blog/performance-working-group-progress-report/" />
+    <title>
+      December 2017 Performance Working Group Progress Report
+    </title>
+    <updated>2017-12-07T00:00:00Z</updated>
+    <author>
+      <name>Benjamin Mahler</name>
+    </author>
+    <content type="html">
+      &lt;p&gt;&lt;strong&gt;Scalability and performance are key features for 
Mesos. Some users of Mesos already run production clusters that consist of more 
than 35,000 agents and 100,000 active tasks.&lt;/strong&gt; However, there 
remains a lot of room for improvement across a variety of areas of the 
system.&lt;/p&gt;
+
+&lt;p&gt;The performance working group was created in order to focus on some 
of these areas. The group&amp;rsquo;s charter is to improve scalability / 
throughput / latency across the system, and in order to measure our 
improvements and prevent performance regressions we will write benchmarks and 
automate them.&lt;/p&gt;
+
+&lt;p&gt;In the past few months, we&amp;rsquo;ve focused on making 
improvements to the following areas:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;&lt;strong&gt;Master failover time-to-completion&lt;/strong&gt;: 
Achieved a 450-600% improvement in throughput, which reduces the 
time-to-completion by 80-85%.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;&lt;a 
href=&quot;https://github.com/apache/mesos/tree/master/3rdparty/libprocess&quot;&gt;Libprocess&lt;/a&gt;
 message passing throughput&lt;/strong&gt;: These improvements will be covered 
in a separate blog post.&lt;/li&gt;
+&lt;/ul&gt;
+
+
+&lt;p&gt;Before we dive into the master failover improvements, I would like to 
recognize and thank the following contributors:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;&lt;strong&gt;Dmitry Zhuk&lt;/strong&gt;: for writing &lt;em&gt;a 
lot&lt;/em&gt; of patches for improving the master failover 
performance.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;Michael Park&lt;/strong&gt;: for reviewing and 
shipping many of Dmitry&amp;rsquo;s more challenging patches.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;Yan Xu&lt;/strong&gt;: for writing the master failover 
benchmark that was the basis for measuring the improvements.&lt;/li&gt;
+&lt;/ul&gt;
+
+
+&lt;h2&gt;Master Failover Time-To-Completion&lt;/h2&gt;
+
+&lt;p&gt;Our first area of focus was to improve the time it takes for a master 
failover to complete, where completion is defined as all of the agents 
successfully re-registering. Mesos is architected to use a centralized master 
with standby masters that participate in a quorum for high availability. For 
scalability reasons, the leading master stores the state of the cluster 
in-memory. During a master failover, the leading master therefore needs to 
rebuild the in-memory state from all of the agents that re-register. During 
this time, the master is available to process other requests, but will be 
exposing only partial state to API consumers.&lt;/p&gt;
+
+&lt;p&gt;Rebuilding the master’s in-memory state can be expensive for 
larger clusters, so the focus of this effort was to make the rebuild more 
efficient. Improvements were made in several areas; only the highest-impact 
changes are listed below:&lt;/p&gt;
+
+&lt;h3&gt;Protobuf 3.5.0 Move Support&lt;/h3&gt;
+
+&lt;p&gt;We upgraded to protobuf 3.5.0 in order to gain move support. When we 
profiled the master, we found that it spent a lot of time copying protobuf 
messages during agent re-registration. This support allowed us to eliminate 
copies of protobuf messages while retaining value semantics.&lt;/p&gt;
+
+&lt;h3&gt;Move Support and Copy Elimination in Libprocess 
&lt;code&gt;dispatch&lt;/code&gt; / &lt;code&gt;defer&lt;/code&gt; / 
&lt;code&gt;install&lt;/code&gt;&lt;/h3&gt;
+
+&lt;p&gt;Libprocess provides several primitives for message passing:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;&lt;code&gt;dispatch&lt;/code&gt;: Provides the ability to post a 
message to a local &lt;code&gt;Process&lt;/code&gt;.&lt;/li&gt;
+&lt;li&gt;&lt;code&gt;defer&lt;/code&gt;: Provides a deferred 
&lt;code&gt;dispatch&lt;/code&gt;, i.e., a function object that, when invoked, 
issues a &lt;code&gt;dispatch&lt;/code&gt;.&lt;/li&gt;
+&lt;li&gt;&lt;code&gt;install&lt;/code&gt;: Installs a handler for receiving a 
protobuf message.&lt;/li&gt;
+&lt;/ul&gt;
+
+
+&lt;p&gt;These primitives did not have move support, as they were originally 
added prior to the addition of C++11 support to the code-base. In order to 
eliminate copies, we enhanced these primitives to support moving arguments in 
and out.&lt;/p&gt;
+
+&lt;p&gt;This required introducing a new C++ utility, because 
&lt;code&gt;defer&lt;/code&gt; takes on the same API as 
&lt;code&gt;std::bind&lt;/code&gt; (e.g., placeholders). Specifically, the 
function object returned by &lt;code&gt;std::bind&lt;/code&gt; does not move 
the bound arguments into the stored callable. In order to enable this, 
&lt;code&gt;defer&lt;/code&gt; now uses a utility we introduced called 
&lt;code&gt;lambda::partial&lt;/code&gt; rather than 
&lt;code&gt;std::bind&lt;/code&gt;. &lt;code&gt;lambda::partial&lt;/code&gt; 
performs partial function application similar to 
&lt;code&gt;std::bind&lt;/code&gt; except the returned function object moves 
the bound arguments into the stored callable if the invocation is performed on 
an r-value function object.&lt;/p&gt;
+
+&lt;h3&gt;Copy Elimination in the Master&lt;/h3&gt;
+
+&lt;p&gt;With these previous enhancements in place, we were able to eliminate 
many of the expensive copies of protobuf messages performed by the 
master.&lt;/p&gt;
+
+&lt;h3&gt;Benchmark and Results&lt;/h3&gt;
+
+&lt;p&gt;We wrote a synthetic benchmark to simulate a master failover. This 
benchmark prepares all the messages that would be sent to the master by the 
agents that need to re-register:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;The benchmark uses synthetic agents in that they are just an actor 
that knows how to re-register with the master.&lt;/li&gt;
+&lt;li&gt;Each &amp;ldquo;agent&amp;rdquo; will send a configurable number of 
active and completed tasks belonging to a configurable number of active and 
completed frameworks.&lt;/li&gt;
+&lt;li&gt;Each task has 10 small labels to introduce metadata 
overhead.&lt;/li&gt;
+&lt;/ul&gt;
+
+
+&lt;p&gt;The benchmark has a few caveats:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;It does not use executors (this would show improved results over 
what is shown below, but for simplicity the benchmark omits them)&lt;/li&gt;
+&lt;li&gt;It uses local message passing, whereas a real cluster would be 
passing messages over HTTP.&lt;/li&gt;
+&lt;li&gt;It uses a quorum size of 1, so writes to the master’s registry 
occur only on a single local log replica.&lt;/li&gt;
+&lt;li&gt;The synthetic agents do not retry their re-registration, whereas 
typically agents will retry with a backoff.&lt;/li&gt;
+&lt;/ul&gt;
+
+
+&lt;p&gt;This was tested on a 2015 MacBook Pro with a 2.8 GHz Intel Core i7 
processor. Mesos was configured using &lt;code&gt;Apple LLVM version 9.0.0 
(clang-900.0.38)&lt;/code&gt;, with &lt;code&gt;-O2&lt;/code&gt; enabled in 
1.5.0.&lt;/p&gt;
+
+&lt;p&gt;The first set of results represents a cluster with 10 active tasks per 
agent across 5 frameworks, with no completed tasks. The results for 1,000 - 40,000 
agents with 10,000 - 400,000 active tasks are shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/assets/img/documentation/1.3-1.5_master_failover_no_history.png&quot;
 alt=&quot;1.3 - 1.5 Master Failover without Task History Graph&quot; 
/&gt;&lt;/p&gt;
+
+&lt;p&gt;There was a reduction in the time-to-completion of ~80% due to a 
450-500% improvement in throughput across 1.3.0 to 1.5.0.&lt;/p&gt;
+
+&lt;p&gt;The second results add task history: each agent also now contains 100 
completed tasks across 5 completed frameworks. The results from 1,000 - 40,000 
agents with 10,000 - 400,000 active tasks and 100,000 - 4,000,000 completed 
tasks are shown below:&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/assets/img/documentation/1.3-1.5_master_failover_with_history&quot; 
alt=&quot;1.3 - 1.5 Master Failover with Task History Graph&quot; 
/&gt;&lt;/p&gt;
+
+&lt;p&gt;This represents a reduction in time-to-completion of ~85% due to a 
550-700% improvement in throughput across 1.3.0 to 1.5.0.&lt;/p&gt;
+
+&lt;h2&gt;Performance Working Group Roadmap&lt;/h2&gt;
+
+&lt;p&gt;We&amp;rsquo;re currently targeting the following areas for 
improvements:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;&lt;strong&gt;Performance of the v1 API&lt;/strong&gt;: Currently 
the v1 API can be significantly slower than the v0 API. We would like to reach 
parity, and ideally surpass the performance of the v0 API.
+
+&lt;ul&gt;
+&lt;li&gt;&lt;strong&gt;&lt;a 
href=&quot;https://github.com/apache/mesos/tree/master/3rdparty/libprocess&quot;&gt;Libprocess&lt;/a&gt;
 HTTP performance&lt;/strong&gt;: This will be undertaken as part of improving 
the v1 API performance, since it is HTTP-based.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;Master state API performance&lt;/strong&gt;: 
Currently, API queries of the master&amp;rsquo;s state are serviced by the same 
master actor that processes all of the messages from schedulers and agents. 
Since the query processing can block the master from processing other events, 
users need to be careful not to query the master excessively. In practice, the 
master gets queried quite heavily due to the presence of several tools that 
rely on the master&amp;rsquo;s state (e.g. DNS tooling, UIs, CLIs, etc) and so 
this is a critical problem for users. This effort will leverage the state 
streaming API to stream the state to a different actor that can serve the state 
API requests. This will ensure that expensive state queries do not affect the 
master&amp;rsquo;s ability to process events.&lt;/li&gt;
+&lt;/ul&gt;
+
+
+&lt;p&gt;If you are a user and would like to suggest some areas for 
performance improvement, please let us know by emailing &lt;a 
href=&quot;&amp;#109;&amp;#97;&amp;#x69;&amp;#108;&amp;#x74;&amp;#x6f;&amp;#x3a;&amp;#100;&amp;#101;&amp;#x76;&amp;#64;&amp;#x61;&amp;#112;&amp;#97;&amp;#99;&amp;#104;&amp;#101;&amp;#46;&amp;#x6d;&amp;#101;&amp;#x73;&amp;#x6f;&amp;#x73;&amp;#x2e;&amp;#111;&amp;#114;&amp;#x67;&quot;&gt;&amp;#x64;&amp;#x65;&amp;#x76;&amp;#x40;&amp;#x61;&amp;#112;&amp;#x61;&amp;#x63;&amp;#x68;&amp;#101;&amp;#x2e;&amp;#x6d;&amp;#101;&amp;#115;&amp;#x6f;&amp;#x73;&amp;#46;&amp;#x6f;&amp;#114;&amp;#x67;&lt;/a&gt;.&lt;/p&gt;
+
+       </content>
+  </entry>
   
   <entry>
     <id>http://mesos.apache.org/blog/mesos-1-4-1-released/</id>

http://git-wip-us.apache.org/repos/asf/mesos-site/blob/5abd8c34/content/blog/index.html
----------------------------------------------------------------------
diff --git a/content/blog/index.html b/content/blog/index.html
index 2fa5cb1..c774526 100644
--- a/content/blog/index.html
+++ b/content/blog/index.html
@@ -107,6 +107,11 @@
   <div class="col-md-9">
     
       <article>
+        <h2><a 
href="/blog/performance-working-group-progress-report/">December 2017 
Performance Working Group Progress Report</a></h2>
+      <p><em>Posted by Benjamin Mahler, December  7, 2017</em></p>
+      </article>
+    
+      <article>
         <h2><a href="/blog/mesos-1-4-1-released/">Apache Mesos 1.4.1 
Released</a></h2>
       <p><em>Posted by Kapil Arya, November 16, 2017</em></p>
       </article>

http://git-wip-us.apache.org/repos/asf/mesos-site/blob/5abd8c34/content/blog/performance-working-group-progress-report/index.html
----------------------------------------------------------------------
diff --git a/content/blog/performance-working-group-progress-report/index.html 
b/content/blog/performance-working-group-progress-report/index.html
new file mode 100644
index 0000000..5e2405c
--- /dev/null
+++ b/content/blog/performance-working-group-progress-report/index.html
@@ -0,0 +1,305 @@
+<!DOCTYPE html>
+<html>
+  <head>
+    <meta charset="utf-8">
+    <title>December 2017 Performance Working Group Progress Report</title>
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+    <meta property="og:locale" content="en_US"/>
+    <meta property="og:type" content="website"/>
+    <meta property="og:title" content="Apache Mesos"/>
+    <meta property="og:site_name" content="Apache Mesos"/>
+    <meta property="og:url" content="http://mesos.apache.org/"/>
+    <meta property="og:image" 
content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/>
+    <meta property="og:description"
+          content="Apache Mesos abstracts resources away from machines,
+                   enabling fault-tolerant and elastic distributed systems
+                   to easily be built and run effectively."/>
+
+    <meta name="twitter:card" content="summary"/>
+    <meta name="twitter:site" content="@ApacheMesos"/>
+    <meta name="twitter:title" content="Apache Mesos"/>
+    <meta name="twitter:image" 
content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/>
+    <meta name="twitter:description"
+          content="Apache Mesos abstracts resources away from machines,
+                   enabling fault-tolerant and elastic distributed systems
+                   to easily be built and run effectively."/>
+
+    <link 
href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" 
rel="stylesheet">
+    <link rel="alternate" type="application/atom+xml" title="Apache Mesos 
Blog" href="/blog/feed.xml">
+    <link href="../../assets/css/main.css" media="screen" rel="stylesheet" 
type="text/css" />
+
+    
+
+    <!-- Google Analytics Magic -->
+    <script type="text/javascript">
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-20226872-1']);
+    _gaq.push(['_setDomainName', 'apache.org']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; 
ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 
'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; 
s.parentNode.insertBefore(ga, s);
+    })();
+    </script>
+    
+  </head>
+  <body>
+    <!-- magical breadcrumbs -->
+    <div class="topnav">
+      <div class="container">
+        <ul class="breadcrumb">
+          <li>
+            <div class="dropdown">
+              <a data-toggle="dropdown" href="#">Apache Software Foundation 
<span class="caret"></span></a>
+              <ul class="dropdown-menu" role="menu" aria-labelledby="dLabel">
+                <li><a href="http://www.apache.org">Apache Homepage</a></li>
+                <li><a href="http://www.apache.org/licenses/">License</a></li>
+                <li><a 
href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+                <li><a 
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+                <li><a href="http://www.apache.org/security/">Security</a></li>
+              </ul>
+            </div>
+          </li>
+
+          <li><a href="http://mesos.apache.org">Apache Mesos</a></li>
+          
+          
+          <li><a href="/blog
+/">Blog
+</a></li>
+          
+          
+        </ul><!-- /.breadcrumb -->
+      </div><!-- /.container -->
+    </div><!-- /.topnav -->
+
+    <!-- navbar excitement -->
+<div class="navbar navbar-default navbar-static-top" role="navigation">
+  <div class="container">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#mesos-menu" aria-expanded="false">
+      <span class="sr-only">Toggle navigation</span>
+      <span class="icon-bar"></span>
+      <span class="icon-bar"></span>
+      <span class="icon-bar"></span>
+      </button>
+      <a class="navbar-brand" href="/"><img src="/assets/img/mesos_logo.png" 
alt="Apache Mesos logo"/></a>
+    </div><!-- /.navbar-header -->
+
+    <div class="navbar-collapse collapse" id="mesos-menu">
+      <ul class="nav navbar-nav navbar-right">
+        <li><a href="/getting-started/">Getting Started</a></li>
+        <li><a href="/blog/">Blog</a></li>
+        <li><a href="/documentation/latest/">Documentation</a></li>
+        <li><a href="/downloads/">Downloads</a></li>
+        <li><a href="/community/">Community</a></li>
+      </ul>
+    </div><!-- /#mesos-menu -->
+  </div><!-- /.container -->
+</div><!-- /.navbar -->
+
+<div class="content">
+  <div class="container">
+    <div class="row">
+
+  <div class="col-md-3">
+    <div class="meta">
+      <span class="author">
+        
+          <img 
src="http://www.gravatar.com/avatar/fb43656d4d45f940160c3226c53309f5?s=80"
+               class="author_gravatar">
+        
+        <span class="author_contact">
+          <p><strong>Benjamin Mahler</strong></p>
+          
+            <p><a href="http://twitter.com/bmahler">@bmahler</a></p>
+          
+        </span>
+      </span>
+      <p><em>Posted December  7, 2017</em></p>
+    </div>
+
+    <div class="share">
+      <span class="social-share-button"><a href="https://twitter.com/share" 
class="twitter-share-button" data-via="apachemesos">Tweet</a></span>
+
+      <span><script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document,
 'script', 'twitter-wjs');</script></span>
+
+      <span><div class="g-plusone" data-size="medium"></div></span>
+
+      <!-- Place this tag after the last +1 button tag. -->
+      <script type="text/javascript">
+        (function() {
+        var po = document.createElement('script'); po.type = 
'text/javascript'; po.async = true;
+        po.src = 'https://apis.google.com/js/plusone.js';
+        var s = document.getElementsByTagName('script')[0]; 
s.parentNode.insertBefore(po, s);
+        })();
+      </script>
+
+      <script src="//platform.linkedin.com/in.js" type="text/javascript">
+       lang: en_US
+      </script>
+      <script type="IN/Share" data-counter="right"></script>
+    </div>
+  </div>
+
+  <div class="post col-md-9">
+    <h1>December 2017 Performance Working Group Progress Report</h1>
+
+    <p><strong>Scalability and performance are key features for Mesos. Some 
users of Mesos already run production clusters that consist of more than 
35,000 agents and 100,000 active tasks.</strong> However, there remains a lot 
of room for improvement across a variety of areas of the system.</p>
+
+<p>The performance working group was created in order to focus on some of 
these areas. The group&rsquo;s charter is to improve scalability / throughput / 
latency across the system, and in order to measure our improvements and prevent 
performance regressions we will write benchmarks and automate them.</p>
+
+<p>In the past few months, we&rsquo;ve focused on making improvements to the 
following areas:</p>
+
+<ul>
+<li><strong>Master failover time-to-completion</strong>: Achieved a 450-600% 
improvement in throughput, which reduces the time-to-completion by 80-85%.</li>
+<li><strong><a 
href="https://github.com/apache/mesos/tree/master/3rdparty/libprocess">Libprocess</a>
 message passing throughput</strong>: These improvements will be covered in a 
separate blog post.</li>
+</ul>
+
+
+<p>Before we dive into the master failover improvements, I would like to 
recognize and thank the following contributors:</p>
+
+<ul>
+<li><strong>Dmitry Zhuk</strong>: for writing <em>a lot</em> of patches for 
improving the master failover performance.</li>
+<li><strong>Michael Park</strong>: for reviewing and shipping many of 
Dmitry&rsquo;s more challenging patches.</li>
+<li><strong>Yan Xu</strong>: for writing the master failover benchmark that 
was the basis for measuring the improvements.</li>
+</ul>
+
+
+<h2>Master Failover Time-To-Completion</h2>
+
+<p>Our first area of focus was to improve the time it takes for a master 
failover to complete, where completion is defined as all of the agents 
successfully re-registering. Mesos is architected to use a centralized master 
with standby masters that participate in a quorum for high availability. For 
scalability reasons, the leading master stores the state of the cluster 
in-memory. During a master failover, the leading master therefore needs to 
rebuild the in-memory state from all of the agents that re-register. During 
this time, the master is available to process other requests, but will be 
exposing only partial state to API consumers.</p>
+
+<p>Rebuilding the master’s in-memory state can be expensive for 
larger clusters, so the focus of this effort was to make the rebuild more 
efficient. Improvements were made in several areas; only the highest-impact 
changes are listed below:</p>
+
+<h3>Protobuf 3.5.0 Move Support</h3>
+
+<p>We upgraded to protobuf 3.5.0 in order to gain move support. When we 
profiled the master, we found that it spent a lot of time copying protobuf 
messages during agent re-registration. This support allowed us to eliminate 
copies of protobuf messages while retaining value semantics.</p>
+
+<h3>Move Support and Copy Elimination in Libprocess <code>dispatch</code> / 
<code>defer</code> / <code>install</code></h3>
+
+<p>Libprocess provides several primitives for message passing:</p>
+
+<ul>
+<li><code>dispatch</code>: Provides the ability to post a message to a local 
<code>Process</code>.</li>
+<li><code>defer</code>: Provides a deferred <code>dispatch</code>, i.e., a 
function object that, when invoked, issues a <code>dispatch</code>.</li>
+<li><code>install</code>: Installs a handler for receiving a protobuf 
message.</li>
+</ul>
+
+
+<p>These primitives did not have move support, as they were originally added 
prior to the addition of C++11 support to the code-base. In order to eliminate 
copies, we enhanced these primitives to support moving arguments in and out.</p>
+
+<p>This required introducing a new C++ utility, because <code>defer</code> 
takes on the same API as <code>std::bind</code> (e.g., placeholders). 
Specifically, the function object returned by <code>std::bind</code> does not 
move the bound arguments into the stored callable. In order to enable this, 
<code>defer</code> now uses a utility we introduced called 
<code>lambda::partial</code> rather than <code>std::bind</code>. 
<code>lambda::partial</code> performs partial function application similar to 
<code>std::bind</code> except the returned function object moves the bound 
arguments into the stored callable if the invocation is performed on an r-value 
function object.</p>
+
+<h3>Copy Elimination in the Master</h3>
+
+<p>With these previous enhancements in place, we were able to eliminate many 
of the expensive copies of protobuf messages performed by the master.</p>
+
+<h3>Benchmark and Results</h3>
+
+<p>We wrote a synthetic benchmark to simulate a master failover. This 
benchmark prepares all the messages that would be sent to the master by the 
agents that need to re-register:</p>
+
+<ul>
+<li>The benchmark uses synthetic agents in that they are just an actor that 
knows how to re-register with the master.</li>
+<li>Each &ldquo;agent&rdquo; will send a configurable number of active and 
completed tasks belonging to a configurable number of active and completed 
frameworks.</li>
+<li>Each task has 10 small labels to introduce metadata overhead.</li>
+</ul>
+
+
+<p>The benchmark has a few caveats:</p>
+
+<ul>
+<li>It does not use executors (this would show improved results over what is 
shown below, but for simplicity the benchmark omits them)</li>
+<li>It uses local message passing, whereas a real cluster would be passing 
messages over HTTP.</li>
+<li>It uses a quorum size of 1, so writes to the master’s registry occur 
only on a single local log replica.</li>
+<li>The synthetic agents do not retry their re-registration, whereas typically 
agents will retry with a backoff.</li>
+</ul>
+
+
+<p>This was tested on a 2015 MacBook Pro with a 2.8 GHz Intel Core i7 processor. 
Mesos was configured using <code>Apple LLVM version 9.0.0 
(clang-900.0.38)</code>, with <code>-O2</code> enabled in 1.5.0.</p>
+
+<p>The first set of results represents a cluster with 10 active tasks per agent 
across 5 frameworks, with no completed tasks. The results for 1,000 - 40,000 
agents with 10,000 - 400,000 active tasks are shown below:</p>
+
+<p><img src="/assets/img/documentation/1.3-1.5_master_failover_no_history.png" 
alt="1.3 - 1.5 Master Failover without Task History Graph" /></p>
+
+<p>There was a reduction in the time-to-completion of ~80% due to a 450-500% 
improvement in throughput across 1.3.0 to 1.5.0.</p>
+
+<p>The second results add task history: each agent also now contains 100 
completed tasks across 5 completed frameworks. The results from 1,000 - 40,000 
agents with 10,000 - 400,000 active tasks and 100,000 - 4,000,000 completed 
tasks are shown below:</p>
+
+<p><img src="/assets/img/documentation/1.3-1.5_master_failover_with_history" 
alt="1.3 - 1.5 Master Failover with Task History Graph" /></p>
+
+<p>This represents a reduction in time-to-completion of ~85% due to a 550-700% 
improvement in throughput across 1.3.0 to 1.5.0.</p>
+
+<h2>Performance Working Group Roadmap</h2>
+
+<p>We&rsquo;re currently targeting the following areas for improvements:</p>
+
+<ul>
+<li><strong>Performance of the v1 API</strong>: Currently the v1 API can be 
significantly slower than the v0 API. We would like to reach parity, and 
ideally surpass the performance of the v0 API.
+
+<ul>
+<li><strong><a 
href="https://github.com/apache/mesos/tree/master/3rdparty/libprocess">Libprocess</a>
 HTTP performance</strong>: This will be undertaken as part of improving the v1 
API performance, since it is HTTP-based.</li>
+</ul>
+</li>
+<li><strong>Master state API performance</strong>: Currently, API queries of 
the master&rsquo;s state are serviced by the same master actor that processes 
all of the messages from schedulers and agents. Since the query processing can 
block the master from processing other events, users need to be careful not to 
query the master excessively. In practice, the master gets queried quite 
heavily due to the presence of several tools that rely on the master&rsquo;s 
state (e.g. DNS tooling, UIs, CLIs, etc) and so this is a critical problem for 
users. This effort will leverage the state streaming API to stream the state to 
a different actor that can serve the state API requests. This will ensure that 
expensive state queries do not affect the master&rsquo;s ability to process 
events.</li>
+</ul>
+
+
+<p>If you are a user and would like to suggest some areas for performance 
improvement, please let us know by emailing <a 
href="&#109;&#97;&#x69;&#108;&#x74;&#x6f;&#x3a;&#100;&#101;&#x76;&#64;&#x61;&#112;&#97;&#99;&#104;&#101;&#46;&#x6d;&#101;&#x73;&#x6f;&#x73;&#x2e;&#111;&#114;&#x67;">&#x64;&#x65;&#x76;&#x40;&#x61;&#112;&#x61;&#x63;&#x68;&#101;&#x2e;&#x6d;&#101;&#115;&#x6f;&#x73;&#46;&#x6f;&#114;&#x67;</a>.</p>
+
+  </div>
+</div>
+
+  </div><!-- /.container -->
+</div><!-- /.content -->
+
+<hr>
+
+
+
+    <!-- footer -->
+    <div class="footer">
+      <div class="container">
+        <div class="col-md-4 social-blk">
+          <span class="social">
+            <a href="https://twitter.com/ApacheMesos"
+              class="twitter-follow-button"
+              data-show-count="false" data-size="large">Follow @ApacheMesos</a>
+            <script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document,
 'script', 'twitter-wjs');</script>
+            <a href="https://twitter.com/intent/tweet?button_hashtag=mesos"
+              class="twitter-hashtag-button"
+              data-size="large"
+              data-related="ApacheMesos">Tweet #mesos</a>
+            <script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document,
 'script', 'twitter-wjs');</script>
+          </span>
+        </div>
+
+        <div class="col-md-8 trademark">
+          <p>&copy; 2012-2017 <a href="http://apache.org">The Apache Software 
Foundation</a>.
+            Apache Mesos, the Apache feather logo, and the Apache Mesos 
project logo are trademarks of The Apache Software Foundation.
+          </p>
+        </div>
+      </div><!-- /.container -->
+    </div><!-- /.footer -->
+
+    <!-- JS -->
+    <script src="//code.jquery.com/jquery-1.11.0.min.js" 
type="text/javascript"></script>
+    <script 
src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" 
type="text/javascript"></script>
+    <script 
src="//cdnjs.cloudflare.com/ajax/libs/anchor-js/4.1.0/anchor.min.js" 
type="text/javascript"></script>
+
+    <!-- Inject anchors for all headings on the page, see 
https://www.bryanbraun.com/anchorjs. -->
+    <script type="text/javascript">
+    anchors.options = {
+      placement: 'right',
+      ariaLabel: 'Permalink',
+    };
+
+    // The default is to not add anchors to h1, but we have pages with 
multiple h1 headers,
+    // and we do want to put anchors on those.
+    anchors.add('h1, h2, h3, h4, h5, h6');
+    </script>
+  </body>
+</html>

http://git-wip-us.apache.org/repos/asf/mesos-site/blob/5abd8c34/content/sitemap.xml
----------------------------------------------------------------------
diff --git a/content/sitemap.xml b/content/sitemap.xml
index 0357a2e..f445b24 100644
--- a/content/sitemap.xml
+++ b/content/sitemap.xml
@@ -17305,6 +17305,10 @@
     <lastmod>2017-12-11T00:00:00+00:00</lastmod>
   </url>
   <url>
+    
<loc>http://mesos.apache.org/blog/performance-working-group-progress-report/</loc>
+    <lastmod>2017-12-11T00:00:00+00:00</lastmod>
+  </url>
+  <url>
     <loc>http://mesos.apache.org/blog/mesos-0-26-1-and-more-released/</loc>
     <lastmod>2017-12-11T00:00:00+00:00</lastmod>
   </url>
@@ -17393,10 +17397,6 @@
     <lastmod>2017-12-11T00:00:00+00:00</lastmod>
   </url>
   <url>
-    
<loc>http://mesos.apache.org/blog/2017-12-7-performance-working-group-progress-report/</loc>
-    <lastmod>2017-12-11T00:00:00+00:00</lastmod>
-  </url>
-  <url>
     <loc>http://mesos.apache.org/blog/mesos-1-2-0-released/</loc>
     <lastmod>2017-12-11T00:00:00+00:00</lastmod>
   </url>
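
Note on the std::bind discussion in the post above: the behavior it describes (bound arguments are moved into the callable only when the function object itself is invoked as an r-value) can be illustrated with ref-qualified call operators. The following is a minimal sketch under stated assumptions, not the Mesos implementation of lambda::partial. The names Tracked, handler, Partial1, make_partial1 and the two copies_for_* helpers are hypothetical and exist only for this illustration; the sketch binds exactly one argument and supports no placeholders.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical payload type that counts copies, so the difference between
// the two call styles is observable.
struct Tracked {
  static int copies;
  std::string payload;

  explicit Tracked(std::string s) : payload(std::move(s)) {}
  Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
  Tracked(Tracked&& other) noexcept : payload(std::move(other.payload)) {}
};

int Tracked::copies = 0;

// Stand-in for a message handler that takes its argument by value.
inline void handler(Tracked message) { (void)message; }

// Hypothetical one-argument partial application (not the Mesos utility).
template <typename F, typename Arg>
class Partial1 {
public:
  Partial1(F f, Arg arg) : f_(std::move(f)), arg_(std::move(arg)) {}

  // Invoked on an l-value: the bound argument must stay valid for later
  // calls, so it is passed by const reference and the handler copies it.
  void operator()() const& { f_(arg_); }

  // Invoked on an r-value: this object is expiring, so the bound argument
  // is moved into the handler instead of being copied.
  void operator()() && { f_(std::move(arg_)); }

private:
  F f_;
  Arg arg_;
};

template <typename F, typename Arg>
Partial1<typename std::decay<F>::type, typename std::decay<Arg>::type>
make_partial1(F&& f, Arg&& arg) {
  return Partial1<typename std::decay<F>::type,
                  typename std::decay<Arg>::type>(
      std::forward<F>(f), std::forward<Arg>(arg));
}

// Number of copies made by a single l-value invocation.
inline int copies_for_lvalue_call() {
  auto bound = make_partial1(&handler, Tracked("re-register"));
  Tracked::copies = 0;  // Ignore anything that happened during binding.
  bound();
  return Tracked::copies;
}

// Number of copies made by a single r-value invocation.
inline int copies_for_rvalue_call() {
  auto bound = make_partial1(&handler, Tracked("re-register"));
  Tracked::copies = 0;  // Ignore anything that happened during binding.
  std::move(bound)();
  return Tracked::copies;
}
```

In this sketch, calling bound() copies the bound Tracked into the handler, while std::move(bound)() moves it. That is the distinction the post draws between std::bind and the utility that defer now uses; the real utility handles multiple arguments and placeholders, which this sketch deliberately leaves out.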
