Author: zznate
Date: Wed Apr 10 01:16:07 2019
New Revision: 1857225

URL: http://svn.apache.org/viewvc?rev=1857225&view=rev
Log:
CASSANDRA-14765 - Streaming performance post by Sumanth Pasupuleti

Added:
    cassandra/site/publish/blog/2019/
    cassandra/site/publish/blog/2019/04/
    cassandra/site/publish/blog/2019/04/09/
    cassandra/site/publish/blog/2019/04/09/benchmarking_streaming.html
    cassandra/site/publish/blog/page2/
    cassandra/site/publish/blog/page2/index.html
    cassandra/site/publish/img/blog-post-benchmarking-streaming/
    cassandra/site/publish/img/blog-post-benchmarking-streaming/cassandra_streaming.png   (with props)
    cassandra/site/src/img/blog-post-benchmarking-streaming/
    cassandra/site/src/img/blog-post-benchmarking-streaming/cassandra_streaming.png   (with props)
Modified:
    cassandra/site/publish/blog/index.html
    cassandra/site/publish/feed.xml

Added: cassandra/site/publish/blog/2019/04/09/benchmarking_streaming.html
URL: http://svn.apache.org/viewvc/cassandra/site/publish/blog/2019/04/09/benchmarking_streaming.html?rev=1857225&view=auto
==============================================================================
--- cassandra/site/publish/blog/2019/04/09/benchmarking_streaming.html (added)
+++ cassandra/site/publish/blog/2019/04/09/benchmarking_streaming.html Wed Apr 10 01:16:07 2019
@@ -0,0 +1,232 @@
+<!DOCTYPE html>
+<html>
+  
+
+
+
+<head>
+  <meta charset="utf-8">
+  <meta http-equiv="X-UA-Compatible" content="IE=edge">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+  <meta name="description" content="Streaming is a process where nodes of a cluster exchange data in the form of SSTables. Streaming can kick in during many situations such as bootstrap, repair...">
+  <meta name="keywords" content="cassandra, apache, apache cassandra, distributed storage, key value store, scalability, bigtable, dynamo" />
+  <meta name="robots" content="index,follow" />
+  <meta name="language" content="en" />  
+
+  <title>Even Higher Availability with 5x Faster Streaming in Cassandra 4.0</title>
+
+  <link rel="canonical" href="http://cassandra.apache.org/blog/2019/04/09/benchmarking_streaming.html">
+
+  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" crossorigin="anonymous">
+  <link rel="stylesheet" href="./../../../../css/style.css">
+  
+
+  
+  <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.2.0/css/all.css" integrity="sha384-hWVjflwFxL6sNzntih27bfxkr27PmbbK/iSvJ+a4+0owXq79v+lsFkW54bOGbiDQ" crossorigin="anonymous">
+  
+  <link type="application/atom+xml" rel="alternate" href="http://cassandra.apache.org/feed.xml" title="Apache Cassandra Website" />
+</head>
+
+  <body>
+    <!-- breadcrumbs -->
+<div class="topnav">
+  <div class="container breadcrumb-container">
+    <ul class="breadcrumb">
+      <li>
+        <div class="dropdown">
+          <img class="asf-logo" src="./../../../../img/asf_feather.png" />
+          <a data-toggle="dropdown" href="#">Apache Software Foundation <span class="caret"></span></a>
+          <ul class="dropdown-menu" role="menu" aria-labelledby="dLabel">
+            <li><a href="http://www.apache.org">Apache Homepage</a></li>
+            <li><a href="http://www.apache.org/licenses/">License</a></li>
+            <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+            <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+            <li><a href="http://www.apache.org/security/">Security</a></li>
+          </ul>
+        </div>
+      </li>
+
+      
+      <li><a href="./../../../../">Apache Cassandra</a></li>
+      
+
+      
+        
+        <li>Even Higher Availability with 5x Faster Streaming in Cassandra 4.0</li>
+        
+      
+
+      
+
+      
+    </ul>
+  </div>
+
+  <!-- navbar -->
+  <nav class="navbar navbar-default navbar-static-top" role="navigation">
+    <div class="container">
+      <div class="navbar-header">
+        <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#cassandra-menu" aria-expanded="false">
+          <span class="sr-only">Toggle navigation</span>
+          <span class="icon-bar"></span>
+          <span class="icon-bar"></span>
+          <span class="icon-bar"></span>
+        </button>
+        <a class="navbar-brand" href="./../../../../"><img src="./../../../../img/cassandra_logo.png" alt="Apache Cassandra logo" /></a>
+      </div><!-- /.navbar-header -->
+
+      <div id="cassandra-menu" class="collapse navbar-collapse">
+        <ul class="nav navbar-nav navbar-right">
+          <li><a href="./../../../../">Home</a></li>
+          <li><a href="./../../../../download/">Download</a></li>
+          <li><a href="./../../../../doc/">Documentation</a></li>
+          <li><a href="./../../../../community/">Community</a></li>
+          <li>
+            <a href="./../../../../blog">Blog</a>                    
+        </li>
+        </ul>
+      </div><!-- /#cassandra-menu -->
+
+      
+    </div>
+  </nav><!-- /.navbar -->
+</div><!-- /.topnav -->
+
+    <div class="content">
+  <div class="container">
+  <h2>Even Higher Availability with 5x Faster Streaming in Cassandra 4.0</h2>
+    <p>Posted on April 09, 2019 by The Apache Cassandra Community</p>
+    <h5><a href="/blog">&laquo; Back to the Apache Cassandra Blog</a></h5>
+    <hr />
+  <p>Streaming is a process where nodes of a cluster exchange data in the form of SSTables. Streaming can kick in during many situations such as bootstrap, repair, rebuild, range movement, cluster expansion, etc. In this post, we discuss the massive performance improvements made to the streaming process in Apache Cassandra 4.0.</p>
+
+<h2 id="high-availability">High Availability</h2>
+<p>As we know, Cassandra is a highly available, eventually consistent database. It maintains its legendary availability by storing redundant copies of data on nodes known as replicas, usually running on commodity hardware. During normal operations, these replicas may suffer hardware issues that cause them to fail. When that happens, we need to replace them with new nodes on fresh hardware.</p>
+
+<p>As part of this replacement operation, the new Cassandra node streams data from the neighboring nodes that hold copies of the data belonging to the new node’s token range. Depending on the amount of data stored, this process can require substantial network bandwidth and take some time to complete. The longer these operations take, the longer we are exposed to a loss of availability: depending on your replication factor and consistency requirements, if another node fails during the replacement operation, availability will be impacted.</p>
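+<p>As a rough illustration of why streaming speed dominates the replacement window, the time a new node spends streaming is approximately the data it must receive divided by the effective throughput it sustains. The Python sketch below assumes decimal units and a constant throughput, and uses hypothetical throughput values; it is a back-of-the-envelope model, not how Cassandra measures streaming.</p>

```python
def streaming_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Approximate hours to stream data_gb at a sustained throughput_mb_s.

    Simplifying assumptions: decimal units (1 GB = 1000 MB) and a constant
    effective throughput for the whole transfer.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    seconds = data_gb * 1000 / throughput_mb_s
    return seconds / 3600


# 500 GB per node matches the test setup in this post; the two throughput
# values are hypothetical and only show that 5x the throughput gives 1/5 the time.
for mb_s in (50, 250):
    print(f"{mb_s} MB/s -> {streaming_hours(500, mb_s):.2f} h")
```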
+
+<h2 id="increasing-availability">Increasing Availability</h2>
+<p>To minimize the failure window, we want to make these operations as fast as possible. The faster the new node completes streaming its data, the sooner it can serve traffic, increasing the availability of the cluster. Towards this goal, Cassandra 4.0 saw the addition of <a href="https://en.wikipedia.org/wiki/Zero-copy">Zero Copy</a> streaming. For more details on Cassandra’s Zero Copy implementation, see this <a href="../../../2018/08/07/faster_streaming_in_cassandra.html">blog post</a> and <a href="https://issues.apache.org/jira/browse/CASSANDRA-14556">CASSANDRA-14556</a>.</p>
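+<p>The OS-level idea behind zero copy can be sketched with Python’s <code>socket.sendfile()</code>, which uses <code>os.sendfile()</code> where the platform supports it, so the kernel moves file pages to the socket without copying them through a user-space buffer (it falls back to plain <code>send()</code> otherwise). This is only an illustration of the general technique over a local socket pair, with a temporary file standing in for an SSTable; it is not Cassandra’s Java implementation.</p>

```python
import socket
import tempfile
import threading


def stream_file(sock: socket.socket, f) -> int:
    """Send the whole binary file object f over sock and return bytes sent.

    socket.sendfile() uses os.sendfile() when available (kernel-side copy,
    no user-space buffer) and otherwise falls back to send().
    """
    return sock.sendfile(f)


def receive_all(sock: socket.socket, out: bytearray) -> None:
    """Read from sock until the peer shuts down its write side."""
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            break
        out.extend(chunk)


# Stand-in for an SSTable component: 1 MiB of arbitrary bytes.
payload = bytes(range(256)) * 4096

with tempfile.TemporaryFile() as f:
    f.write(payload)
    f.flush()  # make sure the OS sees the full file size before sendfile
    f.seek(0)

    sender, receiver = socket.socketpair()
    received = bytearray()
    reader = threading.Thread(target=receive_all, args=(receiver, received))
    reader.start()

    sent = stream_file(sender, f)
    sender.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
    reader.join()
    sender.close()
    receiver.close()

print(sent, bytes(received) == payload)
```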
+
+<h2 id="talking-numbers">Talking Numbers</h2>
+<p>To quantify the results of these improvements, we at Netflix measured the performance impact of streaming in 4.0 vs 3.0, using our open source <a href="https://github.com/Netflix/ndbench">NDBench</a> benchmarking tool with the CassJavaDriverGeneric plugin. Though we expected improvements, we were still amazed by the overall result: a <strong>fivefold increase</strong> in streaming performance. The test setup and operations are detailed below.</p>
+
+<h3 id="test-setup">Test Setup</h3>
+<p>In our test setup, we used the following configurations:</p>
+<ul>
+  <li>6-node clusters on i3.xl, i3.2xl, i3.4xl and i3.8xl EC2 instances, each on 3.0 and trunk (sha dd7ec5a2d6736b26d3c5f137388f2d0028df7a03).</li>
+  <li>Table schema</li>
+</ul>
+<div><pre>
+CREATE TABLE testing.test (
+    key text,
+    column1 int,
+    value text,
+    PRIMARY KEY (key, column1)
+) WITH CLUSTERING ORDER BY (column1 ASC)
+    AND bloom_filter_fp_chance = 0.01
+    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
+    AND comment = ''
+    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
+    AND compression = {'enabled': 'false'}
+    AND crc_check_chance = 1.0
+    AND dclocal_read_repair_chance = 0.1
+    AND default_time_to_live = 0
+    AND gc_grace_seconds = 864000
+    AND max_index_interval = 2048
+    AND memtable_flush_period_in_ms = 0
+    AND min_index_interval = 128
+    AND read_repair_chance = 0.0
+    AND speculative_retry = '99PERCENTILE';
+</pre></div>
+
+<ul>
+  <li>Data size per node: 500GB</li>
+  <li>No. of tokens per node: 1 (no vnodes)</li>
+</ul>
+
+<p>To trigger the streaming process, we performed the following steps in each of the clusters:</p>
+<ul>
+  <li>terminate a node</li>
+  <li>add a new node as a replacement</li>
+  <li>measure the time the new node takes to finish streaming the data of the node it replaces</li>
+</ul>
+
+<p>For each cluster and version, we repeated this exercise multiple times to collect several samples.</p>
+
+<p>Below is the distribution of streaming times we found across the clusters:
+<img src="/img/blog-post-benchmarking-streaming/cassandra_streaming.png" alt="Benchmark results" title="Benchmark results" /></p>
+
+<h3 id="interpreting-the-results">Interpreting the Results</h3>
+<p>Several conclusions can be drawn from the graph above. Among them:</p>
+<ul>
+  <li>3.0 streaming times are inconsistent and show a high degree of variability (fat distributions across multiple samples)</li>
+  <li>3.0 streaming is highly affected by the instance type and generally looks CPU bound</li>
+  <li>Zero Copy streaming is approximately 5x faster</li>
+  <li>Zero Copy streaming time shows little variability (thin distributions across multiple samples)</li>
+  <li>Zero Copy streaming performance is not CPU bound and remains consistent across instance types</li>
+</ul>
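+<p>One way to turn repeated samples into comparisons like the ones above is to take the median streaming time per version as the typical value, the range (max minus min) as a crude measure of variability, and the ratio of medians as the speedup. The sketch below assumes those statistics (the post does not say exactly how the 5x figure was computed), and the sample values are placeholders, not the measured results.</p>

```python
from statistics import median


def summarize(samples):
    """Return (median, spread) for a list of streaming times; spread = max - min."""
    if not samples:
        raise ValueError("need at least one sample")
    return median(samples), max(samples) - min(samples)


def speedup(baseline, candidate):
    """Ratio of the median baseline time to the median candidate time."""
    return median(baseline) / median(candidate)


# Placeholder streaming times in minutes, for illustration only.
v30 = [100, 130, 160]  # wide spread, like a "fat" distribution
v40 = [24, 26, 28]     # narrow spread, like a "thin" distribution
print(summarize(v30), summarize(v40), speedup(v30, v40))
```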
+
+<p>The performance test results make it clear that Zero Copy Streaming has a huge performance benefit over the current streaming infrastructure in Cassandra. But what does it mean in the real world? The main takeaways are the following.</p>
+
+<p><strong>MTTR (Mean Time to Recovery):</strong> MTTR is a KPI (Key Performance Indicator) that measures how quickly a system recovers from a failure. Zero Copy Streaming has a very direct impact here, with a <strong>fivefold improvement</strong> in streaming performance.</p>
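+<p>A faster streaming phase shortens MTTR by the full 5x only to the extent that streaming dominates recovery time. A simple Amdahl-style projection, sketched below with hypothetical numbers, scales only the streaming share of each recovery; the function names and the 90% streaming share are illustrative assumptions, not measurements from this benchmark.</p>

```python
def mttr(recovery_minutes):
    """Mean Time to Recovery: the average duration of recovery events."""
    if not recovery_minutes:
        raise ValueError("need at least one recovery event")
    return sum(recovery_minutes) / len(recovery_minutes)


def projected_mttr(current_mttr, streaming_share, speedup):
    """Amdahl-style projection: only the streaming share of recovery speeds up."""
    if not 0.0 <= streaming_share <= 1.0:
        raise ValueError("streaming_share must be between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return current_mttr * ((1 - streaming_share) + streaming_share / speedup)


# Hypothetical recoveries (minutes), assuming 90% of each is spent streaming.
current = mttr([60, 90, 120])
print(current, round(projected_mttr(current, 0.9, 5), 1))
```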
+
+<p><strong>Costs:</strong> Zero Copy Streaming is ~5x faster. For some organizations this translates directly into cost savings, primarily by reducing the need to maintain spare server or cloud capacity. In other situations, such as migrating data to larger instance types or moving between AZs or DCs, instances that are sending data can be turned off sooner, saving costs. An added cost benefit is that you no longer have to over-provision instances: you get similar streaming performance whether you use an i3.xl or an i3.8xl, provided the bandwidth is available to the instance.</p>
+
+<p><strong>Risk Reduction:</strong> Zero Copy Streaming also greatly reduces risk. Since a cluster’s recovery mainly depends on streaming speed, Cassandra clusters with failed nodes can recover much more quickly (5x faster). This significantly shrinks the window of vulnerability, in some situations down to a few minutes.</p>
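+<p>To see why a shorter window lowers risk, assume (as a simplification) that the remaining nodes fail independently at a constant rate λ. The probability that at least one of <em>n</em> nodes fails during a window of length <em>t</em> is then 1 - e<sup>-nλt</sup>, which for small values is roughly proportional to <em>t</em>, so a 5x shorter window gives roughly a 5x lower chance of an overlapping failure. The failure rate, node count, and window lengths in the sketch below are hypothetical.</p>

```python
import math


def p_overlapping_failure(nodes_at_risk, failures_per_node_year, window_hours):
    """Probability that at least one more node fails during the window.

    Assumes independent failures at a constant (exponential) rate.
    """
    rate_per_hour = failures_per_node_year / (365 * 24)
    return 1 - math.exp(-nodes_at_risk * rate_per_hour * window_hours)


# Hypothetical inputs: 5 nodes at risk, 0.05 failures per node-year,
# and a replacement window of 10 hours versus 2 hours (5x faster).
slow = p_overlapping_failure(5, 0.05, 10)
fast = p_overlapping_failure(5, 0.05, 2)
print(f"{slow:.2e} {fast:.2e} ratio={slow / fast:.2f}")
```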
+
+<p>Finally, a benefit we rarely talk about is the environmental impact of this change. Zero Copy Streaming enables us to move data through the cluster very quickly, which reduces the number and size of instances needed to build Cassandra clusters. As a result, not only does it reduce Cassandra’s TCO (Total Cost of Ownership), it also helps the environment by consuming fewer resources!</p>
+
+  </div>
+</div>
+
+    <hr />
+
+<footer>
+  <div class="container">
+    <div class="col-md-4 social-blk">
+      <span class="social">
+        <a href="https://twitter.com/cassandra"
+           class="twitter-follow-button"
+           data-show-count="false" data-size="large">Follow @cassandra</a>
+        <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
+        <a href="https://twitter.com/intent/tweet?button_hashtag=cassandra"
+           class="twitter-hashtag-button"
+           data-size="large"
+           data-related="ApacheCassandra">Tweet #cassandra</a>
+        <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
+
+      </span>
+      <a class="subscribe-rss icon-link" href="/feed.xml" title="Subscribe to Blog via RSS">
+          <span><i class="fa fa-rss"></i></span>
+      </a>
+    </div>
+
+    <div class="col-md-8 trademark">
+      <p>&copy; 2016 <a href="http://apache.org">The Apache Software Foundation</a>.
+      Apache, the Apache feather logo, and Apache Cassandra are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div><!-- /.container -->
+</footer>
+
+<!-- Javascript. Placed here so pages load faster -->
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
+<script src="./../../../../js/underscore-min.js"></script>
+<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js" integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS" crossorigin="anonymous"></script>
+
+
+
+<script type="text/javascript">
+  var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
+  document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
+
+  try {
+    var pageTracker = _gat._getTracker("UA-11583863-1");
+    pageTracker._trackPageview();
+  } catch(err) {}
+</script>
+
+
+  </body>
+</html>

Modified: cassandra/site/publish/blog/index.html
URL: http://svn.apache.org/viewvc/cassandra/site/publish/blog/index.html?rev=1857225&r1=1857224&r2=1857225&view=diff
==============================================================================
--- cassandra/site/publish/blog/index.html (original)
+++ cassandra/site/publish/blog/index.html Wed Apr 10 01:16:07 2019
@@ -102,6 +102,15 @@
     <ul class="blog-post-listing">
       
         <li class="blog-post">
+          <h4><a href="/blog/2019/04/09/benchmarking_streaming.html">Even Higher Availability with 5x Faster Streaming in Cassandra 4.0</a></h4>
+          <p>Posted on April 09, 2019 by The Apache Cassandra Community</p>
+          <p>Streaming is a process where nodes of a cluster exchange data in the form of SSTables. Streaming can kick in during many situations such as bootstrap, repair, rebuild, range movement, cluster expansion, etc. In this post, we discuss the massive performance improvements made to the streaming process in Apache Cassandra 4.0.</p>
+
+
+          <h5><a href="/blog/2019/04/09/benchmarking_streaming.html">Read more &raquo;</a></h5>
+        </li>
+      
+        <li class="blog-post">
           <h4><a href="/blog/2018/12/03/introducing-transient-replication.html">Introducing Transient Replication</a></h4>
           <p>Posted on December 03, 2018 by The Apache Cassandra Community</p>
           <p>Transient Replication is a new experimental feature soon to be available in 4.0. When enabled, it allows for the creation of keyspaces where replication factor can be specified as a number of copies (full replicas) and temporary copies (transient replicas). Transient replicas retain the data they replicate only long enough for it to be propagated to full replicas, via incremental repair, at which point the data is deleted. Writing to transient replicas can be avoided almost entirely if monotonic reads are not required because it is possible to achieve a quorum of acknowledged writes without them.</p>
@@ -140,17 +149,17 @@ to ensure compliance with regulatory, se
           <h5><a href="/blog/2018/08/21/testing_apache_cassandra.html">Read more &raquo;</a></h5>
         </li>
       
-        <li class="blog-post">
-          <h4><a href="/blog/2018/08/07/faster_streaming_in_cassandra.html">Hardware-bound Zero Copy Streaming in Apache Cassandra 4.0</a></h4>
-          <p>Posted on August 07, 2018 by The Apache Cassandra Community</p>
-          <p>Streaming in Apache Cassandra powers host replacement, range movements, and cluster expansions. Streaming plays a crucial role in the cluster and as such its performance is key to not only the speed of the operations its used in but the cluster’s health generally. In Apache Cassandra 4.0, we have introduced an improved streaming implementation that reduces GC pressure and increases throughput several folds and are now limited, in some cases, only by the disk / network IO (See: <a href="https://issues.apache.org/jira/browse/CASSANDRA-14556">CASSANDRA-14556</a>).</p>
-
-
-          <h5><a href="/blog/2018/08/07/faster_streaming_in_cassandra.html">Read more &raquo;</a></h5>
-        </li>
-      
     </ul>
 
+  
+  <ul class="pager">
+    
+    
+    <li class="next">
+      <a href="/blog/page2/">Older Posts &rarr;</a>
+    </li>
+    
+  </ul>
       
   </div>
 </div>

Added: cassandra/site/publish/blog/page2/index.html
URL: http://svn.apache.org/viewvc/cassandra/site/publish/blog/page2/index.html?rev=1857225&view=auto
==============================================================================
--- cassandra/site/publish/blog/page2/index.html (added)
+++ cassandra/site/publish/blog/page2/index.html Wed Apr 10 01:16:07 2019
@@ -0,0 +1,177 @@
+<!DOCTYPE html>
+<html>
+  
+
+
+
+<head>
+  <meta charset="utf-8">
+  <meta http-equiv="X-UA-Compatible" content="IE=edge">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+  <meta name="description" content="The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
+">
+  <meta name="keywords" content="cassandra, apache, apache cassandra, distributed storage, key value store, scalability, bigtable, dynamo" />
+  <meta name="robots" content="index,follow" />
+  <meta name="language" content="en" />  
+
+  <title> - page 2</title>
+
+  <link rel="canonical" href="http://cassandra.apache.org/blog/page2/">
+
+  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" crossorigin="anonymous">
+  <link rel="stylesheet" href="./../../css/style.css">
+  
+
+  
+  <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.2.0/css/all.css" integrity="sha384-hWVjflwFxL6sNzntih27bfxkr27PmbbK/iSvJ+a4+0owXq79v+lsFkW54bOGbiDQ" crossorigin="anonymous">
+  
+  <link type="application/atom+xml" rel="alternate" href="http://cassandra.apache.org/feed.xml" title="Apache Cassandra Website" />
+</head>
+
+  <body>
+    <!-- breadcrumbs -->
+<div class="topnav">
+  <div class="container breadcrumb-container">
+    <ul class="breadcrumb">
+      <li>
+        <div class="dropdown">
+          <img class="asf-logo" src="./../../img/asf_feather.png" />
+          <a data-toggle="dropdown" href="#">Apache Software Foundation <span class="caret"></span></a>
+          <ul class="dropdown-menu" role="menu" aria-labelledby="dLabel">
+            <li><a href="http://www.apache.org">Apache Homepage</a></li>
+            <li><a href="http://www.apache.org/licenses/">License</a></li>
+            <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+            <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+            <li><a href="http://www.apache.org/security/">Security</a></li>
+          </ul>
+        </div>
+      </li>
+
+      
+      <li><a href="./../../">Apache Cassandra</a></li>
+      
+
+      
+        
+        <li> - page 2</li>
+        
+      
+
+      
+
+      
+    </ul>
+  </div>
+
+  <!-- navbar -->
+  <nav class="navbar navbar-default navbar-static-top" role="navigation">
+    <div class="container">
+      <div class="navbar-header">
+        <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#cassandra-menu" aria-expanded="false">
+          <span class="sr-only">Toggle navigation</span>
+          <span class="icon-bar"></span>
+          <span class="icon-bar"></span>
+          <span class="icon-bar"></span>
+        </button>
+        <a class="navbar-brand" href="./../../"><img src="./../../img/cassandra_logo.png" alt="Apache Cassandra logo" /></a>
+      </div><!-- /.navbar-header -->
+
+      <div id="cassandra-menu" class="collapse navbar-collapse">
+        <ul class="nav navbar-nav navbar-right">
+          <li><a href="./../../">Home</a></li>
+          <li><a href="./../../download/">Download</a></li>
+          <li><a href="./../../doc/">Documentation</a></li>
+          <li><a href="./../../community/">Community</a></li>
+          <li>
+            <a href="./../../blog">Blog</a>                    
+        </li>
+        </ul>
+      </div><!-- /#cassandra-menu -->
+
+      
+    </div>
+  </nav><!-- /.navbar -->
+</div><!-- /.topnav -->
+
+    <div class="content">
+  <div class="container">
+    <h2>Apache Cassandra Blog</h2>
+<p>Have something to share with the community? Let us know on the <a href="http://cassandra.apache.org/community/#mailing">mailing list</a>!</p>
+
+
+    <ul class="blog-post-listing">
+      
+        <li class="blog-post">
+          <h4><a href="/blog/2018/08/07/faster_streaming_in_cassandra.html">Hardware-bound Zero Copy Streaming in Apache Cassandra 4.0</a></h4>
+          <p>Posted on August 07, 2018 by The Apache Cassandra Community</p>
+          <p>Streaming in Apache Cassandra powers host replacement, range movements, and cluster expansions. Streaming plays a crucial role in the cluster and as such its performance is key to not only the speed of the operations its used in but the cluster’s health generally. In Apache Cassandra 4.0, we have introduced an improved streaming implementation that reduces GC pressure and increases throughput several folds and are now limited, in some cases, only by the disk / network IO (See: <a href="https://issues.apache.org/jira/browse/CASSANDRA-14556">CASSANDRA-14556</a>).</p>
+
+
+          <h5><a href="/blog/2018/08/07/faster_streaming_in_cassandra.html">Read more &raquo;</a></h5>
+        </li>
+      
+    </ul>
+
+  
+  <ul class="pager">
+    
+    <li class="previous">
+      <a href="/blog/">&larr; Newer Posts</a>
+    </li>
+    
+    
+  </ul>
+      
+  </div>
+</div>
+
+    <hr />
+
+<footer>
+  <div class="container">
+    <div class="col-md-4 social-blk">
+      <span class="social">
+        <a href="https://twitter.com/cassandra"
+           class="twitter-follow-button"
+           data-show-count="false" data-size="large">Follow @cassandra</a>
+        <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
+        <a href="https://twitter.com/intent/tweet?button_hashtag=cassandra"
+           class="twitter-hashtag-button"
+           data-size="large"
+           data-related="ApacheCassandra">Tweet #cassandra</a>
+        <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
+
+      </span>
+      <a class="subscribe-rss icon-link" href="/feed.xml" title="Subscribe to Blog via RSS">
+          <span><i class="fa fa-rss"></i></span>
+      </a>
+    </div>
+
+    <div class="col-md-8 trademark">
+      <p>&copy; 2016 <a href="http://apache.org">The Apache Software Foundation</a>.
+      Apache, the Apache feather logo, and Apache Cassandra are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div><!-- /.container -->
+</footer>
+
+<!-- Javascript. Placed here so pages load faster -->
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
+<script src="./../../js/underscore-min.js"></script>
+<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js" integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS" crossorigin="anonymous"></script>
+
+
+
+<script type="text/javascript">
+  var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
+  document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
+
+  try {
+    var pageTracker = _gat._getTracker("UA-11583863-1");
+    pageTracker._trackPageview();
+  } catch(err) {}
+</script>
+
+
+  </body>
+</html>

Modified: cassandra/site/publish/feed.xml
URL: http://svn.apache.org/viewvc/cassandra/site/publish/feed.xml?rev=1857225&r1=1857224&r2=1857225&view=diff
==============================================================================
--- cassandra/site/publish/feed.xml (original)
+++ cassandra/site/publish/feed.xml Wed Apr 10 01:16:07 2019
@@ -1,5 +1,82 @@
-<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="http://cassandra.apache.org/feed.xml" rel="self" type="application/atom+xml" /><link href="http://cassandra.apache.org/" rel="alternate" type="text/html" /><updated>2019-03-13T12:32:18+13:00</updated><id>http://cassandra.apache.org/</id><title type="html">Apache Cassandra Website</title><subtitle>The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
-</subtitle><entry><title type="html">Introducing Transient Replication</title><link href="http://cassandra.apache.org/blog/2018/12/03/introducing-transient-replication.html" rel="alternate" type="text/html" title="Introducing Transient Replication" /><published>2018-12-03T21:00:00+13:00</published><updated>2018-12-03T21:00:00+13:00</updated><id>http://cassandra.apache.org/blog/2018/12/03/introducing-transient-replication</id><content type="html" xml:base="http://cassandra.apache.org/blog/2018/12/03/introducing-transient-replication.html">&lt;p&gt;Transient Replication is a new experimental feature soon to be available in 4.0. When enabled, it allows for the creation of keyspaces where replication factor can be specified as a number of copies (full replicas) and temporary copies (transient replicas). Transient replicas retain the data they replicate only long enough for it to be propagated to full replicas, via incremental repair, at which point the data is deleted. Writing to transient replicas can be avoided almost entirely if monotonic reads are not required because it is possible to achieve a quorum of acknowledged writes without them.&lt;/p&gt;
+<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="http://cassandra.apache.org/feed.xml" rel="self" type="application/atom+xml" /><link href="http://cassandra.apache.org/" rel="alternate" type="text/html" /><updated>2019-04-10T13:13:00+12:00</updated><id>http://cassandra.apache.org/</id><title type="html">Apache Cassandra Website</title><subtitle>The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
+</subtitle><entry><title type="html">Even Higher Availability with 5x Faster Streaming in Cassandra 4.0</title><link href="http://cassandra.apache.org/blog/2019/04/09/benchmarking_streaming.html" rel="alternate" type="text/html" title="Even Higher Availability with 5x Faster Streaming in Cassandra 4.0" /><published>2019-04-09T20:00:00+12:00</published><updated>2019-04-09T20:00:00+12:00</updated><id>http://cassandra.apache.org/blog/2019/04/09/benchmarking_streaming</id><content type="html" xml:base="http://cassandra.apache.org/blog/2019/04/09/benchmarking_streaming.html">&lt;p&gt;Streaming is a process where nodes of a cluster exchange data in the form of SSTables. Streaming can kick in during many situations such as bootstrap, repair, rebuild, range movement, cluster expansion, etc. In this post, we discuss the massive performance improvements made to the streaming process in Apache Cassandra 4.0.&lt;/p&gt;
+
+&lt;h2 id=&quot;high-availability&quot;&gt;High Availability&lt;/h2&gt;
+&lt;p&gt;As we know, Cassandra is a highly available, eventually consistent database. It maintains its legendary availability by storing redundant copies of data on nodes known as replicas, usually running on commodity hardware. During normal operations, these replicas may suffer hardware issues that cause them to fail. When that happens, we need to replace them with new nodes on fresh hardware.&lt;/p&gt;
+
+&lt;p&gt;As part of this replacement operation, the new Cassandra node streams data from the neighboring nodes that hold copies of the data belonging to the new node’s token range. Depending on the amount of data stored, this process can require substantial network bandwidth and take some time to complete. The longer these operations take, the longer we are exposed to a loss of availability: depending on your replication factor and consistency requirements, if another node fails during the replacement operation, availability will be impacted.&lt;/p&gt;
+
+&lt;h2 id=&quot;increasing-availability&quot;&gt;Increasing 
Availability&lt;/h2&gt;
+&lt;p&gt;To minimize the failure window, we want to make these operations as fast as possible. The sooner the new node finishes streaming its data, the sooner it can serve traffic, increasing the availability of the cluster. Toward this goal, Cassandra 4.0 adds &lt;a href=&quot;https://en.wikipedia.org/wiki/Zero-copy&quot;&gt;Zero Copy&lt;/a&gt; streaming, which can transfer entire SSTable files to the receiving node rather than having the receiver deserialize and rewrite them. For more details on Cassandra’s zero copy implementation, see this &lt;a href=&quot;../../../2018/08/06/faster_streaming_in_cassandra.html&quot;&gt;blog post&lt;/a&gt; and &lt;a href=&quot;https://issues.apache.org/jira/browse/CASSANDRA-14556&quot;&gt;CASSANDRA-14556&lt;/a&gt;.&lt;/p&gt;
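+
+&lt;p&gt;As a rough illustration of the general zero-copy idea, and not of Cassandra’s actual streaming code, the sketch below copies a whole file with the JDK’s &lt;code&gt;FileChannel.transferTo&lt;/code&gt;. Where the platform supports it, this call lets the operating system move bytes between channels without first copying them into a user-space buffer. The class and method names are hypothetical, and the target here is a local temporary file rather than a network socket connected to another node.&lt;/p&gt;
+&lt;div&gt;&lt;pre&gt;
+import java.io.IOException;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardOpenOption;
+
+// Hypothetical example class; not part of Cassandra.
+public class ZeroCopySketch {
+
+    // Copies an entire file with FileChannel.transferTo, which can avoid a
+    // user-space copy when the OS supports it. Returns the bytes transferred.
+    static long transferWholeFile(Path source, Path target) throws IOException {
+        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
+             FileChannel out = FileChannel.open(target, StandardOpenOption.WRITE,
+                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
+            long size = in.size();
+            long position = 0;
+            // transferTo may move fewer bytes than requested, so keep calling
+            // it until the whole file has been transferred.
+            while (position != size) {
+                position += in.transferTo(position, size - position, out);
+            }
+            return position;
+        }
+    }
+
+    public static void main(String[] args) throws IOException {
+        // Stand-in for an SSTable component: 1 MiB of zero bytes.
+        Path source = Files.createTempFile(null, null);
+        Path target = Files.createTempFile(null, null);
+        Files.write(source, new byte[1048576]);
+
+        long copied = transferWholeFile(source, target);
+        System.out.println(copied);             // prints 1048576
+        System.out.println(Files.size(target)); // prints 1048576
+
+        Files.delete(source);
+        Files.delete(target);
+    }
+}
+&lt;/pre&gt;&lt;/div&gt;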
+
+&lt;h2 id=&quot;talking-numbers&quot;&gt;Talking Numbers&lt;/h2&gt;
+&lt;p&gt;To quantify the results of these improvements, we at Netflix measured streaming performance in 4.0 versus 3.0, using our open source &lt;a href=&quot;https://github.com/Netflix/ndbench&quot;&gt;NDBench&lt;/a&gt; benchmarking tool with the CassJavaDriverGeneric plugin. Though we expected improvements, we were still amazed by the overall result: a &lt;strong&gt;fivefold increase&lt;/strong&gt; in streaming performance. The test setup and operations are detailed below.&lt;/p&gt;
+
+&lt;h3 id=&quot;test-setup&quot;&gt;Test Setup&lt;/h3&gt;
+&lt;p&gt;In our test setup, we used the following configurations:&lt;/p&gt;
+&lt;ul&gt;
+  &lt;li&gt;6-node clusters on i3.xl, i3.2xl, i3.4xl and i3.8xl EC2 instances, with one cluster per instance type on each of 3.0 and trunk (sha dd7ec5a2d6736b26d3c5f137388f2d0028df7a03).&lt;/li&gt;
+  &lt;li&gt;Table schema:&lt;/li&gt;
+&lt;/ul&gt;
+&lt;div&gt;&lt;pre&gt;
+CREATE TABLE testing.test (
+    key text,
+    column1 int,
+    value text,
+    PRIMARY KEY (key, column1)
+) WITH CLUSTERING ORDER BY (column1 ASC)
+    AND bloom_filter_fp_chance = 0.01
+    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
+    AND comment = ''
+    AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
+    AND compression = {'enabled': 'false'}
+    AND crc_check_chance = 1.0
+    AND dclocal_read_repair_chance = 0.1
+    AND default_time_to_live = 0
+    AND gc_grace_seconds = 864000
+    AND max_index_interval = 2048
+    AND memtable_flush_period_in_ms = 0
+    AND min_index_interval = 128
+    AND read_repair_chance = 0.0
+    AND speculative_retry = '99PERCENTILE';
+&lt;/pre&gt;&lt;/div&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Data size per node: 500GB&lt;/li&gt;
+  &lt;li&gt;No. of tokens per node: 1 (no vnodes)&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;To trigger streaming, we performed the following steps in each cluster:&lt;/p&gt;
+&lt;ul&gt;
+  &lt;li&gt;terminate a node&lt;/li&gt;
+  &lt;li&gt;add a new node as its replacement&lt;/li&gt;
+  &lt;li&gt;measure how long the replacement node takes to finish streaming its data&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;For each cluster and version, we repeated this exercise multiple 
times to collect several samples.&lt;/p&gt;
+
+&lt;p&gt;Below is the distribution of streaming times we measured across the clusters:
+&lt;img src=&quot;/img/blog-post-benchmarking-streaming/cassandra_streaming.png&quot; alt=&quot;Benchmark results&quot; title=&quot;Benchmark results&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;interpreting-the-results&quot;&gt;Interpreting the 
Results&lt;/h3&gt;
+&lt;p&gt;The graph above supports several conclusions, including:&lt;/p&gt;
+&lt;ul&gt;
+  &lt;li&gt;3.0 streaming times are inconsistent and show a high degree of variability (wide distributions across multiple samples)&lt;/li&gt;
+  &lt;li&gt;3.0 streaming is highly affected by the instance type and generally looks CPU bound&lt;/li&gt;
+  &lt;li&gt;Zero Copy streaming is approximately 5x faster&lt;/li&gt;
+  &lt;li&gt;Zero Copy streaming times show little variability (narrow distributions across multiple samples)&lt;/li&gt;
+  &lt;li&gt;Zero Copy streaming performance is not CPU bound and remains consistent across instance types&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;The performance test results make it clear that Zero Copy Streaming has a huge performance benefit over Cassandra’s existing streaming infrastructure. But what does this mean in the real world? These are the main takeaways:&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;MTTR (Mean Time to Recovery):&lt;/strong&gt; MTTR is a KPI (Key Performance Indicator) that measures how quickly a system recovers from a failure. Zero Copy Streaming has a very direct impact here, with a &lt;strong&gt;fivefold improvement&lt;/strong&gt; in streaming performance.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Costs:&lt;/strong&gt; Zero Copy Streaming is ~5x faster. For some organizations this translates directly into cost savings, primarily by reducing the need to maintain spare server or cloud capacity. When you are migrating data to larger instance types or moving between AZs or DCs, the instances sending data can be turned off sooner, saving costs. An added cost benefit is that you no longer have to overprovision instances: you get similar streaming performance whether you use an i3.xl or an i3.8xl, provided the network bandwidth is available to the instance.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Risk Reduction:&lt;/strong&gt; Zero Copy Streaming also greatly reduces risk. Since a cluster’s recovery mainly depends on streaming speed, Cassandra clusters with failed nodes can recover much more quickly (about 5x faster). This significantly shrinks the window of vulnerability, in some situations down to a few minutes.&lt;/p&gt;
+
+&lt;p&gt;Finally, a benefit we rarely talk about is the environmental impact of this change. Zero Copy Streaming moves data through the cluster very quickly, which can reduce the number and size of instances needed to build a Cassandra cluster. As a result, not only does it reduce Cassandra’s TCO (Total Cost of Ownership), it also helps the environment by consuming fewer resources!&lt;/p&gt;</content><author><name>The Apache Cassandra 
Community</name></author><summary type="html">Streaming is a process where 
nodes of a cluster exchange data in the form of SSTables. Streaming can kick in 
during many situations such as bootstrap, repair, rebuild, range movement, 
cluster expansion, etc. In this post, we discuss the massive performance 
improvements made to the streaming process in Apache Cassandra 
4.0.</summary></entry><entry><title type="html">Introducing Transient 
Replication</title><link href="http://cassandra.apache.org/blog/2018/12/03/introducing-transient-replication.html" rel="alternate" type="text/html" 
title="Introducing Transient Replication" 
/><published>2018-12-03T21:00:00+13:00</published><updated>2018-12-03T21:00:00+13:00</updated><id>http://cassandra.apache.org/blog/2018/12/03/introducing-transient-replication</id><content
 type="html" 
xml:base="http://cassandra.apache.org/blog/2018/12/03/introducing-transient-replication.html">&lt;p&gt;Transient
 Replication is a new experimental feature soon to be available in 4.0. When 
enabled, it allows for the creation of keyspaces where replication factor can 
be specified as a number of copies (full replicas) and temporary copies 
(transient replicas). Transient replicas retain the data they replicate only 
long enough for it to be propagated to full replicas, via incremental repair, 
at which point the data is deleted. Writing to transient replicas can be 
avoided almost entirely if monotonic reads are not required because it is 
possible to achieve a quorum of acknowledged writes without them.&lt;/p&gt;
 
 &lt;p&gt;This results in a savings in disk space, CPU, and IO. By deleting 
data as soon as it is no longer needed, transient replicas require only a 
fraction of the disk space of a full replica. By not having to store the data 
indefinitely, the CPU and IO required for compaction is reduced, and read 
queries are faster as they have less data to process.&lt;/p&gt;
 

Added: 
cassandra/site/publish/img/blog-post-benchmarking-streaming/cassandra_streaming.png
URL: 
http://svn.apache.org/viewvc/cassandra/site/publish/img/blog-post-benchmarking-streaming/cassandra_streaming.png?rev=1857225&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
cassandra/site/publish/img/blog-post-benchmarking-streaming/cassandra_streaming.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: 
cassandra/site/src/img/blog-post-benchmarking-streaming/cassandra_streaming.png
URL: 
http://svn.apache.org/viewvc/cassandra/site/src/img/blog-post-benchmarking-streaming/cassandra_streaming.png?rev=1857225&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
cassandra/site/src/img/blog-post-benchmarking-streaming/cassandra_streaming.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream


