[2/4] kudu-site git commit: Publish commit(s) from site source repo: 83530755d Blogpost describing index skip scan optimization.

mpercy Wed, 26 Sep 2018 10:56:45 -0700

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/blog/page/4/index.html
----------------------------------------------------------------------
diff --git a/blog/page/4/index.html b/blog/page/4/index.html
index 21724ba..3f6dd71 100644
--- a/blog/page/4/index.html
+++ b/blog/page/4/index.html
@@ -117,6 +117,27 @@
 <!-- Articles -->
 <article>
   <header>
+    <h1 class="entry-title"><a href="/2016/10/20/weekly-update.html">Apache 
Kudu Weekly Update October 20th, 2016</a></h1>
+    <p class="meta">Posted 20 Oct 2016 by Todd Lipcon</p>
+  </header>
+  <div class="entry-content">
+    
+    <p>Welcome to the twenty-second edition of the Kudu Weekly Update. This 
weekly blog post
+covers ongoing development and news in the Apache Kudu project.</p>
+
+
+    
+  </div>
+  <div class="read-full">
+    <a class="btn btn-info" href="/2016/10/20/weekly-update.html">Read full 
post...</a>
+  </div>
+</article>
+
+
+
+<!-- Articles -->
+<article>
+  <header>
     <h1 class="entry-title"><a href="/2016/10/11/weekly-update.html">Apache 
Kudu Weekly Update October 11th, 2016</a></h1>
     <p class="meta">Posted 11 Oct 2016 by Todd Lipcon</p>
   </header>
@@ -209,320 +230,6 @@ scan path to speed up queries.</p>
 
 
 
-<!-- Articles -->
-<article>
-  <header>
-    <h1 class="entry-title"><a 
href="/2016/08/31/intro-flume-kudu-sink.html">An Introduction to the Flume Kudu 
Sink</a></h1>
-    <p class="meta">Posted 31 Aug 2016 by Ara Abrahamian</p>
-  </header>
-  <div class="entry-content">
-    
-    <p>This post discusses the Kudu Flume Sink. First, I’ll give some 
background on why we considered
-using Kudu, what Flume does for us, and how Flume fits with Kudu in our 
project.</p>
-
-<h2 id="why-kudu">Why Kudu</h2>
-
-<p>Traditionally in the Hadoop ecosystem we’ve dealt with various <em>batch 
processing</em> technologies such
-as MapReduce and the many libraries and tools built on top of it in various 
languages (Apache Pig,
-Apache Hive, Apache Oozie and many others). The main problem with this 
approach is that it needs to
-process the whole data set in batches, again and again, as soon as new data 
gets added. Things get
-really complicated when a few such tasks need to get chained together, or when 
the same data set
-needs to be processed in various ways by different jobs, while all compete for 
the shared cluster
-resources.</p>
-
-<p>The opposite of this approach is <em>stream processing</em>: process the 
data as soon as it arrives, not
-in batches. Streaming systems such as Spark Streaming, Storm, Kafka Streams, 
and many others make
-this possible. But writing streaming services is not trivial. The streaming 
systems are becoming
-more and more capable and support more complex constructs, but they are not 
yet easy to use. All
-queries and processes need to be carefully planned and implemented.</p>
-
-<p>To summarize, <em>batch processing</em> is:</p>
-
-<ul>
-  <li>file-based</li>
-  <li>a paradigm that processes large chunks of data as a group</li>
-  <li>high latency and high throughput, both for ingest and query</li>
-  <li>typically easy to program, but hard to orchestrate</li>
-  <li>well suited for writing ad-hoc queries, although they are typically high 
latency</li>
-</ul>
-
-<p>While <em>stream processing</em> is:</p>
-
-<ul>
-  <li>a totally different paradigm, which involves single events and time 
windows instead of large groups of events</li>
-  <li>still file-based and not a long-term database</li>
-  <li>not batch-oriented, but incremental</li>
-  <li>ultra-fast ingest and ultra-fast query (query results basically 
pre-calculated)</li>
-  <li>not so easy to program, relatively easy to orchestrate</li>
-  <li>impossible to write ad-hoc queries</li>
-</ul>
-
-<p>And a Kudu-based <em>near real-time</em> approach is:</p>
-
-<ul>
-  <li>flexible and expressive, thanks to SQL support via Apache Impala 
(incubating)</li>
-  <li>a table-oriented, mutable data store that feels like a traditional 
relational database</li>
-  <li>very easy to program, you can even pretend it’s good old MySQL</li>
-  <li>low-latency and relatively high throughput, both for ingest and 
query</li>
-</ul>
-
-<p>At Argyle Data, we’re dealing with complex fraud detection scenarios. We 
need to ingest massive
-amounts of data, run machine learning algorithms and generate reports. When we 
created our current
-architecture two years ago we decided to opt for a database as the backbone of 
our system. That
-database is Apache Accumulo. It’s a key-value based database which runs on 
top of Hadoop HDFS,
-quite similar to HBase but with some important improvements such as cell level 
security and ease
-of deployment and management. To enable querying of this data for quite 
complex reporting and
-analytics, we used Presto, a distributed query engine with a pluggable 
architecture open-sourced
-by Facebook. We wrote a connector for it to let it run queries against the 
Accumulo database. This
-architecture has served us well, but there were a few problems:</p>
-
-<ul>
-  <li>we need to ingest even more massive volumes of data in real-time</li>
-  <li>we need to perform complex machine-learning calculations on even larger 
data-sets</li>
-  <li>we need to support ad-hoc queries, plus long-term data warehouse 
functionality</li>
-</ul>
-
-<p>So, we’ve started gradually moving the core machine-learning pipeline to 
a streaming based
-solution. This way we can ingest and process larger data-sets faster in the 
real-time. But then how
-would we take care of ad-hoc queries and long-term persistence? This is where 
Kudu comes in. While
-the machine learning pipeline ingests and processes real-time data, we store a 
copy of the same
-ingested data in Kudu for long-term access and ad-hoc queries. Kudu is our 
<em>data warehouse</em>. By
-using Kudu and Impala, we can retire our in-house Presto connector and rely on 
Impala’s
-super-fast query engine.</p>
-
-<p>But how would we make sure data is reliably ingested into the streaming 
pipeline <em>and</em> the
-Kudu-based data warehouse? This is where Apache Flume comes in.</p>
-
-<h2 id="why-flume">Why Flume</h2>
-
-<p>According to their <a href="http://flume.apache.org/">website</a> “Flume 
is a distributed, reliable, and
-available service for efficiently collecting, aggregating, and moving large 
amounts of log data.
-It has a simple and flexible architecture based on streaming data flows. It is 
robust and fault
-tolerant with tunable reliability mechanisms and many failover and recovery 
mechanisms.” As you
-can see, nowhere is Hadoop mentioned but Flume is typically used for ingesting 
data to Hadoop
-clusters.</p>
-
-<p><img 
src="https://blogs.apache.org/flume/mediaresource/ab0d50f6-a960-42cc-971e-3da38ba3adad"
 alt="png" /></p>
-
-<p>Flume has an extensible architecture. An instance of Flume, called an 
<em>agent</em>, can have multiple
-<em>channels</em>, with each having multiple <em>sources</em> and 
<em>sinks</em> of various types. Sources queue data
-in channels, which in turn write out data to sinks. Such <em>pipelines</em> 
can be chained together to
-create even more complex ones. There may be more than one agent and agents can 
be configured to
-support failover and recovery.</p>
-
-<p>Flume comes with a bunch of built-in types of channels, sources and sinks. 
Memory channel is the
-default (an in-memory queue with no persistence to disk), but other options 
such as Kafka- and
-File-based channels are also provided. As for the sources, Avro, JMS, Thrift, 
spooling directory
-source are some of the built-in ones. Flume also ships with many sinks, 
including sinks for writing
-data to HDFS, HBase, Hive, Kafka, as well as to other Flume agents.</p>
-
-<p>In the rest of this post I’ll go over the Kudu Flume sink and show you 
how to configure Flume to
-write ingested data to a Kudu table. The sink has been part of the Kudu 
distribution since the 0.8
-release and the source code can be found <a 
href="https://github.com/apache/kudu/tree/master/java/kudu-flume-sink">here</a>.</p>
-
-<h2 id="configuring-the-kudu-flume-sink">Configuring the Kudu Flume Sink</h2>
-
-<p>Here is a sample flume configuration file:</p>
-
-<pre><code>agent1.sources  = source1
-agent1.channels = channel1
-agent1.sinks = sink1
-
-agent1.sources.source1.type = exec
-agent1.sources.source1.command = /usr/bin/vmstat 1
-agent1.sources.source1.channels = channel1
-
-agent1.channels.channel1.type = memory
-agent1.channels.channel1.capacity = 10000
-agent1.channels.channel1.transactionCapacity = 1000
-
-agent1.sinks.sink1.type = org.apache.flume.sink.kudu.KuduSink
-agent1.sinks.sink1.masterAddresses = localhost
-agent1.sinks.sink1.tableName = stats
-agent1.sinks.sink1.channel = channel1
-agent1.sinks.sink1.batchSize = 50
-agent1.sinks.sink1.producer = 
org.apache.kudu.flume.sink.SimpleKuduEventProducer
-</code></pre>
-
-<p>We define a source called <code>source1</code> which simply executes a 
<code>vmstat</code> command to continuously generate
-virtual memory statistics for the machine and queue events into an in-memory 
<code>channel1</code> channel,
-which in turn is used for writing these events to a Kudu table called 
<code>stats</code>. We are using
-<code>org.apache.kudu.flume.sink.SimpleKuduEventProducer</code> as the 
producer. <code>SimpleKuduEventProducer</code> is
-the built-in and default producer, but it’s implemented as a showcase for 
how to write Flume
-events into Kudu tables. For any serious functionality we’d have to write a 
custom producer. We
-need to make this producer and the <code>KuduSink</code> class available to 
Flume. We can do that by simply
-copying the <code>kudu-flume-sink-&lt;VERSION&gt;.jar</code> jar file from the 
Kudu distribution to the
-<code>$FLUME_HOME/plugins.d/kudu-sink/lib</code> directory in the Flume 
installation. The jar file contains
-<code>KuduSink</code> and all of its dependencies (including Kudu java client 
classes).</p>
-
-<p>At a minimum, the Kudu Flume Sink needs to know where the Kudu masters are
-(<code>agent1.sinks.sink1.masterAddresses = localhost</code>) and which Kudu 
table should be used for writing
-Flume events to (<code>agent1.sinks.sink1.tableName = stats</code>). The Kudu 
Flume Sink doesn’t create this
-table, it has to be created before the Kudu Flume Sink is started.</p>
-
-<p>You may also notice the <code>batchSize</code> parameter. Batch size is 
used for batching up to that many
-Flume events and flushing the entire batch in one shot. Tuning batchSize 
properly can have a huge
-impact on ingest performance of the Kudu cluster.</p>
-
-<p>Here is a complete list of KuduSink parameters:</p>
-
-<table>
-  <thead>
-    <tr>
-      <th>Parameter Name</th>
-      <th>Default</th>
-      <th>Description</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td>masterAddresses</td>
-      <td>N/A</td>
-      <td>Comma-separated list of “host:port” pairs of the masters (port 
optional)</td>
-    </tr>
-    <tr>
-      <td>tableName</td>
-      <td>N/A</td>
-      <td>The name of the table in Kudu to write to</td>
-    </tr>
-    <tr>
-      <td>producer</td>
-      <td>org.apache.kudu.flume.sink.SimpleKuduEventProducer</td>
-      <td>The fully qualified class name of the Kudu event producer the sink 
should use</td>
-    </tr>
-    <tr>
-      <td>batchSize</td>
-      <td>100</td>
-      <td>Maximum number of events the sink should take from the channel per 
transaction, if available</td>
-    </tr>
-    <tr>
-      <td>timeoutMillis</td>
-      <td>30000</td>
-      <td>Timeout period for Kudu operations, in milliseconds</td>
-    </tr>
-    <tr>
-      <td>ignoreDuplicateRows</td>
-      <td>true</td>
-      <td>Whether to ignore errors indicating that we attempted to insert 
duplicate rows into Kudu</td>
-    </tr>
-  </tbody>
-</table>
-
-<p>Let’s take a look at the source code for the built-in producer class:</p>
-
-<pre><code class="language-java">public class SimpleKuduEventProducer 
implements KuduEventProducer {
-  private byte[] payload;
-  private KuduTable table;
-  private String payloadColumn;
-
-  public SimpleKuduEventProducer(){
-  }
-
-  @Override
-  public void configure(Context context) {
-    payloadColumn = context.getString("payloadColumn","payload");
-  }
-
-  @Override
-  public void configure(ComponentConfiguration conf) {
-  }
-
-  @Override
-  public void initialize(Event event, KuduTable table) {
-    this.payload = event.getBody();
-    this.table = table;
-  }
-
-  @Override
-  public List&lt;Operation&gt; getOperations() throws FlumeException {
-    try {
-      Insert insert = table.newInsert();
-      PartialRow row = insert.getRow();
-      row.addBinary(payloadColumn, payload);
-
-      return Collections.singletonList((Operation) insert);
-    } catch (Exception e){
-      throw new FlumeException("Failed to create Kudu Insert object!", e);
-    }
-  }
-
-  @Override
-  public void close() {
-  }
-}
-</code></pre>
-
-<p><code>SimpleKuduEventProducer</code> implements the 
<code>org.apache.kudu.flume.sink.KuduEventProducer</code> interface,
-which itself looks like this:</p>
-
-<pre><code class="language-java">public interface KuduEventProducer extends 
Configurable, ConfigurableComponent {
-  /**
-   * Initialize the event producer.
-   * @param event to be written to Kudu
-   * @param table the KuduTable object used for creating Kudu Operation objects
-   */
-  void initialize(Event event, KuduTable table);
-
-  /**
-   * Get the operations that should be written out to Kudu as a result of this
-   * event. This list is written to Kudu using the Kudu client API.
-   * @return List of {@link org.kududb.client.Operation} which
-   * are written as such to Kudu
-   */
-  List&lt;Operation&gt; getOperations();
-
-  /*
-   * Clean up any state. This will be called when the sink is being stopped.
-   */
-  void close();
-}
-</code></pre>
-
-<p><code>public void configure(Context context)</code> is called when an 
instance of our producer is instantiated
-by the KuduSink. SimpleKuduEventProducer’s implementation looks for a 
producer parameter named
-<code>payloadColumn</code> and uses its value (“payload” if not overridden 
in Flume configuration file) as the
-column which will hold the value of the Flume event payload. If you recall 
from above, we had
-configured the KuduSink to listen for events generated from the 
<code>vmstat</code> command. Each output row
-from that command will be stored as a new row containing a 
<code>payload</code> column in the <code>stats</code> table.
-<code>SimpleKuduEventProducer</code> does not have any configuration 
parameters, but if it had any we would
-define them by prefixing it with <code>producer.</code> 
(<code>agent1.sinks.sink1.producer.parameter1</code> for
-example).</p>
-
-<p>The main producer logic resides in the <code>public List&lt;Operation&gt; 
getOperations()</code> method. In
-SimpleKuduEventProducer’s implementation we simply insert the binary body of 
the Flume event into
-the Kudu table. Here we call Kudu’s <code>newInsert()</code> to initiate an 
insert, but could have used
-<code>Upsert</code> if updating an existing row was also an option, in fact 
there’s another producer
-implementation available for doing just that: 
<code>SimpleKeyedKuduEventProducer</code>. Most probably you
-will need to write your own custom producer in the real world, but you can 
base your implementation
-on the built-in ones.</p>
-
-<p>In the future, we plan to add more flexible event producer implementations 
so that creation of a
-custom event producer is not required to write data to Kudu. See
-<a href="https://gerrit.cloudera.org/#/c/4034/">here</a> for a 
work-in-progress generic event producer for
-Avro-encoded Events.</p>
-
-<h2 id="conclusion">Conclusion</h2>
-
-<p>Kudu is a scalable data store which lets us ingest insane amounts of data 
per second. Apache Flume
-helps us aggregate data from various sources, and the Kudu Flume Sink lets us 
easily store
-the aggregated Flume events into Kudu. Together they enable us to create a 
data warehouse out of
-disparate sources.</p>
-
-<p><em>Ara Abrahamian is a software engineer at Argyle Data building fraud 
detection systems using
-sophisticated machine learning methods. Ara is the original author of the 
Flume Kudu Sink that
-is included in the Kudu distribution. You can follow him on Twitter at
-<a href="https://twitter.com/ara_e">@ara_e</a>.</em></p>
-
-
-    
-  </div>
-  <div class="read-full">
-    <a class="btn btn-info" href="/2016/08/31/intro-flume-kudu-sink.html">Read 
full post...</a>
-  </div>
-</article>
-
-
-
 <!-- Pagination links -->
 
 <nav>
@@ -543,6 +250,8 @@ is included in the Kudu distribution. You can follow him on 
Twitter at
     <h3>Recent posts</h3>
     <ul>
     
+      <li> <a 
href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan 
Optimization in Kudu</a> </li>
+    
       <li> <a 
href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data 
Pipelines with Kudu</a> </li>
     
       <li> <a 
href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting 
Started with Kudu - an O'Reilly Title</a> </li>
@@ -571,8 +280,6 @@ is included in the Kudu distribution. You can follow him on 
Twitter at
     
       <li> <a href="/2016/11/01/weekly-update.html">Apache Kudu Weekly Update 
November 1st, 2016</a> </li>
     
-      <li> <a href="/2016/10/20/weekly-update.html">Apache Kudu Weekly Update 
October 20th, 2016</a> </li>
-    
     </ul>
   </div>
 </div>


http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/blog/page/5/index.html
----------------------------------------------------------------------
diff --git a/blog/page/5/index.html b/blog/page/5/index.html
index 1e4c02d..eba0ce9 100644
--- a/blog/page/5/index.html
+++ b/blog/page/5/index.html
@@ -117,6 +117,320 @@
 <!-- Articles -->
 <article>
   <header>
+    <h1 class="entry-title"><a 
href="/2016/08/31/intro-flume-kudu-sink.html">An Introduction to the Flume Kudu 
Sink</a></h1>
+    <p class="meta">Posted 31 Aug 2016 by Ara Abrahamian</p>
+  </header>
+  <div class="entry-content">
+    
+    <p>This post discusses the Kudu Flume Sink. First, I’ll give some 
background on why we considered
+using Kudu, what Flume does for us, and how Flume fits with Kudu in our 
project.</p>
+
+<h2 id="why-kudu">Why Kudu</h2>
+
+<p>Traditionally in the Hadoop ecosystem we’ve dealt with various <em>batch 
processing</em> technologies such
+as MapReduce and the many libraries and tools built on top of it in various 
languages (Apache Pig,
+Apache Hive, Apache Oozie and many others). The main problem with this 
approach is that it needs to
+process the whole data set in batches, again and again, as soon as new data 
gets added. Things get
+really complicated when a few such tasks need to get chained together, or when 
the same data set
+needs to be processed in various ways by different jobs, while all compete for 
the shared cluster
+resources.</p>
+
+<p>The opposite of this approach is <em>stream processing</em>: process the 
data as soon as it arrives, not
+in batches. Streaming systems such as Spark Streaming, Storm, Kafka Streams, 
and many others make
+this possible. But writing streaming services is not trivial. The streaming 
systems are becoming
+more and more capable and support more complex constructs, but they are not 
yet easy to use. All
+queries and processes need to be carefully planned and implemented.</p>
+
+<p>To summarize, <em>batch processing</em> is:</p>
+
+<ul>
+  <li>file-based</li>
+  <li>a paradigm that processes large chunks of data as a group</li>
+  <li>high latency and high throughput, both for ingest and query</li>
+  <li>typically easy to program, but hard to orchestrate</li>
+  <li>well suited for writing ad-hoc queries, although they are typically high 
latency</li>
+</ul>
+
+<p>While <em>stream processing</em> is:</p>
+
+<ul>
+  <li>a totally different paradigm, which involves single events and time 
windows instead of large groups of events</li>
+  <li>still file-based and not a long-term database</li>
+  <li>not batch-oriented, but incremental</li>
+  <li>ultra-fast ingest and ultra-fast query (query results basically 
pre-calculated)</li>
+  <li>not so easy to program, relatively easy to orchestrate</li>
+  <li>impossible to write ad-hoc queries</li>
+</ul>
+
+<p>And a Kudu-based <em>near real-time</em> approach is:</p>
+
+<ul>
+  <li>flexible and expressive, thanks to SQL support via Apache Impala 
(incubating)</li>
+  <li>a table-oriented, mutable data store that feels like a traditional 
relational database</li>
+  <li>very easy to program, you can even pretend it’s good old MySQL</li>
+  <li>low-latency and relatively high throughput, both for ingest and 
query</li>
+</ul>
+
+<p>At Argyle Data, we’re dealing with complex fraud detection scenarios. We 
need to ingest massive
+amounts of data, run machine learning algorithms and generate reports. When we 
created our current
+architecture two years ago we decided to opt for a database as the backbone of 
our system. That
+database is Apache Accumulo. It’s a key-value based database which runs on 
top of Hadoop HDFS,
+quite similar to HBase but with some important improvements such as cell level 
security and ease
+of deployment and management. To enable querying of this data for quite 
complex reporting and
+analytics, we used Presto, a distributed query engine with a pluggable 
architecture open-sourced
+by Facebook. We wrote a connector for it to let it run queries against the 
Accumulo database. This
+architecture has served us well, but there were a few problems:</p>
+
+<ul>
+  <li>we need to ingest even more massive volumes of data in real-time</li>
+  <li>we need to perform complex machine-learning calculations on even larger 
data-sets</li>
+  <li>we need to support ad-hoc queries, plus long-term data warehouse 
functionality</li>
+</ul>
+
+<p>So, we’ve started gradually moving the core machine-learning pipeline to 
a streaming based
+solution. This way we can ingest and process larger data-sets faster in the 
real-time. But then how
+would we take care of ad-hoc queries and long-term persistence? This is where 
Kudu comes in. While
+the machine learning pipeline ingests and processes real-time data, we store a 
copy of the same
+ingested data in Kudu for long-term access and ad-hoc queries. Kudu is our 
<em>data warehouse</em>. By
+using Kudu and Impala, we can retire our in-house Presto connector and rely on 
Impala’s
+super-fast query engine.</p>
+
+<p>But how would we make sure data is reliably ingested into the streaming 
pipeline <em>and</em> the
+Kudu-based data warehouse? This is where Apache Flume comes in.</p>
+
+<h2 id="why-flume">Why Flume</h2>
+
+<p>According to their <a href="http://flume.apache.org/">website</a> “Flume 
is a distributed, reliable, and
+available service for efficiently collecting, aggregating, and moving large 
amounts of log data.
+It has a simple and flexible architecture based on streaming data flows. It is 
robust and fault
+tolerant with tunable reliability mechanisms and many failover and recovery 
mechanisms.” As you
+can see, nowhere is Hadoop mentioned but Flume is typically used for ingesting 
data to Hadoop
+clusters.</p>
+
+<p><img 
src="https://blogs.apache.org/flume/mediaresource/ab0d50f6-a960-42cc-971e-3da38ba3adad"
 alt="png" /></p>
+
+<p>Flume has an extensible architecture. An instance of Flume, called an 
<em>agent</em>, can have multiple
+<em>channels</em>, with each having multiple <em>sources</em> and 
<em>sinks</em> of various types. Sources queue data
+in channels, which in turn write out data to sinks. Such <em>pipelines</em> 
can be chained together to
+create even more complex ones. There may be more than one agent and agents can 
be configured to
+support failover and recovery.</p>
+
+<p>Flume comes with a bunch of built-in types of channels, sources and sinks. 
Memory channel is the
+default (an in-memory queue with no persistence to disk), but other options 
such as Kafka- and
+File-based channels are also provided. As for the sources, Avro, JMS, Thrift, 
spooling directory
+source are some of the built-in ones. Flume also ships with many sinks, 
including sinks for writing
+data to HDFS, HBase, Hive, Kafka, as well as to other Flume agents.</p>
+
+<p>In the rest of this post I’ll go over the Kudu Flume sink and show you 
how to configure Flume to
+write ingested data to a Kudu table. The sink has been part of the Kudu 
distribution since the 0.8
+release and the source code can be found <a 
href="https://github.com/apache/kudu/tree/master/java/kudu-flume-sink">here</a>.</p>
+
+<h2 id="configuring-the-kudu-flume-sink">Configuring the Kudu Flume Sink</h2>
+
+<p>Here is a sample flume configuration file:</p>
+
+<div class="highlighter-rouge">agent1.sources  = source1
+agent1.channels = channel1
+agent1.sinks = sink1
+
+agent1.sources.source1.type = exec
+agent1.sources.source1.command = /usr/bin/vmstat 1
+agent1.sources.source1.channels = channel1
+
+agent1.channels.channel1.type = memory
+agent1.channels.channel1.capacity = 10000
+agent1.channels.channel1.transactionCapacity = 1000
+
+agent1.sinks.sink1.type = org.apache.flume.sink.kudu.KuduSink
+agent1.sinks.sink1.masterAddresses = localhost
+agent1.sinks.sink1.tableName = stats
+agent1.sinks.sink1.channel = channel1
+agent1.sinks.sink1.batchSize = 50
+agent1.sinks.sink1.producer = 
org.apache.kudu.flume.sink.SimpleKuduEventProducer
+</div>
+
+<p>We define a source called <code class="highlighter-rouge">source1</code> 
which simply executes a <code class="highlighter-rouge">vmstat</code> command 
to continuously generate
+virtual memory statistics for the machine and queue events into an in-memory 
<code class="highlighter-rouge">channel1</code> channel,
+which in turn is used for writing these events to a Kudu table called <code 
class="highlighter-rouge">stats</code>. We are using
+<code 
class="highlighter-rouge">org.apache.kudu.flume.sink.SimpleKuduEventProducer</code>
 as the producer. <code 
class="highlighter-rouge">SimpleKuduEventProducer</code> is
+the built-in and default producer, but it’s implemented as a showcase for 
how to write Flume
+events into Kudu tables. For any serious functionality we’d have to write a 
custom producer. We
+need to make this producer and the <code 
class="highlighter-rouge">KuduSink</code> class available to Flume. We can do 
that by simply
+copying the <code 
class="highlighter-rouge">kudu-flume-sink-&lt;VERSION&gt;.jar</code> jar file 
from the Kudu distribution to the
+<code class="highlighter-rouge">$FLUME_HOME/plugins.d/kudu-sink/lib</code> 
directory in the Flume installation. The jar file contains
+<code class="highlighter-rouge">KuduSink</code> and all of its dependencies 
(including Kudu java client classes).</p>
+
+<p>At a minimum, the Kudu Flume Sink needs to know where the Kudu masters are
+(<code class="highlighter-rouge">agent1.sinks.sink1.masterAddresses = 
localhost</code>) and which Kudu table should be used for writing
+Flume events to (<code class="highlighter-rouge">agent1.sinks.sink1.tableName 
= stats</code>). The Kudu Flume Sink doesn’t create this
+table, it has to be created before the Kudu Flume Sink is started.</p>
+
+<p>You may also notice the <code class="highlighter-rouge">batchSize</code> 
parameter. Batch size is used for batching up to that many
+Flume events and flushing the entire batch in one shot. Tuning batchSize 
properly can have a huge
+impact on ingest performance of the Kudu cluster.</p>
+
+<p>Here is a complete list of KuduSink parameters:</p>
+
+<table>
+  <thead>
+    <tr>
+      <th>Parameter Name</th>
+      <th>Default</th>
+      <th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>masterAddresses</td>
+      <td>N/A</td>
+      <td>Comma-separated list of “host:port” pairs of the masters (port 
optional)</td>
+    </tr>
+    <tr>
+      <td>tableName</td>
+      <td>N/A</td>
+      <td>The name of the table in Kudu to write to</td>
+    </tr>
+    <tr>
+      <td>producer</td>
+      <td>org.apache.kudu.flume.sink.SimpleKuduEventProducer</td>
+      <td>The fully qualified class name of the Kudu event producer the sink 
should use</td>
+    </tr>
+    <tr>
+      <td>batchSize</td>
+      <td>100</td>
+      <td>Maximum number of events the sink should take from the channel per 
transaction, if available</td>
+    </tr>
+    <tr>
+      <td>timeoutMillis</td>
+      <td>30000</td>
+      <td>Timeout period for Kudu operations, in milliseconds</td>
+    </tr>
+    <tr>
+      <td>ignoreDuplicateRows</td>
+      <td>true</td>
+      <td>Whether to ignore errors indicating that we attempted to insert 
duplicate rows into Kudu</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>Let’s take a look at the source code for the built-in producer class:</p>
+
+<div class="highlighter-rouge"><span class="kd">public</span> <span 
class="kd">class</span> <span class="nc">SimpleKuduEventProducer</span> <span 
class="kd">implements</span> <span class="n">KuduEventProducer</span> <span 
class="o">{</span>
+  <span class="kd">private</span> <span class="kt">byte</span><span 
class="o">[]</span> <span class="n">payload</span><span class="o">;</span>
+  <span class="kd">private</span> <span class="n">KuduTable</span> <span 
class="n">table</span><span class="o">;</span>
+  <span class="kd">private</span> <span class="n">String</span> <span 
class="n">payloadColumn</span><span class="o">;</span>
+
+  <span class="kd">public</span> <span 
class="nf">SimpleKuduEventProducer</span><span class="o">(){</span>
+  <span class="o">}</span>
+
+  <span class="nd">@Override</span>
+  <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">configure</span><span class="o">(</span><span 
class="n">Context</span> <span class="n">context</span><span class="o">)</span> 
<span class="o">{</span>
+    <span class="n">payloadColumn</span> <span class="o">=</span> <span 
class="n">context</span><span class="o">.</span><span 
class="na">getString</span><span class="o">(</span><span 
class="s">"payloadColumn"</span><span class="o">,</span><span 
class="s">"payload"</span><span class="o">);</span>
+  <span class="o">}</span>
+
+  <span class="nd">@Override</span>
+  <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">configure</span><span class="o">(</span><span 
class="n">ComponentConfiguration</span> <span class="n">conf</span><span 
class="o">)</span> <span class="o">{</span>
+  <span class="o">}</span>
+
+  <span class="nd">@Override</span>
+  <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">initialize</span><span class="o">(</span><span 
class="n">Event</span> <span class="n">event</span><span class="o">,</span> 
<span class="n">KuduTable</span> <span class="n">table</span><span 
class="o">)</span> <span class="o">{</span>
+    <span class="k">this</span><span class="o">.</span><span 
class="na">payload</span> <span class="o">=</span> <span 
class="n">event</span><span class="o">.</span><span 
class="na">getBody</span><span class="o">();</span>
+    <span class="k">this</span><span class="o">.</span><span 
class="na">table</span> <span class="o">=</span> <span 
class="n">table</span><span class="o">;</span>
+  <span class="o">}</span>
+
+  <span class="nd">@Override</span>
+  <span class="kd">public</span> <span class="n">List</span><span 
class="o">&lt;</span><span class="n">Operation</span><span 
class="o">&gt;</span> <span class="nf">getOperations</span><span 
class="o">()</span> <span class="kd">throws</span> <span 
class="n">FlumeException</span> <span class="o">{</span>
+    <span class="k">try</span> <span class="o">{</span>
+      <span class="n">Insert</span> <span class="n">insert</span> <span 
class="o">=</span> <span class="n">table</span><span class="o">.</span><span 
class="na">newInsert</span><span class="o">();</span>
+      <span class="n">PartialRow</span> <span class="n">row</span> <span 
class="o">=</span> <span class="n">insert</span><span class="o">.</span><span 
class="na">getRow</span><span class="o">();</span>
+      <span class="n">row</span><span class="o">.</span><span 
class="na">addBinary</span><span class="o">(</span><span 
class="n">payloadColumn</span><span class="o">,</span> <span 
class="n">payload</span><span class="o">);</span>
+
+      <span class="k">return</span> <span class="n">Collections</span><span 
class="o">.</span><span class="na">singletonList</span><span 
class="o">((</span><span class="n">Operation</span><span class="o">)</span> 
<span class="n">insert</span><span class="o">);</span>
+    <span class="o">}</span> <span class="k">catch</span> <span 
class="o">(</span><span class="n">Exception</span> <span 
class="n">e</span><span class="o">){</span>
+      <span class="k">throw</span> <span class="k">new</span> <span 
class="nf">FlumeException</span><span class="o">(</span><span class="s">"Failed 
to create Kudu Insert object!"</span><span class="o">,</span> <span 
class="n">e</span><span class="o">);</span>
+    <span class="o">}</span>
+  <span class="o">}</span>
+
+  <span class="nd">@Override</span>
+  <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">close</span><span class="o">()</span> <span class="o">{</span>
+  <span class="o">}</span>
+<span class="o">}</span>
+</div>
+
+<p><code class="highlighter-rouge">SimpleKuduEventProducer</code> implements 
the <code 
class="highlighter-rouge">org.apache.kudu.flume.sink.KuduEventProducer</code> 
interface,
+which itself looks like this:</p>
+
+<div class="highlighter-rouge"><span class="kd">public</span> <span 
class="kd">interface</span> <span class="nc">KuduEventProducer</span> <span 
class="kd">extends</span> <span class="n">Configurable</span><span 
class="o">,</span> <span class="n">ConfigurableComponent</span> <span 
class="o">{</span>
+  <span class="cm">/**
+   * Initialize the event producer.
+   * @param event to be written to Kudu
+   * @param table the KuduTable object used for creating Kudu Operation objects
+   */</span>
+  <span class="kt">void</span> <span class="nf">initialize</span><span 
class="o">(</span><span class="n">Event</span> <span 
class="n">event</span><span class="o">,</span> <span class="n">KuduTable</span> 
<span class="n">table</span><span class="o">);</span>
+
+  <span class="cm">/**
+   * Get the operations that should be written out to Kudu as a result of this
+   * event. This list is written to Kudu using the Kudu client API.
+   * @return List of {@link org.apache.kudu.client.Operation} which
+   * are written as such to Kudu
+   */</span>
+  <span class="n">List</span><span class="o">&lt;</span><span 
class="n">Operation</span><span class="o">&gt;</span> <span 
class="nf">getOperations</span><span class="o">();</span>
+
+  <span class="cm">/*
+   * Clean up any state. This will be called when the sink is being stopped.
+   */</span>
+  <span class="kt">void</span> <span class="nf">close</span><span 
class="o">();</span>
+<span class="o">}</span>
+</div>
+
+<p><code class="highlighter-rouge">public void configure(Context 
context)</code> is called when an instance of our producer is instantiated
+by the KuduSink. SimpleKuduEventProducer’s implementation looks for a 
producer parameter named
+<code class="highlighter-rouge">payloadColumn</code> and uses its value 
(“payload” if not overridden in the Flume configuration file) as the
+column which will hold the value of the Flume event payload. If you recall 
from above, we had
+configured the KuduSink to listen for events generated from the <code 
class="highlighter-rouge">vmstat</code> command. Each output row
+from that command will be stored as a new row containing a <code 
class="highlighter-rouge">payload</code> column in the <code 
class="highlighter-rouge">stats</code> table.
+<code class="highlighter-rouge">SimpleKuduEventProducer</code> has no other 
configuration parameters. In general, a producer parameter is
+defined by prefixing its name with <code 
class="highlighter-rouge">producer.</code> (<code 
class="highlighter-rouge">agent1.sinks.sink1.producer.parameter1</code> for
+example).</p>
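+<p>As a minimal sketch, assuming the agent and sink are named <code class="highlighter-rouge">agent1</code> and <code class="highlighter-rouge">sink1</code> as in the example key above, overriding the payload column in the Flume configuration file might look like the fragment below. The column name <code class="highlighter-rouge">raw_line</code> is hypothetical; whatever column is named here would need to exist in the target table as a binary column, since the producer writes the event body with <code class="highlighter-rouge">addBinary</code>.</p>

```properties
# Hypothetical override: store the Flume event body in a column named
# raw_line instead of the default "payload" column.
agent1.sinks.sink1.producer.payloadColumn = raw_line
```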
+
+<p>The main producer logic resides in the <code 
class="highlighter-rouge">public List&lt;Operation&gt; getOperations()</code> 
method. In
+SimpleKuduEventProducer’s implementation we simply insert the binary body of 
the Flume event into
+the Kudu table. Here we call Kudu’s <code 
class="highlighter-rouge">newInsert()</code> to initiate an insert, but could 
have used
+<code class="highlighter-rouge">Upsert</code> if updating an existing row was 
also an option. In fact, there’s another producer
+implementation available for doing just that: <code 
class="highlighter-rouge">SimpleKeyedKuduEventProducer</code>. Most probably you
+will need to write your own custom producer in the real world, but you can 
base your implementation
+on the built-in ones.</p>
+
+<p>In the future, we plan to add more flexible event producer implementations 
so that creation of a
+custom event producer is not required to write data to Kudu. See
+<a href="https://gerrit.cloudera.org/#/c/4034/">here</a> for a 
work-in-progress generic event producer for
+Avro-encoded Events.</p>
+
+<h2 id="conclusion">Conclusion</h2>
+
+<p>Kudu is a scalable data store which lets us ingest insane amounts of data 
per second. Apache Flume
+helps us aggregate data from various sources, and the Kudu Flume Sink lets us 
easily store
+the aggregated Flume events into Kudu. Together they enable us to create a 
data warehouse out of
+disparate sources.</p>
+
+<p><em>Ara Abrahamian is a software engineer at Argyle Data building fraud 
detection systems using
+sophisticated machine learning methods. Ara is the original author of the 
Flume Kudu Sink that
+is included in the Kudu distribution. You can follow him on Twitter at
+<a href="https://twitter.com/ara_e">@ara_e</a>.</em></p>
+
+
+    
+  </div>
+  <div class="read-full">
+    <a class="btn btn-info" href="/2016/08/31/intro-flume-kudu-sink.html">Read 
full post...</a>
+  </div>
+</article>
+
+
+
+<!-- Articles -->
+<article>
+  <header>
     <h1 class="entry-title"><a 
href="/2016/08/23/new-range-partitioning-features.html">New Range Partitioning 
Features in Kudu 0.10</a></h1>
     <p class="meta">Posted 23 Aug 2016 by Dan Burkert</p>
   </header>
@@ -201,27 +515,6 @@ covers ongoing development and news in the Apache Kudu 
project.</p>
 
 
 
-<!-- Articles -->
-<article>
-  <header>
-    <h1 class="entry-title"><a href="/2016/07/26/weekly-update.html">Apache 
Kudu Weekly Update July 26, 2016</a></h1>
-    <p class="meta">Posted 26 Jul 2016 by Jean-Daniel Cryans</p>
-  </header>
-  <div class="entry-content">
-    
-    <p>Welcome to the eighteenth edition of the Kudu Weekly Update. This 
weekly blog post
-covers ongoing development and news in the Apache Kudu project.</p>
-
-
-    
-  </div>
-  <div class="read-full">
-    <a class="btn btn-info" href="/2016/07/26/weekly-update.html">Read full 
post...</a>
-  </div>
-</article>
-
-
-
 <!-- Pagination links -->
 
 <nav>
@@ -242,6 +535,8 @@ covers ongoing development and news in the Apache Kudu 
project.</p>
     <h3>Recent posts</h3>
     <ul>
     
+      <li> <a 
href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan 
Optimization in Kudu</a> </li>
+    
       <li> <a 
href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data 
Pipelines with Kudu</a> </li>
     
       <li> <a 
href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting 
Started with Kudu - an O'Reilly Title</a> </li>
@@ -270,8 +565,6 @@ covers ongoing development and news in the Apache Kudu 
project.</p>
     
       <li> <a href="/2016/11/01/weekly-update.html">Apache Kudu Weekly Update 
November 1st, 2016</a> </li>
     
-      <li> <a href="/2016/10/20/weekly-update.html">Apache Kudu Weekly Update 
October 20th, 2016</a> </li>
-    
     </ul>
   </div>
 </div>

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/blog/page/6/index.html
----------------------------------------------------------------------
diff --git a/blog/page/6/index.html b/blog/page/6/index.html
index 5801003..b2b5e52 100644
--- a/blog/page/6/index.html
+++ b/blog/page/6/index.html
@@ -117,6 +117,27 @@
 <!-- Articles -->
 <article>
   <header>
+    <h1 class="entry-title"><a href="/2016/07/26/weekly-update.html">Apache 
Kudu Weekly Update July 26, 2016</a></h1>
+    <p class="meta">Posted 26 Jul 2016 by Jean-Daniel Cryans</p>
+  </header>
+  <div class="entry-content">
+    
+    <p>Welcome to the eighteenth edition of the Kudu Weekly Update. This 
weekly blog post
+covers ongoing development and news in the Apache Kudu project.</p>
+
+
+    
+  </div>
+  <div class="read-full">
+    <a class="btn btn-info" href="/2016/07/26/weekly-update.html">Read full 
post...</a>
+  </div>
+</article>
+
+
+
+<!-- Articles -->
+<article>
+  <header>
     <h1 class="entry-title"><a href="/2016/07/25/asf-graduation.html">The 
Apache Software Foundation Announces Apache&reg; Kudu&trade; as a Top-Level 
Project</a></h1>
     <p class="meta">Posted 25 Jul 2016 by Jean-Daniel Cryans</p>
   </header>
@@ -209,27 +230,6 @@ of 0.9.0 are encouraged to update to the new version at 
their earliest convenien
 
 
 
-<!-- Articles -->
-<article>
-  <header>
-    <h1 class="entry-title"><a href="/2016/06/27/weekly-update.html">Apache 
Kudu (incubating) Weekly Update June 27, 2016</a></h1>
-    <p class="meta">Posted 27 Jun 2016 by Todd Lipcon</p>
-  </header>
-  <div class="entry-content">
-    
-    <p>Welcome to the fifteenth edition of the Kudu Weekly Update. This weekly 
blog post
-covers ongoing development and news in the Apache Kudu (incubating) 
project.</p>
-
-
-    
-  </div>
-  <div class="read-full">
-    <a class="btn btn-info" href="/2016/06/27/weekly-update.html">Read full 
post...</a>
-  </div>
-</article>
-
-
-
 <!-- Pagination links -->
 
 <nav>
@@ -250,6 +250,8 @@ covers ongoing development and news in the Apache Kudu 
(incubating) project.</p>
     <h3>Recent posts</h3>
     <ul>
     
+      <li> <a 
href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan 
Optimization in Kudu</a> </li>
+    
       <li> <a 
href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data 
Pipelines with Kudu</a> </li>
     
       <li> <a 
href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting 
Started with Kudu - an O'Reilly Title</a> </li>
@@ -278,8 +280,6 @@ covers ongoing development and news in the Apache Kudu 
(incubating) project.</p>
     
       <li> <a href="/2016/11/01/weekly-update.html">Apache Kudu Weekly Update 
November 1st, 2016</a> </li>
     
-      <li> <a href="/2016/10/20/weekly-update.html">Apache Kudu Weekly Update 
October 20th, 2016</a> </li>
-    
     </ul>
   </div>
 </div>

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/blog/page/7/index.html
----------------------------------------------------------------------
diff --git a/blog/page/7/index.html b/blog/page/7/index.html
index d0dfe49..0692c27 100644
--- a/blog/page/7/index.html
+++ b/blog/page/7/index.html
@@ -117,6 +117,27 @@
 <!-- Articles -->
 <article>
   <header>
+    <h1 class="entry-title"><a href="/2016/06/27/weekly-update.html">Apache 
Kudu (incubating) Weekly Update June 27, 2016</a></h1>
+    <p class="meta">Posted 27 Jun 2016 by Todd Lipcon</p>
+  </header>
+  <div class="entry-content">
+    
+    <p>Welcome to the fifteenth edition of the Kudu Weekly Update. This weekly 
blog post
+covers ongoing development and news in the Apache Kudu (incubating) 
project.</p>
+
+
+    
+  </div>
+  <div class="read-full">
+    <a class="btn btn-info" href="/2016/06/27/weekly-update.html">Read full 
post...</a>
+  </div>
+</article>
+
+
+
+<!-- Articles -->
+<article>
+  <header>
     <h1 class="entry-title"><a 
href="/2016/06/24/multi-master-1-0-0.html">Master fault tolerance in Kudu 
1.0</a></h1>
     <p class="meta">Posted 24 Jun 2016 by Adar Dembo</p>
   </header>
@@ -202,37 +223,6 @@ covers ongoing development and news in the Apache Kudu 
(incubating) project.</p>
 
 
 
-<!-- Articles -->
-<article>
-  <header>
-    <h1 class="entry-title"><a 
href="/2016/06/10/apache-kudu-0-9-0-released.html">Apache Kudu (incubating) 
0.9.0 released</a></h1>
-    <p class="meta">Posted 10 Jun 2016 by Jean-Daniel Cryans</p>
-  </header>
-  <div class="entry-content">
-    
-    <p>The Apache Kudu (incubating) team is happy to announce the release of 
Kudu
-0.9.0!</p>
-
-<p>This latest version adds basic UPSERT functionality and an improved Apache 
Spark Data Source
-that doesn’t rely on the MapReduce I/O formats. It also improves Tablet 
Server
-restart time as well as write performance under high load. Finally, Kudu now 
enforces
-the specification of a partitioning scheme for new tables.</p>
-
-<ul>
-  <li>Read the detailed <a 
href="http://kudu.apache.org/releases/0.9.0/docs/release_notes.html">Kudu 0.9.0 
release notes</a></li>
-  <li>Download the <a href="http://kudu.apache.org/releases/0.9.0/">Kudu 0.9.0 
source release</a></li>
-</ul>
-
-
-    
-  </div>
-  <div class="read-full">
-    <a class="btn btn-info" 
href="/2016/06/10/apache-kudu-0-9-0-released.html">Read full post...</a>
-  </div>
-</article>
-
-
-
 <!-- Pagination links -->
 
 <nav>
@@ -253,6 +243,8 @@ the specification of a partitioning scheme for new 
tables.</p>
     <h3>Recent posts</h3>
     <ul>
     
+      <li> <a 
href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan 
Optimization in Kudu</a> </li>
+    
       <li> <a 
href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data 
Pipelines with Kudu</a> </li>
     
       <li> <a 
href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting 
Started with Kudu - an O'Reilly Title</a> </li>
@@ -281,8 +273,6 @@ the specification of a partitioning scheme for new 
tables.</p>
     
       <li> <a href="/2016/11/01/weekly-update.html">Apache Kudu Weekly Update 
November 1st, 2016</a> </li>
     
-      <li> <a href="/2016/10/20/weekly-update.html">Apache Kudu Weekly Update 
October 20th, 2016</a> </li>
-    
     </ul>
   </div>
 </div>

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/blog/page/8/index.html
----------------------------------------------------------------------
diff --git a/blog/page/8/index.html b/blog/page/8/index.html
index ce0a7e1..aa53f05 100644
--- a/blog/page/8/index.html
+++ b/blog/page/8/index.html
@@ -117,6 +117,37 @@
 <!-- Articles -->
 <article>
   <header>
+    <h1 class="entry-title"><a 
href="/2016/06/10/apache-kudu-0-9-0-released.html">Apache Kudu (incubating) 
0.9.0 released</a></h1>
+    <p class="meta">Posted 10 Jun 2016 by Jean-Daniel Cryans</p>
+  </header>
+  <div class="entry-content">
+    
+    <p>The Apache Kudu (incubating) team is happy to announce the release of 
Kudu
+0.9.0!</p>
+
+<p>This latest version adds basic UPSERT functionality and an improved Apache 
Spark Data Source
+that doesn’t rely on the MapReduce I/O formats. It also improves Tablet 
Server
+restart time as well as write performance under high load. Finally, Kudu now 
enforces
+the specification of a partitioning scheme for new tables.</p>
+
+<ul>
+  <li>Read the detailed <a 
href="http://kudu.apache.org/releases/0.9.0/docs/release_notes.html">Kudu 0.9.0 
release notes</a></li>
+  <li>Download the <a href="http://kudu.apache.org/releases/0.9.0/">Kudu 0.9.0 
source release</a></li>
+</ul>
+
+
+    
+  </div>
+  <div class="read-full">
+    <a class="btn btn-info" 
href="/2016/06/10/apache-kudu-0-9-0-released.html">Read full post...</a>
+  </div>
+</article>
+
+
+
+<!-- Articles -->
+<article>
+  <header>
     <h1 class="entry-title"><a href="/2016/06/06/weekly-update.html">Apache 
Kudu (incubating) Weekly Update June 6, 2016</a></h1>
     <p class="meta">Posted 06 Jun 2016 by Jean-Daniel Cryans</p>
   </header>
@@ -200,27 +231,6 @@ covers ongoing development and news in the Apache Kudu 
(incubating) project.</p>
 
 
 
-<!-- Articles -->
-<article>
-  <header>
-    <h1 class="entry-title"><a href="/2016/05/16/weekly-update.html">Apache 
Kudu (incubating) Weekly Update May 16, 2016</a></h1>
-    <p class="meta">Posted 16 May 2016 by Todd Lipcon</p>
-  </header>
-  <div class="entry-content">
-    
-    <p>Welcome to the ninth edition of the Kudu Weekly Update. This weekly 
blog post
-covers ongoing development and news in the Apache Kudu (incubating) 
project.</p>
-
-
-    
-  </div>
-  <div class="read-full">
-    <a class="btn btn-info" href="/2016/05/16/weekly-update.html">Read full 
post...</a>
-  </div>
-</article>
-
-
-
 <!-- Pagination links -->
 
 <nav>
@@ -241,6 +251,8 @@ covers ongoing development and news in the Apache Kudu 
(incubating) project.</p>
     <h3>Recent posts</h3>
     <ul>
     
+      <li> <a 
href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan 
Optimization in Kudu</a> </li>
+    
       <li> <a 
href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data 
Pipelines with Kudu</a> </li>
     
       <li> <a 
href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting 
Started with Kudu - an O'Reilly Title</a> </li>
@@ -269,8 +281,6 @@ covers ongoing development and news in the Apache Kudu 
(incubating) project.</p>
     
       <li> <a href="/2016/11/01/weekly-update.html">Apache Kudu Weekly Update 
November 1st, 2016</a> </li>
     
-      <li> <a href="/2016/10/20/weekly-update.html">Apache Kudu Weekly Update 
October 20th, 2016</a> </li>
-    
     </ul>
   </div>
 </div>

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/blog/page/9/index.html
----------------------------------------------------------------------
diff --git a/blog/page/9/index.html b/blog/page/9/index.html
index ce14d37..85c617d 100644
--- a/blog/page/9/index.html
+++ b/blog/page/9/index.html
@@ -117,6 +117,27 @@
 <!-- Articles -->
 <article>
   <header>
+    <h1 class="entry-title"><a href="/2016/05/16/weekly-update.html">Apache 
Kudu (incubating) Weekly Update May 16, 2016</a></h1>
+    <p class="meta">Posted 16 May 2016 by Todd Lipcon</p>
+  </header>
+  <div class="entry-content">
+    
+    <p>Welcome to the ninth edition of the Kudu Weekly Update. This weekly 
blog post
+covers ongoing development and news in the Apache Kudu (incubating) 
project.</p>
+
+
+    
+  </div>
+  <div class="read-full">
+    <a class="btn btn-info" href="/2016/05/16/weekly-update.html">Read full 
post...</a>
+  </div>
+</article>
+
+
+
+<!-- Articles -->
+<article>
+  <header>
     <h1 class="entry-title"><a href="/2016/05/09/weekly-update.html">Apache 
Kudu (incubating) Weekly Update May 9, 2016</a></h1>
     <p class="meta">Posted 09 May 2016 by Jean-Daniel Cryans</p>
   </header>
@@ -197,29 +218,6 @@ covers ongoing development and news in the Apache Kudu 
(incubating) project.</p>
 
 
 
-<!-- Articles -->
-<article>
-  <header>
-    <h1 class="entry-title"><a 
href="/2016/04/19/kudu-0-8-0-predicate-improvements.html">Predicate 
Improvements in Kudu 0.8</a></h1>
-    <p class="meta">Posted 19 Apr 2016 by Dan Burkert</p>
-  </header>
-  <div class="entry-content">
-    
-    <p>The recently released Kudu version 0.8 ships with a host of new 
improvements to
-scan predicates. Performance and usability have been improved, especially for
-tables taking advantage of <a 
href="http://kudu.apache.org/docs/schema_design.html#data-distribution">advanced
 partitioning
-options</a>.</p>
-
-
-    
-  </div>
-  <div class="read-full">
-    <a class="btn btn-info" 
href="/2016/04/19/kudu-0-8-0-predicate-improvements.html">Read full post...</a>
-  </div>
-</article>
-
-
-
 <!-- Pagination links -->
 
 <nav>
@@ -240,6 +238,8 @@ options</a>.</p>
     <h3>Recent posts</h3>
     <ul>
     
+      <li> <a 
href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan 
Optimization in Kudu</a> </li>
+    
       <li> <a 
href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data 
Pipelines with Kudu</a> </li>
     
       <li> <a 
href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting 
Started with Kudu - an O'Reilly Title</a> </li>
@@ -268,8 +268,6 @@ options</a>.</p>
     
       <li> <a href="/2016/11/01/weekly-update.html">Apache Kudu Weekly Update 
November 1st, 2016</a> </li>
     
-      <li> <a href="/2016/10/20/weekly-update.html">Apache Kudu Weekly Update 
October 20th, 2016</a> </li>
-    
     </ul>
   </div>
 </div>

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/faq.html
----------------------------------------------------------------------
diff --git a/faq.html b/faq.html
index 5cd722f..65885fa 100644
--- a/faq.html
+++ b/faq.html
@@ -345,8 +345,8 @@ enforcing “external consistency” in two different ways: 
one that optimizes f
 requires the user to perform additional work and another that requires no 
additional
 work but can result in some additional latency.</li>
   <li>Scans have “Read Committed” consistency by default. If the user 
requires strict-serializable
-scans it can choose the <code>READ_AT_SNAPSHOT</code> mode and, optionally, 
provide a timestamp. The default
-option is non-blocking but the <code>READ_AT_SNAPSHOT</code> option may block 
when reading from non-leader
+scans it can choose the <code 
class="highlighter-rouge">READ_AT_SNAPSHOT</code> mode and, optionally, provide 
a timestamp. The default
+option is non-blocking but the <code 
class="highlighter-rouge">READ_AT_SNAPSHOT</code> option may block when reading 
from non-leader
 replicas.</li>
 </ul>
 
@@ -369,7 +369,7 @@ further information and caveats.</p>
 
 <p>Kudu provides direct access via Java and C++ APIs. An experimental Python 
API is
 also available and is expected to be fully supported in the future. The easiest
-way to load data into Kudu is to use a <code>CREATE TABLE ... AS SELECT * FROM 
...</code>
+way to load data into Kudu is to use a <code class="highlighter-rouge">CREATE 
TABLE ... AS SELECT * FROM ...</code>
 statement in Impala. Although Kudu has not been extensively tested to work with
 ingest tools such as Flume, Sqoop, or Kafka, several of these have been
 experimentally tested. Explicit support for these ingest tools is expected with
@@ -378,7 +378,7 @@ Kudu’s first generally available release.</p>
 <h4 id="whats-the-most-efficient-way-to-bulk-load-data-into-kudu">What’s the 
most efficient way to bulk load data into Kudu?</h4>
 
 <p>The easiest way to load data into Kudu is if the data is already managed by 
Impala.
-In this case, a simple <code>INSERT INTO TABLE some_kudu_table SELECT * FROM 
some_csv_table</code>
+In this case, a simple <code class="highlighter-rouge">INSERT INTO TABLE 
some_kudu_table SELECT * FROM some_csv_table</code>
 does the trick.</p>
 
 <p>You can also use Kudu’s MapReduce OutputFormat to load data from HDFS, 
HBase, or
@@ -530,8 +530,8 @@ features.</p>
 Impala can help if you have it available. You can use it to copy your data into
 Parquet format using a statement like:</p>
 
-<pre><code>INSERT INTO TABLE some_parquet_table SELECT * FROM kudu_table
-</code></pre>
+<div class="highlighter-rouge">INSERT INTO TABLE some_parquet_table SELECT * 
FROM kudu_table
+</div>
 
 <p>then use <a 
href="http://hadoop.apache.org/docs/r1.2.1/distcp2.html">distcp</a>
 to copy the Parquet data to another cluster.</p>

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index 218dfab..49afcef 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,107 @@
-<?xml version="1.0" encoding="utf-8"?><feed 
xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" 
version="2.5.3">Jekyll</generator><link href="/feed.xml" rel="self" 
type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" 
/><updated>2018-09-11T17:54:59+02:00</updated><id>/</id><entry><title>Simplified
 Data Pipelines with Kudu</title><link 
href="/2018/09/11/simplified-pipelines-with-kudu.html" rel="alternate" 
type="text/html" title="Simplified Data Pipelines with Kudu" 
/><published>2018-09-11T00:00:00+02:00</published><updated>2018-09-11T00:00:00+02:00</updated><id>/2018/09/11/simplified-pipelines-with-kudu</id><content
 type="html" 
xml:base="/2018/09/11/simplified-pipelines-with-kudu.html">&lt;p&gt;I’ve been 
working with Hadoop now for over seven years and fortunately, or unfortunately, 
have run
+<?xml version="1.0" encoding="utf-8"?><feed 
xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" 
version="2.5.3">Jekyll</generator><link href="/feed.xml" rel="self" 
type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" 
/><updated>2018-09-26T10:55:43-07:00</updated><id>/</id><entry><title>Index 
Skip Scan Optimization in Kudu</title><link 
href="/2018/09/26/index-skip-scan-optimization-in-kudu.html" rel="alternate" 
type="text/html" title="Index Skip Scan Optimization in Kudu" 
/><published>2018-09-26T00:00:00-07:00</published><updated>2018-09-26T00:00:00-07:00</updated><id>/2018/09/26/index-skip-scan-optimization-in-kudu</id><content
 type="html" 
xml:base="/2018/09/26/index-skip-scan-optimization-in-kudu.html">&lt;p&gt;This 
summer I got the opportunity to intern with the Apache Kudu team at Cloudera.
+My project was to optimize the Kudu scan path by implementing a technique 
called
+index skip scan (a.k.a. scan-to-seek, see section 4.1 in [1]). I wanted to 
share
+my experience and the progress we’ve made so far on the approach.&lt;/p&gt;
+
+&lt;!--more--&gt;
+
+&lt;p&gt;Let’s begin by discussing the current query flow in Kudu.
+Consider the following table:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code 
class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span 
class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span 
class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;metrics&lt;/span&gt; &lt;span 
class=&quot;p&quot;&gt;(&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;host&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;,&lt;/span&gt;
+    &lt;span class=&quot;n&quot;&gt;tstamp&lt;/span&gt; &lt;span 
class=&quot;nb&quot;&gt;INT&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;,&lt;/span&gt;
+    &lt;span class=&quot;n&quot;&gt;clusterid&lt;/span&gt; &lt;span 
class=&quot;nb&quot;&gt;INT&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;,&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;role&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;,&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;PRIMARY&lt;/span&gt; &lt;span 
class=&quot;k&quot;&gt;KEY&lt;/span&gt; &lt;span 
class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;host&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;tstamp&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;clusterid&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;span 
class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;&lt;img src=&quot;/img/index-skip-scan/example-table.png&quot; 
alt=&quot;png&quot; class=&quot;img-responsive&quot; /&gt;
+&lt;em&gt;Sample rows of table &lt;code 
class=&quot;highlighter-rouge&quot;&gt;metrics&lt;/code&gt; (sorted by key 
columns).&lt;/em&gt;&lt;/p&gt;
+
+&lt;p&gt;In this case, by default, Kudu internally builds a primary key index 
(implemented as a
+&lt;a 
href=&quot;https://en.wikipedia.org/wiki/B-tree&quot;&gt;B-tree&lt;/a&gt;) for 
the table &lt;code class=&quot;highlighter-rouge&quot;&gt;metrics&lt;/code&gt;.
+As shown in the table above, the index data is sorted by the composite of all 
key columns.
+When the user query contains the first key column (&lt;code 
class=&quot;highlighter-rouge&quot;&gt;host&lt;/code&gt;), Kudu uses the index 
(as the index data is
+primarily sorted on the first key column).&lt;/p&gt;
+
+&lt;p&gt;Now, what if the user query does not contain the first key column and 
instead only contains the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;tstamp&lt;/code&gt; column?
+In the above case, the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;tstamp&lt;/code&gt; column values are 
sorted with respect to &lt;code 
class=&quot;highlighter-rouge&quot;&gt;host&lt;/code&gt;,
+but are not globally sorted, and as such, it’s non-trivial to use the index 
to filter rows.
+Instead, a full tablet scan is done by default. Other databases may optimize 
such scans by building secondary indexes
+(though it might be redundant to build one on one of the primary keys). 
However, this isnât an option for Kudu,
+given its lack of secondary index support.&lt;/p&gt;
+
+&lt;p&gt;The question is, can Kudu do better than a full tablet scan 
here?&lt;/p&gt;
+
+&lt;p&gt;The answer is yes! Let’s observe the column preceding the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;tstamp&lt;/code&gt; column. We will 
refer to it as the
+“prefix column” and its specific value as the “prefix key”. In this 
example, &lt;code class=&quot;highlighter-rouge&quot;&gt;host&lt;/code&gt; is 
the prefix column.
+Note that the prefix keys are sorted in the index and that all rows of a given 
prefix key are also sorted by the
+remaining key columns. Therefore, we can use the index to skip to the rows 
that have distinct prefix keys,
+and also satisfy the predicate on the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;tstamp&lt;/code&gt; column.
+For example, consider the query:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code 
class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span 
class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;clusterid&lt;/span&gt; &lt;span 
class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;metrics&lt;/span&gt; &lt;span 
class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span 
class=&quot;n&quot;&gt;tstamp&lt;/span&gt; &lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span 
class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/img/index-skip-scan/skip-scan-example-table.png&quot; 
alt=&quot;png&quot; class=&quot;img-responsive&quot; /&gt;
+&lt;em&gt;Skip scan flow illustration. The rows in green are scanned and the 
rest are skipped.&lt;/em&gt;&lt;/p&gt;
+
+&lt;p&gt;The tablet server can use the index to 
&lt;strong&gt;skip&lt;/strong&gt; to the first row with a distinct prefix key 
(&lt;code class=&quot;highlighter-rouge&quot;&gt;host = helium&lt;/code&gt;) 
that
+matches the predicate (&lt;code class=&quot;highlighter-rouge&quot;&gt;tstamp 
= 100&lt;/code&gt;) and then &lt;strong&gt;scan&lt;/strong&gt; through the rows 
until the predicate no longer matches. At that
+point we would know that no more rows with &lt;code 
class=&quot;highlighter-rouge&quot;&gt;host = helium&lt;/code&gt; will satisfy 
the predicate, and we can skip to the next
+prefix key. This holds true for all distinct keys of &lt;code 
class=&quot;highlighter-rouge&quot;&gt;host&lt;/code&gt;. Hence, this method is 
popularly known as
+&lt;strong&gt;skip scan optimization&lt;/strong&gt;[2, 3].&lt;/p&gt;
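+&lt;p&gt;The skip-then-scan loop described above can be sketched in plain Java. This is only an illustration, not the code in the Kudu patch: a TreeSet stands in for the primary key index, and the class name, the helper methods, and the sample rows are all hypothetical.&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Illustrative model only: a sorted set plays the role of the primary key index.
final class SkipScanSketch {
    static final class Row {
        final String host;
        final int tstamp;
        final int clusterid;

        Row(String host, int tstamp, int clusterid) {
            this.host = host;
            this.tstamp = tstamp;
            this.clusterid = clusterid;
        }
    }

    // Rows are ordered by the composite key (host, tstamp, clusterid).
    static final Comparator<Row> KEY_ORDER =
            Comparator.comparing((Row r) -> r.host)
                    .thenComparingInt(r -> r.tstamp)
                    .thenComparingInt(r -> r.clusterid);

    // Hypothetical sample data, loosely modelled on the metrics table.
    static TreeSet<Row> sampleIndex() {
        TreeSet<Row> index = new TreeSet<>(KEY_ORDER);
        index.add(new Row("helium", 100, 1));
        index.add(new Row("helium", 100, 2));
        index.add(new Row("helium", 200, 1));
        index.add(new Row("neon", 50, 3));
        index.add(new Row("neon", 100, 4));
        index.add(new Row("xenon", 300, 5));
        return index;
    }

    // Models: SELECT clusterid FROM metrics WHERE tstamp = <tstamp>
    static List<Integer> clusterIdsAt(TreeSet<Row> index, int tstamp) {
        List<Integer> result = new ArrayList<>();
        Row cur = index.isEmpty() ? null : index.first();
        while (cur != null) {
            String prefixKey = cur.host;
            // Skip: seek to the first row of this prefix key with the wanted tstamp.
            Row r = index.ceiling(new Row(prefixKey, tstamp, Integer.MIN_VALUE));
            // Scan: consume rows until the predicate or the prefix key stops matching.
            while (r != null && r.host.equals(prefixKey) && r.tstamp == tstamp) {
                result.add(r.clusterid);
                r = index.higher(r);
            }
            // Skip: jump to the first row of the next distinct prefix key.
            cur = index.higher(new Row(prefixKey, Integer.MAX_VALUE, Integer.MAX_VALUE));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(clusterIdsAt(sampleIndex(), 100)); // prints [1, 2, 4]
    }
}
```

+&lt;p&gt;Each distinct prefix key costs two seeks in this sketch, which is why the approach pays off mainly when the prefix column has few distinct values.&lt;/p&gt;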
+
+&lt;h1 id=&quot;performance&quot;&gt;Performance&lt;/h1&gt;
+
+&lt;p&gt;This optimization can speed up queries significantly, depending on 
the cardinality (number of distinct values) of the
+prefix column. The lower the prefix column cardinality, the better the skip 
scan performance. In fact, when the
+prefix column cardinality is high, skip scan is not a viable approach. The 
performance graph (obtained using the example
+schema and query pattern mentioned earlier) is shown below.&lt;/p&gt;
+
+&lt;p&gt;Based on our experiments, on up to 10 million rows per tablet (as 
shown below), we found that the skip scan performance
+begins to get worse with respect to the full tablet scan performance when the 
prefix column cardinality
+exceeds sqrt(number_of_rows_in_tablet).
+Therefore, in order to use skip scan performance benefits when possible and 
maintain a consistent performance in cases
+of large prefix column cardinality, we have tentatively chosen to dynamically 
disable skip scan when the number of skips for
+distinct prefix keys exceeds sqrt(number_of_rows_in_tablet).
+It will be an interesting project to further explore sophisticated heuristics 
to decide when
+to dynamically disable skip scan.&lt;/p&gt;
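The cutoff described above can be sketched as follows. The threshold of sqrt(number_of_rows_in_tablet) skips comes from the text; everything else is an assumption for illustration, including the function name `scan_with_skip_limit`, the in-memory sorted list standing in for a tablet, and the choice to fall back to a plain scan of the remaining rows once the limit is exceeded.

```python
from bisect import bisect_left
import math


def scan_with_skip_limit(rows, tstamp):
    """Skip scan that gives up once skips exceed sqrt(number of rows).

    `rows` is a list of (host, tstamp, clusterid) tuples in primary key order.
    Returns (matching clusterids, True if skip scan stayed enabled throughout).
    """
    n = len(rows)
    max_skips = math.isqrt(n)  # integer square root of the row count
    matches = []
    skips = 0
    pos = 0
    while pos != n:
        if skips > max_skips:
            # Too many distinct prefix keys: plain scan over the remaining rows.
            matches.extend(r[2] for r in rows[pos:] if r[1] == tstamp)
            return matches, False
        host = rows[pos][0]
        i = bisect_left(rows, (host, tstamp), pos)
        skips += 1
        while i != n and rows[i][0] == host and rows[i][1] == tstamp:
            matches.append(rows[i][2])
            i += 1
        pos = bisect_left(rows, (host, math.inf), i)
    return matches, True


# Hypothetical data: 2 distinct hosts over 100 rows (low prefix cardinality).
low_cardinality = [("a", t, t) for t in range(50)] + [("b", t, 100 + t) for t in range(50)]
# Hypothetical data: 100 distinct hosts over 100 rows (high prefix cardinality).
high_cardinality = [(f"h{i:03d}", 100, i) for i in range(100)]

low_result = scan_with_skip_limit(low_cardinality, 7)
high_result = scan_with_skip_limit(high_cardinality, 100)
```

With the low cardinality data the scan finishes in two skips. With the high cardinality data the limit of 10 skips is exceeded and the remaining rows are scanned linearly, while the returned rows stay the same as a full scan would produce.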
+
+&lt;p&gt;&lt;img 
src=&quot;/img/index-skip-scan/skip-scan-performance-graph.png&quot; 
alt=&quot;png&quot; class=&quot;img-responsive&quot; /&gt;&lt;/p&gt;
+
+&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;
+
+&lt;p&gt;Skip scan optimization in Kudu can lead to huge performance benefits 
that scale with the size of
+data in Kudu tablets. This is a work-in-progress &lt;a 
href=&quot;https://gerrit.cloudera.org/#/c/10983/&quot;&gt;patch&lt;/a&gt;.
+The implementation in the patch works only for equality predicates on the 
non-first primary key
+columns. An important point to note is that although, in the above specific 
example, the number of prefix
+columns is one (&lt;code 
class=&quot;highlighter-rouge&quot;&gt;host&lt;/code&gt;), this approach is 
generalized to work with any number of prefix columns.&lt;/p&gt;
+
+&lt;p&gt;This work also lays the groundwork to leverage the skip scan approach 
and optimize query processing time in the
+following use cases:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Range predicates&lt;/li&gt;
+  &lt;li&gt;In-list predicates&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;This was my first time working on an open source project. I 
thoroughly enjoyed working on this challenging problem,
+right from understanding the scan path in Kudu to working on a full-fledged 
implementation of
+the skip scan optimization. I am very grateful to the Kudu team for guiding 
and supporting me throughout the
+internship period.&lt;/p&gt;
+
+&lt;h1 id=&quot;references&quot;&gt;References&lt;/h1&gt;
+
+&lt;p&gt;&lt;a 
href=&quot;https://storage.googleapis.com/pub-tools-public-publication-data/pdf/42851.pdf&quot;&gt;[1]&lt;/a&gt;:
 Gupta, Ashish, et al. “Mesa:
+Geo-replicated, near real-time, scalable data warehousing.” Proceedings of 
the VLDB Endowment 7.12 (2014): 1259-1270.&lt;/p&gt;
+
+&lt;p&gt;&lt;a 
href=&quot;https://oracle-base.com/articles/9i/index-skip-scanning/&quot;&gt;[2]&lt;/a&gt;:
 Index Skip Scanning - Oracle Database&lt;/p&gt;
+
+&lt;p&gt;&lt;a 
href=&quot;https://www.sqlite.org/optoverview.html#skipscan&quot;&gt;[3]&lt;/a&gt;:
 Skip Scan - SQLite&lt;/p&gt;</content><author><name>Anupama 
Gupta</name></author><summary>This summer I got the opportunity to intern with 
the Apache Kudu team at Cloudera.
+My project was to optimize the Kudu scan path by implementing a technique 
called
+index skip scan (a.k.a. scan-to-seek, see section 4.1 in [1]). I wanted to 
share
my experience and the progress we’ve made so far on the 
approach.</summary></entry><entry><title>Simplified Data Pipelines with 
Kudu</title><link href="/2018/09/11/simplified-pipelines-with-kudu.html" 
rel="alternate" type="text/html" title="Simplified Data Pipelines with Kudu" 
/><published>2018-09-11T00:00:00-07:00</published><updated>2018-09-11T00:00:00-07:00</updated><id>/2018/09/11/simplified-pipelines-with-kudu</id><content
 type="html" 
xml:base="/2018/09/11/simplified-pipelines-with-kudu.html">&lt;p&gt;I’ve been 
working with Hadoop now for over seven years and fortunately, or unfortunately, 
have run
 across a lot of structured data use cases.  What we, at &lt;a 
href=&quot;https://phdata.io/&quot;&gt;phData&lt;/a&gt;, have found is
 that end users are typically comfortable with tabular data and prefer to 
access their data in a
 structured manner using tables.
@@ -38,7 +141,7 @@ and users to focus on solving business problems, rather than 
being bothered by t
 the backend.&lt;/p&gt;</content><author><name>Mac 
Noland</name></author><summary>I’ve been working with Hadoop now for over 
seven years and fortunately, or unfortunately, have run
 across a lot of structured data use cases.  What we, at phData, have found is
 that end users are typically comfortable with tabular data and prefer to 
access their data in a
-structured manner using tables.</summary></entry><entry><title>Getting Started 
with Kudu - an O’Reilly Title</title><link 
href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html" 
rel="alternate" type="text/html" title="Getting Started with Kudu - an 
O&#39;Reilly Title" 
/><published>2018-08-06T00:00:00+02:00</published><updated>2018-08-06T00:00:00+02:00</updated><id>/2018/08/06/getting-started-with-kudu-an-oreilly-title</id><content
 type="html" 
xml:base="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">&lt;p&gt;The
 following article by Brock Noland was reposted from the
+structured manner using tables.</summary></entry><entry><title>Getting Started 
with Kudu - an O’Reilly Title</title><link 
href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html" 
rel="alternate" type="text/html" title="Getting Started with Kudu - an 
O&#39;Reilly Title" 
/><published>2018-08-06T00:00:00-07:00</published><updated>2018-08-06T00:00:00-07:00</updated><id>/2018/08/06/getting-started-with-kudu-an-oreilly-title</id><content
 type="html" 
xml:base="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">&lt;p&gt;The
 following article by Brock Noland was reposted from the
 &lt;a 
href=&quot;https://www.phdata.io/getting-started-with-kudu/&quot;&gt;phData&lt;/a&gt;
 blog with their permission.&lt;/p&gt;
 
@@ -52,9 +155,9 @@ challenge at that time.
 In that context, on October 11th 2012 Todd Lipcon performed Apache Kudu’s 
initial
 commit. The commit message was:&lt;/p&gt;
 
-&lt;pre&gt;&lt;code&gt;Code for writing cfiles seems to basically work
+&lt;div class=&quot;highlighter-rouge&quot;&gt;Code for writing cfiles seems 
to basically work
 Need to write code for reading cfiles, still
-&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
 
 &lt;p&gt;And Kudu development was off and running. Around this same time Todd, 
on his
 internal Wiki page, started listing out the papers he was reading to develop
@@ -90,7 +193,7 @@ of Kudu. Specifically you will learn:&lt;/p&gt;
 
 &lt;p&gt;Looking forward, I am excited to see Kudu gain additional features 
and adoption
 and eventually the second revision of this title. In the meantime, if you have
-feedback or questions, please reach out on the 
&lt;code&gt;#getting-started-kudu&lt;/code&gt; channel of
+feedback or questions, please reach out on the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;#getting-started-kudu&lt;/code&gt; 
channel of
 the &lt;a href=&quot;https://getkudu-slack.herokuapp.com/&quot;&gt;Kudu 
Slack&lt;/a&gt; or if you prefer non-real-time
 communication, please use the user@ mailing 
list!&lt;/p&gt;</content><author><name>Brock Noland</name></author><summary>The 
following article by Brock Noland was reposted from the
 phData
@@ -101,7 +204,7 @@ Hadoop platform was hard. Organizations required strong 
Software Engineering
 capabilities to successfully implement complex Lambda architectures or even
 simply implement continuous ingest. Updating or deleting data, were simply a
 nightmare. General Data Protection Regulation (GDPR) would have been an extreme
-challenge at that time.</summary></entry><entry><title>Instrumentation in 
Apache Kudu</title><link href="/2018/07/10/instrumentation-in-kudu.html" 
rel="alternate" type="text/html" title="Instrumentation in Apache Kudu" 
/><published>2018-07-10T00:00:00+02:00</published><updated>2018-07-10T00:00:00+02:00</updated><id>/2018/07/10/instrumentation-in-kudu</id><content
 type="html" xml:base="/2018/07/10/instrumentation-in-kudu.html">&lt;p&gt;Last 
week, the &lt;a 
href=&quot;http://opentracing.io/&quot;&gt;OpenTracing&lt;/a&gt; community 
invited me to
+challenge at that time.</summary></entry><entry><title>Instrumentation in 
Apache Kudu</title><link href="/2018/07/10/instrumentation-in-kudu.html" 
rel="alternate" type="text/html" title="Instrumentation in Apache Kudu" 
/><published>2018-07-10T00:00:00-07:00</published><updated>2018-07-10T00:00:00-07:00</updated><id>/2018/07/10/instrumentation-in-kudu</id><content
 type="html" xml:base="/2018/07/10/instrumentation-in-kudu.html">&lt;p&gt;Last 
week, the &lt;a 
href=&quot;http://opentracing.io/&quot;&gt;OpenTracing&lt;/a&gt; community 
invited me to
 their monthly Google Hangout meetup to give an informal talk on tracing and
 instrumentation in Apache Kudu.&lt;/p&gt;
 
@@ -136,7 +239,7 @@ While Kudu doesn’t currently support distributed tracing 
using OpenTracing,
 it does have quite a lot of other types of instrumentation, metrics, and
 diagnostics logging. The OpenTracing team was interested to hear about some of
 the approaches that Kudu has used, and so I gave a brief introduction to topics
-including:</summary></entry><entry><title>Apache Kudu 1.7.0 
released</title><link href="/2018/03/23/apache-kudu-1-7-0-released.html" 
rel="alternate" type="text/html" title="Apache Kudu 1.7.0 released" 
/><published>2018-03-23T00:00:00+01:00</published><updated>2018-03-23T00:00:00+01:00</updated><id>/2018/03/23/apache-kudu-1-7-0-released</id><content
 type="html" 
xml:base="/2018/03/23/apache-kudu-1-7-0-released.html">&lt;p&gt;The Apache Kudu 
team is happy to announce the release of Kudu 1.7.0!&lt;/p&gt;
+including:</summary></entry><entry><title>Apache Kudu 1.7.0 
released</title><link href="/2018/03/23/apache-kudu-1-7-0-released.html" 
rel="alternate" type="text/html" title="Apache Kudu 1.7.0 released" 
/><published>2018-03-23T00:00:00-07:00</published><updated>2018-03-23T00:00:00-07:00</updated><id>/2018/03/23/apache-kudu-1-7-0-released</id><content
 type="html" 
xml:base="/2018/03/23/apache-kudu-1-7-0-released.html">&lt;p&gt;The Apache Kudu 
team is happy to announce the release of Kudu 1.7.0!&lt;/p&gt;
 
 &lt;p&gt;Apache Kudu 1.7.0 is a minor release that offers new features, 
performance
 optimizations, incremental improvements, and bug fixes.&lt;/p&gt;
@@ -207,7 +310,7 @@ Maven repository and are
 Apache Kudu 1.7.0 is a minor release that offers new features, performance
 optimizations, incremental improvements, and bug fixes.
 
-Release highlights:</summary></entry><entry><title>Apache Kudu 1.6.0 
released</title><link href="/2017/12/08/apache-kudu-1-6-0-released.html" 
rel="alternate" type="text/html" title="Apache Kudu 1.6.0 released" 
/><published>2017-12-08T00:00:00+01:00</published><updated>2017-12-08T00:00:00+01:00</updated><id>/2017/12/08/apache-kudu-1-6-0-released</id><content
 type="html" 
xml:base="/2017/12/08/apache-kudu-1-6-0-released.html">&lt;p&gt;The Apache Kudu 
team is happy to announce the release of Kudu 1.6.0!&lt;/p&gt;
+Release highlights:</summary></entry><entry><title>Apache Kudu 1.6.0 
released</title><link href="/2017/12/08/apache-kudu-1-6-0-released.html" 
rel="alternate" type="text/html" title="Apache Kudu 1.6.0 released" 
/><published>2017-12-08T00:00:00-08:00</published><updated>2017-12-08T00:00:00-08:00</updated><id>/2017/12/08/apache-kudu-1-6-0-released</id><content
 type="html" 
xml:base="/2017/12/08/apache-kudu-1-6-0-released.html">&lt;p&gt;The Apache Kudu 
team is happy to announce the release of Kudu 1.6.0!&lt;/p&gt;
 
 &lt;p&gt;Apache Kudu 1.6.0 is a minor release that offers new features, 
performance
 optimizations, incremental improvements, and bug fixes.&lt;/p&gt;
@@ -266,7 +369,7 @@ Maven repository and are
 Apache Kudu 1.6.0 is a minor release that offers new features, performance
 optimizations, incremental improvements, and bug fixes.
 
-Release highlights:</summary></entry><entry><title>Slides: A brave new world 
in mutable big data: Relational storage</title><link 
href="/2017/10/23/nosql-kudu-spanner-slides.html" rel="alternate" 
type="text/html" title="Slides: A brave new world in mutable big data: 
Relational storage" 
/><published>2017-10-23T00:00:00+02:00</published><updated>2017-10-23T00:00:00+02:00</updated><id>/2017/10/23/nosql-kudu-spanner-slides</id><content
 type="html" 
xml:base="/2017/10/23/nosql-kudu-spanner-slides.html">&lt;p&gt;Since the Apache 
Kudu project made its debut in 2015, there have been
+Release highlights:</summary></entry><entry><title>Slides: A brave new world 
in mutable big data: Relational storage</title><link 
href="/2017/10/23/nosql-kudu-spanner-slides.html" rel="alternate" 
type="text/html" title="Slides: A brave new world in mutable big data: 
Relational storage" 
/><published>2017-10-23T00:00:00-07:00</published><updated>2017-10-23T00:00:00-07:00</updated><id>/2017/10/23/nosql-kudu-spanner-slides</id><content
 type="html" 
xml:base="/2017/10/23/nosql-kudu-spanner-slides.html">&lt;p&gt;Since the Apache 
Kudu project made its debut in 2015, there have been
 a few common questions that kept coming up at every presentation:&lt;/p&gt;
 
 &lt;ul&gt;
@@ -326,7 +429,7 @@ a few common questions that kept coming up at every 
presentation:
 
   Is Kudu an open source version of Google’s Spanner system?
   Is Kudu NoSQL or SQL?
-  Why does Kudu have a relational data model? Isn’t SQL 
dead?</summary></entry><entry><title>Consistency in Apache Kudu, Part 
1</title><link href="/2017/09/18/kudu-consistency-pt1.html" rel="alternate" 
type="text/html" title="Consistency in Apache Kudu, Part 1" 
/><published>2017-09-18T00:00:00+02:00</published><updated>2017-09-18T00:00:00+02:00</updated><id>/2017/09/18/kudu-consistency-pt1</id><content
 type="html" xml:base="/2017/09/18/kudu-consistency-pt1.html">&lt;p&gt;In this 
series of short blog posts we will introduce Kudu’s consistency model,
+  Why does Kudu have a relational data model? Isn’t SQL 
dead?</summary></entry><entry><title>Consistency in Apache Kudu, Part 
1</title><link href="/2017/09/18/kudu-consistency-pt1.html" rel="alternate" 
type="text/html" title="Consistency in Apache Kudu, Part 1" 
/><published>2017-09-18T00:00:00-07:00</published><updated>2017-09-18T00:00:00-07:00</updated><id>/2017/09/18/kudu-consistency-pt1</id><content
 type="html" xml:base="/2017/09/18/kudu-consistency-pt1.html">&lt;p&gt;In this 
series of short blog posts we will introduce Kudu’s consistency model,
 its design and ultimate goals, current features, and next steps.
 On the way, we’ll shed some light on the more relevant components and how 
they
 fit together.&lt;/p&gt;
@@ -445,29 +548,29 @@ have increasing timestamps, depending on the user’s 
choices.&lt;/p&gt;
 &lt;p&gt;Row mutations performed by a single client 
&lt;em&gt;instance&lt;/em&gt; are guaranteed to have increasing timestamps
 thus reflecting their potential causal relationship. This property is always 
enforced. However
 there are two major &lt;em&gt;“knobs”&lt;/em&gt; that are available to the 
user to make performance trade-offs, the
-&lt;code&gt;Read&lt;/code&gt; mode, and the &lt;code&gt;External 
Consistency&lt;/code&gt; mode (see &lt;a 
href=&quot;https://kudu.apache.org/docs/transaction_semantics.html&quot;&gt;here&lt;/a&gt;
+&lt;code class=&quot;highlighter-rouge&quot;&gt;Read&lt;/code&gt; mode, and 
the &lt;code class=&quot;highlighter-rouge&quot;&gt;External 
Consistency&lt;/code&gt; mode (see &lt;a 
href=&quot;https://kudu.apache.org/docs/transaction_semantics.html&quot;&gt;here&lt;/a&gt;
 for more information on how to use the relevant APIs).&lt;/p&gt;
 
-&lt;p&gt;The first and most important knob, the &lt;code&gt;Read&lt;/code&gt; 
mode, pertains to what is the guaranteed recency of
+&lt;p&gt;The first and most important knob, the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;Read&lt;/code&gt; mode, pertains to what 
is the guaranteed recency of
 data resulting from scans. Since Kudu uses replication for availability and 
fault-tolerance, there
 are always multiple replicas of any data item.
 Not all replicas must be up-to-date so if the user cares about recency, e.g. 
if the user requires
 that any data read includes all previously written data &lt;em&gt;from a 
single client instance&lt;/em&gt; then it must
-choose the &lt;code&gt;READ_AT_SNAPSHOT&lt;/code&gt; read mode. With this mode 
enabled the client is guaranteed to observe
+choose the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;READ_AT_SNAPSHOT&lt;/code&gt; read mode. 
With this mode enabled the client is guaranteed to observe
  &lt;strong&gt;“READ YOUR OWN WRITES”&lt;/strong&gt; semantics, i.e. scans 
from a client will always include all previous mutations
 performed by that client. Note that this property is local to a single client 
instance, not a global
 property.&lt;/p&gt;
 
-&lt;p&gt;The second “knob”, the &lt;code&gt;External 
Consistency&lt;/code&gt; mode, defines the semantics of how reads and writes
-are performed across multiple client instances. By default, 
&lt;code&gt;External Consistency&lt;/code&gt; is set to
- &lt;code&gt;CLIENT_PROPAGATED&lt;/code&gt;, meaning it’s up to the user to 
coordinate a set of &lt;em&gt;timestamp tokens&lt;/em&gt; with clients (even
+&lt;p&gt;The second “knob”, the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;External Consistency&lt;/code&gt; mode, 
defines the semantics of how reads and writes
+are performed across multiple client instances. By default, &lt;code 
class=&quot;highlighter-rouge&quot;&gt;External Consistency&lt;/code&gt; is set 
to
+ &lt;code 
class=&quot;highlighter-rouge&quot;&gt;CLIENT_PROPAGATED&lt;/code&gt;, meaning 
it’s up to the user to coordinate a set of &lt;em&gt;timestamp 
tokens&lt;/em&gt; with clients (even
 across different machines) if they are performing writes/reads that are 
somehow causally linked.
 If done correctly this enables &lt;strong&gt;STRICT 
SERIALIZABILITY&lt;/strong&gt;[5], i.e. 
&lt;strong&gt;LINEARIZABILITY&lt;/strong&gt;[6] and
 &lt;strong&gt;SERIALIZABILITY&lt;/strong&gt;[7] at the same time, at the cost 
of having the user coordinate the timestamp
 tokens across clients (a survey of the meaning of these, and other definitions 
can be found
 &lt;a 
href=&quot;http://www.ics.forth.gr/tech-reports/2013/2013.TR439_Survey_on_Consistency_Conditions.pdf&quot;&gt;here&lt;/a&gt;).
-The alternative setting for &lt;code&gt;External Consistency&lt;/code&gt; is 
to have it set to
-&lt;code&gt;COMMIT_WAIT&lt;/code&gt; (experimental), which guarantees the same 
properties through a different means, by
+The alternative setting for &lt;code 
class=&quot;highlighter-rouge&quot;&gt;External Consistency&lt;/code&gt; is to 
have it set to
+&lt;code class=&quot;highlighter-rouge&quot;&gt;COMMIT_WAIT&lt;/code&gt; 
(experimental), which guarantees the same properties through a different means, 
by
 implementing Google Spanner’s &lt;em&gt;TrueTime&lt;/em&gt;. This comes at 
the cost of higher latency (depending on how
 tightly synchronized the system clocks of the various tablet servers are), but 
doesn’t require users
 to propagate timestamps programmatically.&lt;/p&gt;
@@ -505,7 +608,7 @@ On the way, we’ll shed some light on the more relevant 
components and how they
 fit together.
 
 In Part 1 of the series (this one), we’ll cover motivation and design 
trade-offs, the end goals and
-the current status.</summary></entry><entry><title>Apache Kudu 1.5.0 
released</title><link href="/2017/09/08/apache-kudu-1-5-0-released.html" 
rel="alternate" type="text/html" title="Apache Kudu 1.5.0 released" 
/><published>2017-09-08T00:00:00+02:00</published><updated>2017-09-08T00:00:00+02:00</updated><id>/2017/09/08/apache-kudu-1-5-0-released</id><content
 type="html" 
xml:base="/2017/09/08/apache-kudu-1-5-0-released.html">&lt;p&gt;The Apache Kudu 
team is happy to announce the release of Kudu 1.5.0!&lt;/p&gt;
+the current status.</summary></entry><entry><title>Apache Kudu 1.5.0 
released</title><link href="/2017/09/08/apache-kudu-1-5-0-released.html" 
rel="alternate" type="text/html" title="Apache Kudu 1.5.0 released" 
/><published>2017-09-08T00:00:00-07:00</published><updated>2017-09-08T00:00:00-07:00</updated><id>/2017/09/08/apache-kudu-1-5-0-released</id><content
 type="html" 
xml:base="/2017/09/08/apache-kudu-1-5-0-released.html">&lt;p&gt;The Apache Kudu 
team is happy to announce the release of Kudu 1.5.0!&lt;/p&gt;
 
 &lt;p&gt;Apache Kudu 1.5.0 is a minor release which offers several new 
features,
 improvements, optimizations, and bug fixes.&lt;/p&gt;
@@ -523,9 +626,9 @@ scenarios&lt;/li&gt;
 additional reductions planned for the future&lt;/li&gt;
   &lt;li&gt;a new configuration dashboard on the web UI which provides a 
high-level
 summary of important configuration values&lt;/li&gt;
-  &lt;li&gt;a new &lt;code&gt;kudu tablet move&lt;/code&gt; command which 
moves a tablet replica from one tablet
+  &lt;li&gt;a new &lt;code class=&quot;highlighter-rouge&quot;&gt;kudu tablet 
move&lt;/code&gt; command which moves a tablet replica from one tablet
 server to another&lt;/li&gt;
-  &lt;li&gt;a new &lt;code&gt;kudu local_replica data_size&lt;/code&gt; 
command which summarizes the space usage
+  &lt;li&gt;a new &lt;code class=&quot;highlighter-rouge&quot;&gt;kudu 
local_replica data_size&lt;/code&gt; command which summarizes the space usage
 of a local tablet&lt;/li&gt;
   &lt;li&gt;all on-disk data is now checksummed by default, which provides 
error detection
 for improved confidence when running Kudu on unreliable hardware&lt;/li&gt;
@@ -546,7 +649,7 @@ repository.&lt;/li&gt;
 Apache Kudu 1.5.0 is a minor release which offers several new features,
 improvements, optimizations, and bug fixes.
 
-Highlights include:</summary></entry><entry><title>Apache Kudu 1.4.0 
released</title><link href="/2017/06/13/apache-kudu-1-4-0-released.html" 
rel="alternate" type="text/html" title="Apache Kudu 1.4.0 released" 
/><published>2017-06-13T00:00:00+02:00</published><updated>2017-06-13T00:00:00+02:00</updated><id>/2017/06/13/apache-kudu-1-4-0-released</id><content
 type="html" 
xml:base="/2017/06/13/apache-kudu-1-4-0-released.html">&lt;p&gt;The Apache Kudu 
team is happy to announce the release of Kudu 1.4.0!&lt;/p&gt;
+Highlights include:</summary></entry><entry><title>Apache Kudu 1.4.0 
released</title><link href="/2017/06/13/apache-kudu-1-4-0-released.html" 
rel="alternate" type="text/html" title="Apache Kudu 1.4.0 released" 
/><published>2017-06-13T00:00:00-07:00</published><updated>2017-06-13T00:00:00-07:00</updated><id>/2017/06/13/apache-kudu-1-4-0-released</id><content
 type="html" 
xml:base="/2017/06/13/apache-kudu-1-4-0-released.html">&lt;p&gt;The Apache Kudu 
team is happy to announce the release of Kudu 1.4.0!&lt;/p&gt;
 
 &lt;p&gt;Apache Kudu 1.4.0 is a minor release which offers several new 
features,
 improvements, optimizations, and bug fixes.&lt;/p&gt;
@@ -560,7 +663,7 @@ improvements, optimizations, and bug fixes.&lt;/p&gt;
   &lt;li&gt;a new C++ client API to efficiently map primary keys to their 
associated partitions
 and hosts&lt;/li&gt;
   &lt;li&gt;support for long-running fault-tolerant scans in the Java 
client&lt;/li&gt;
-  &lt;li&gt;a new &lt;code&gt;kudu fs check&lt;/code&gt; command which can 
perform offline consistency checks
+  &lt;li&gt;a new &lt;code class=&quot;highlighter-rouge&quot;&gt;kudu fs 
check&lt;/code&gt; command which can perform offline consistency checks
 and repairs on the local on-disk storage of a Tablet Server or 
Master.&lt;/li&gt;
   &lt;li&gt;many optimizations to reduce disk space usage, improve write 
throughput,
 and improve throughput of background maintenance operations.&lt;/li&gt;
@@ -581,31 +684,4 @@ repository.&lt;/li&gt;
 Apache Kudu 1.4.0 is a minor release which offers several new features,
 improvements, optimizations, and bug fixes.
 
-Highlights include:</summary></entry><entry><title>Apache Kudu 1.3.1 
released</title><link href="/2017/04/19/apache-kudu-1-3-1-released.html" 
rel="alternate" type="text/html" title="Apache Kudu 1.3.1 released" 
/><published>2017-04-19T00:00:00+02:00</published><updated>2017-04-19T00:00:00+02:00</updated><id>/2017/04/19/apache-kudu-1-3-1-released</id><content
 type="html" 
xml:base="/2017/04/19/apache-kudu-1-3-1-released.html">&lt;p&gt;The Apache Kudu 
team is happy to announce the release of Kudu 1.3.1!&lt;/p&gt;
-
-&lt;p&gt;Apache Kudu 1.3.1 is a bug fix release which fixes critical issues 
discovered
-in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be
-incorrectly deleted after certain sequences of node failures. Several other
-bugs are also fixed. See the release notes for details.&lt;/p&gt;
-
-&lt;p&gt;Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 
immediately.&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Download the &lt;a href=&quot;/releases/1.3.1/&quot;&gt;Kudu 1.3.1 
source release&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Convenience binary artifacts for the Java client and various Java
-integrations (eg Spark, Flume) are also now available via the ASF Maven
-repository.&lt;/li&gt;
-&lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The 
Apache Kudu team is happy to announce the release of Kudu 1.3.1!
-
-Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered
-in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be
-incorrectly deleted after certain sequences of node failures. Several other
-bugs are also fixed. See the release notes for details.
-
-Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately.
-
-
-  Download the Kudu 1.3.1 source release
-  Convenience binary artifacts for the Java client and various Java
-integrations (eg Spark, Flume) are also now available via the ASF Maven
-repository.</summary></entry></feed>
+Highlights include:</summary></entry></feed>

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/img/index-skip-scan/example-table.png
----------------------------------------------------------------------
diff --git a/img/index-skip-scan/example-table.png 
b/img/index-skip-scan/example-table.png
new file mode 100644
index 0000000..585ae4d
Binary files /dev/null and b/img/index-skip-scan/example-table.png differ
