Repository: kafka-site
Updated Branches:
  refs/heads/asf-site 2f085ed55 -> b294ebcac


Add 0.10.2 docs from 0.10.2.0 RC2


Project: http://git-wip-us.apache.org/repos/asf/kafka-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka-site/commit/b294ebca
Tree: http://git-wip-us.apache.org/repos/asf/kafka-site/tree/b294ebca
Diff: http://git-wip-us.apache.org/repos/asf/kafka-site/diff/b294ebca

Branch: refs/heads/asf-site
Commit: b294ebcacfad0f7a452f172f016a0c7ee0dbbebd
Parents: 2f085ed
Author: Ewen Cheslack-Postava <[email protected]>
Authored: Tue Feb 14 10:34:48 2017 -0800
Committer: Ewen Cheslack-Postava <[email protected]>
Committed: Tue Feb 14 10:34:48 2017 -0800

----------------------------------------------------------------------
 0102/generated/topic_config.html |  4 ++--
 0102/quickstart.html             |  5 +----
 0102/streams.html                | 29 +++++++++++++++++++++--------
 3 files changed, 24 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b294ebca/0102/generated/topic_config.html
----------------------------------------------------------------------
diff --git a/0102/generated/topic_config.html b/0102/generated/topic_config.html
index 87eb7bd..8974147 100644
--- a/0102/generated/topic_config.html
+++ b/0102/generated/topic_config.html
@@ -21,11 +21,11 @@
 <tr>
 <td>flush.ms</td><td>This setting allows specifying a time interval at which 
we will force an fsync of data written to the log. For example if this was set 
to 1000 we would fsync after 1000 ms had passed. In general we recommend you 
not set this and use replication for durability and allow the operating 
system's background flush capabilities as it is more 
efficient.</td><td>long</td><td>9223372036854775807</td><td>[0,...]</td><td>log.flush.interval.ms</td><td>medium</td></tr>
 <tr>
-<td>follower.replication.throttled.replicas</td><td>A list of replicas for 
which log replication should be throttled on the follower side. The list should 
describe a set of replicas in the form 
[PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively the 
wildcard '*' can be used to throttle all replicas for this 
topic.</td><td>list</td><td>""</td><td>kafka.server.ThrottledReplicaListValidator$@6c503cb2</td><td>follower.replication.throttled.replicas</td><td>medium</td></tr>
+<td>follower.replication.throttled.replicas</td><td>A list of replicas for 
which log replication should be throttled on the follower side. The list should 
describe a set of replicas in the form 
[PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively the 
wildcard '*' can be used to throttle all replicas for this 
topic.</td><td>list</td><td>""</td><td>kafka.server.ThrottledReplicaListValidator$@7be8c2a2</td><td>follower.replication.throttled.replicas</td><td>medium</td></tr>
 <tr>
 <td>index.interval.bytes</td><td>This setting controls how frequently Kafka 
adds an index entry to its offset index. The default setting ensures that we 
index a message roughly every 4096 bytes. More indexing allows reads to jump 
closer to the exact position in the log but makes the index larger. You 
probably don't need to change 
this.</td><td>int</td><td>4096</td><td>[0,...]</td><td>log.index.interval.bytes</td><td>medium</td></tr>
 <tr>
-<td>leader.replication.throttled.replicas</td><td>A list of replicas for which 
log replication should be throttled on the leader side. The list should 
describe a set of replicas in the form 
[PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively the 
wildcard '*' can be used to throttle all replicas for this 
topic.</td><td>list</td><td>""</td><td>kafka.server.ThrottledReplicaListValidator$@6c503cb2</td><td>leader.replication.throttled.replicas</td><td>medium</td></tr>
+<td>leader.replication.throttled.replicas</td><td>A list of replicas for which 
log replication should be throttled on the leader side. The list should 
describe a set of replicas in the form 
[PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively the 
wildcard '*' can be used to throttle all replicas for this 
topic.</td><td>list</td><td>""</td><td>kafka.server.ThrottledReplicaListValidator$@7be8c2a2</td><td>leader.replication.throttled.replicas</td><td>medium</td></tr>
 <tr>
 <td>max.message.bytes</td><td>This is the largest message size Kafka will 
allow to be appended. Note that if you increase this size you must also 
increase your consumers' fetch size so they can fetch messages this 
large.</td><td>int</td><td>1000012</td><td>[0,...]</td><td>message.max.bytes</td><td>medium</td></tr>
 <tr>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b294ebca/0102/quickstart.html
----------------------------------------------------------------------
diff --git a/0102/quickstart.html b/0102/quickstart.html
index bfc9af3..69f6a7a 100644
--- a/0102/quickstart.html
+++ b/0102/quickstart.html
@@ -359,7 +359,7 @@ stream data will likely be flowing continuously into Kafka 
where the application
 
 
 <pre>
-&gt; <b>cat file-input.txt | ./bin/kafka-console-producer --broker-list 
localhost:9092 --topic streams-file-input</b>
+&gt; <b>bin/kafka-console-producer.sh --broker-list localhost:9092 --topic 
streams-file-input < file-input.txt</b>
 </pre>
 
 <p>
@@ -397,12 +397,9 @@ with the following output data being printed to the 
console:
 
 <pre>
 all     1
-streams 1
 lead    1
 to      1
-kafka   1
 hello   1
-kafka   2
 streams 2
 join    1
 kafka   3

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b294ebca/0102/streams.html
----------------------------------------------------------------------
diff --git a/0102/streams.html b/0102/streams.html
index 19af2b3..94ce7a9 100644
--- a/0102/streams.html
+++ b/0102/streams.html
@@ -497,25 +497,31 @@
 
         <p>
         The same mechanism is used, for example, to replicate databases via 
change data capture (CDC) and, within Kafka Streams, to replicate its so-called 
state stores across machines for fault-tolerance.
-        The stream-table duality is such an important concept that Kafka 
Streams models it explicitly via the <a href="#streams_kstream_ktable">KStream 
and KTable</a> interfaces, which we describe in the next sections.
+        The stream-table duality is such an important concept that Kafka 
Streams models it explicitly via the <a href="#streams_kstream_ktable">KStream, 
KTable, and GlobalKTable</a> interfaces, which we describe in the next sections.
         </p>
 
-        <h5><a id="streams_kstream_ktable" 
href="#streams_kstream_ktable">KStream and KTable</a></h5>
-        The DSL uses two main abstractions. A <b>KStream</b> is an abstraction 
of a record stream, where each data record represents a self-contained datum in 
the unbounded data set.
+        <h5><a id="streams_kstream_ktable" 
href="#streams_kstream_ktable">KStream, KTable, and GlobalKTable</a></h5>
+        The DSL uses three main abstractions. A <b>KStream</b> is an 
abstraction of a record stream, where each data record represents a 
self-contained datum in the unbounded data set.
         A <b>KTable</b> is an abstraction of a changelog stream, where each 
data record represents an update. More precisely, the value in a data record is 
considered to be an update of the last value for the same record key,
-        if any (if a corresponding key doesn't exist yet, the update will be 
considered a create). To illustrate the difference between KStreams and 
KTables, let's imagine the following two data records are being sent to the 
stream:
+        if any (if a corresponding key doesn't exist yet, the update will be 
considered a create).
+        Like a <b>KTable</b>, a <b>GlobalKTable</b> is an abstraction of a 
changelog stream, where each data record represents an update.
+        However, a <b>GlobalKTable</b> is different from a <b>KTable</b> in 
that it is fully replicated on each KafkaStreams instance.
+        <b>GlobalKTable</b> also provides the ability to look up current 
values of data records by keys.
+        This table-lookup functionality is available through <a 
href="#streams_dsl_joins">join operations</a>.
+
+        To illustrate the difference between KStreams and 
KTables/GlobalKTables, let's imagine the following two data records are being 
sent to the stream:
 
         <pre>
             ("alice", 1) --> ("alice", 3)
         </pre>
 
-        If these records a KStream and the stream processing application were 
to sum the values it would return <code>4</code>. If these records were a 
KTable, the return would be <code>3</code>, since the last record would be 
considered as an update.
+        If these records were a KStream and the stream processing application 
were to sum the values, it would return <code>4</code>. If these records were a 
KTable or GlobalKTable, the result would be <code>3</code>, since the last 
record would be considered an update.
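The difference between the two interpretations can be sketched with a toy 
simulation in plain Java (this is not the Kafka Streams API, just a model of 
the semantics): summing the records as independent facts gives 4, while 
treating each record as an update and keeping only the latest value gives 3.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamTableDuality {

    // KStream semantics: every record is an independent, self-contained
    // datum, so an aggregation sees all of them and the values add up.
    static int sumAsStream(List<Map.Entry<String, Integer>> records, String key) {
        int sum = 0;
        for (Map.Entry<String, Integer> r : records)
            if (r.getKey().equals(key)) sum += r.getValue();
        return sum;
    }

    // KTable/GlobalKTable semantics: a later record with the same key is an
    // update of the earlier one, so only the latest value per key is kept.
    static int sumAsTable(List<Map.Entry<String, Integer>> records, String key) {
        Map<String, Integer> latest = new HashMap<>();
        for (Map.Entry<String, Integer> r : records)
            latest.put(r.getKey(), r.getValue()); // overwrite = update
        return latest.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records =
            List.of(Map.entry("alice", 1), Map.entry("alice", 3));
        System.out.println(sumAsStream(records, "alice")); // 4
        System.out.println(sumAsTable(records, "alice"));  // 3
    }
}
```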
 
         <h4><a id="streams_dsl_source" href="#streams_dsl_source">Create 
Source Streams from Kafka</a></h4>
 
         <p>
-        Either a <b>record stream</b> (defined as <code>KStream</code>) or a 
<b>changelog stream</b> (defined as <code>KTable</code>)
-        can be created as a source stream from one or more Kafka topics (for 
<code>KTable</code> you can only create the source stream
+        Either a <b>record stream</b> (defined as <code>KStream</code>) or a 
<b>changelog stream</b> (defined as <code>KTable</code> or 
<code>GlobalKTable</code>)
+        can be created as a source stream from one or more Kafka topics (for 
<code>KTable</code> and <code>GlobalKTable</code> you can only create the 
source stream
         from a single topic).
         </p>
 
@@ -524,6 +530,7 @@
 
             KStream&lt;String, GenericRecord&gt; source1 = 
builder.stream("topic1", "topic2");
             KTable&lt;String, GenericRecord&gt; source2 = 
builder.table("topic3", "stateStoreName");
+            GlobalKTable&lt;String, GenericRecord&gt; source3 = 
builder.globalTable("topic4", "globalStoreName");
         </pre>
 
         <h4><a id="streams_dsl_windowing" 
href="#streams_dsl_windowing">Windowing a stream</a></h4>
@@ -551,7 +558,13 @@
         <li><b>KStream-to-KStream Joins</b> are always windowed joins, since 
otherwise the memory and state required to compute the join would grow 
infinitely in size. Here, a newly received record from one of the streams is 
joined with the other stream's records within the specified window interval to 
produce one result for each matching pair based on user-provided 
<code>ValueJoiner</code>. A new <code>KStream</code> instance representing the 
result stream of the join is returned from this operator.</li>
         
         <li><b>KTable-to-KTable Joins</b> are join operations designed to be 
consistent with the ones in relational databases. Here, both changelog streams 
are materialized into local state stores first. When a new record is received 
from one of the streams, it is joined with the other stream's materialized 
state stores to produce one result for each matching pair based on 
user-provided <code>ValueJoiner</code>. A new <code>KTable</code> instance 
representing the 
result stream of the join, which is also a changelog stream of the represented 
table, is returned from this operator.</li>
-        <li><b>KStream-to-KTable Joins</b> allow you to perform table lookups 
against a changelog stream (<code>KTable</code>) upon receiving a new record 
from another record stream (KStream). An example use case would be to enrich a 
stream of user activities (<code>KStream</code>) with the latest user profile 
information (<code>KTable</code>). Only records received from the record stream 
will trigger the join and produce results via <code>ValueJoiner</code>, not 
vice versa (i.e., records received from the changelog stream will be used only 
to update the materialized state store). A new <code>KStream</code> instance 
representing the result stream of the join is returned from this operator.</li>
+        <li><b>KStream-to-KTable Joins</b> allow you to perform table lookups 
against a changelog stream (<code>KTable</code>) upon receiving a new record 
from another record stream (<code>KStream</code>). An example use case would be 
to enrich a stream of user activities (<code>KStream</code>) with the latest 
user profile information (<code>KTable</code>). Only records received from the 
record stream will trigger the join and produce results via 
<code>ValueJoiner</code>, not vice versa (i.e., records received from the 
changelog stream will be used only to update the materialized state store). A 
new <code>KStream</code> instance representing the result stream of the join is 
returned from this operator.</li>
+        <li><b>KStream-to-GlobalKTable Joins</b> allow you to perform table 
lookups against a fully replicated changelog stream (<code>GlobalKTable</code>) 
upon receiving a new record from another record stream (<code>KStream</code>).
+            Joins with a <code>GlobalKTable</code> don't require 
repartitioning of the input <code>KStream</code> as all partitions of the 
<code>GlobalKTable</code> are available on every KafkaStreams instance.
+            The <code>KeyValueMapper</code> provided with the join operation 
is applied to each <code>KStream</code> record to extract the join key used to 
look up the record in the <code>GlobalKTable</code>, so non-record-key joins 
are possible.
+            An example use case would be to enrich a stream of user activities 
(<code>KStream</code>) with the latest user profile information 
(<code>GlobalKTable</code>).
+            Only records received from the record stream will trigger the join 
and produce results via <code>ValueJoiner</code>, not vice versa (i.e., records 
received from the changelog stream will be used only to update the materialized 
state store).
+            A new <code>KStream</code> instance representing the result stream 
of the join is returned from this operator.</li>
         </ul>
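The KStream-to-GlobalKTable case above can be sketched with a small, 
self-contained Java model (again not the Kafka Streams API): the "global 
table" is just a fully materialized map, a key-extractor function plays the 
role of the <code>KeyValueMapper</code>, and a combining function plays the 
role of the <code>ValueJoiner</code>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

public class GlobalTableJoinSketch {

    // Toy inner join: for each stream record, extract a lookup key from the
    // record itself (so the join key need not be the record key), look it up
    // in the fully replicated "table", and combine the two values; stream
    // records with no match are dropped.
    static List<String> join(List<String> stream,
                             Map<String, String> globalTable,
                             Function<String, String> keyExtractor,
                             BinaryOperator<String> valueJoiner) {
        List<String> out = new ArrayList<>();
        for (String record : stream) {
            String tableValue = globalTable.get(keyExtractor.apply(record));
            if (tableValue != null) out.add(valueJoiner.apply(record, tableValue));
        }
        return out;
    }

    public static void main(String[] args) {
        // "GlobalKTable": user id -> profile name, replicated everywhere.
        Map<String, String> profiles = Map.of("u1", "alice", "u2", "bob");
        // "KStream": activity records of the form "<userId>:<action>".
        List<String> activities = List.of("u1:click", "u2:view", "u9:click");
        List<String> enriched = join(activities, profiles,
            r -> r.split(":")[0],                     // join key from the value
            (r, p) -> p + " did " + r.split(":")[1]); // ValueJoiner analogue
        System.out.println(enriched); // [alice did click, bob did view]
    }
}
```

Note that only the stream side drives the join here, mirroring the behavior 
described above: table entries with no matching stream record produce no output.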
 
         Depending on the operands the following join operations are supported: 
<b>inner joins</b>, <b>outer joins</b> and <b>left joins</b>. Their semantics 
are similar to the corresponding operators in relational databases.
