This is an automated email from the ASF dual-hosted git repository. guozhang pushed a commit to branch 2.0 in repository https://gitbox.apache.org/repos/asf/kafka.git
The following commit(s) were added to refs/heads/2.0 by this push: new 1d239a0 HOTFIX: Fix broken links (#5676) 1d239a0 is described below commit 1d239a0fd87dcfdfffd030d293c00f6faff57ba9 Author: Bill Bejeck <bbej...@gmail.com> AuthorDate: Wed Oct 3 21:27:15 2018 -0400 HOTFIX: Fix broken links (#5676) Reviewers: Joel Hamill <11722533+joel-ham...@users.noreply.github.com>, Guozhang Wang <wangg...@gmail.com> --- docs/streams/core-concepts.html | 48 ++++++++ docs/streams/developer-guide/dsl-api.html | 147 ++++++++++++++++++++---- docs/streams/developer-guide/processor-api.html | 4 +- docs/streams/upgrade-guide.html | 2 +- 4 files changed, 178 insertions(+), 23 deletions(-) diff --git a/docs/streams/core-concepts.html b/docs/streams/core-concepts.html index 3f9eab5..b6d7762 100644 --- a/docs/streams/core-concepts.html +++ b/docs/streams/core-concepts.html @@ -131,6 +131,54 @@ Note that the described default behavior can be changed in the Processor API by assigning timestamps to output records explicitly when calling <code>#forward()</code>. </p> + <h3><a id="streams_concepts_aggregations" href="#streams_concepts_aggregations">Aggregations</a></h3> + <p> + An <strong>aggregation</strong> operation takes one input stream or table, and yields a new table by combining multiple input records into a single output record. Examples of aggregations are computing counts or sums. + </p> + + <p> + In the <code>Kafka Streams DSL</code>, an input stream of an <code>aggregation</code> can be a KStream or a KTable, but the output stream will always be a KTable. This allows Kafka Streams to update an aggregate value upon the late arrival of further records after the value was produced and emitted. When such late arrival happens, the aggregating KStream or KTable emits a new aggregate value. Because the output is a KTable, the new value is considered to overwrite the old value w [...]
+ </p> + + <h3> <a id="streams_concepts_windowing" href="#streams_concepts_windowing">Windowing</a></h3> + <p> + Windowing lets you control how to <em>group records that have the same key</em> for stateful operations such as <code>aggregations</code> or <code>joins</code> into so-called <em>windows</em>. Windows are tracked per record key. + </p> + <p> + <code>Windowing operations</code> are available in the <code>Kafka Streams DSL</code>. When working with windows, you can specify a <strong>retention period</strong> for the window. This retention period controls how long Kafka Streams will wait for <strong>out-of-order</strong> or <strong>late-arriving</strong> data records for a given window. If a record arrives after the retention period of a window has passed, the record is discarded and will not be processed in that window. + </p> + <p> + Late-arriving records are always possible in the real world and should be properly accounted for in your applications. How late records are handled depends on the effective <code>time semantics</code>. In the case of processing-time, the semantics are "when the record is being processed", which means that the notion of late records is not applicable as, by definition, no record can be late. Hence, late-arriving records can only be considered as such (i.e. as arrivin [...] + </p> + + <h3><a id="streams_concepts_duality" href="#streams_concepts_duality">Duality of Streams and Tables</a></h3> + <p> + When implementing stream processing use cases in practice, you typically need both <strong>streams</strong> and <strong>databases</strong>. + An example use case that is very common in practice is an e-commerce application that enriches an incoming <em>stream</em> of customer + transactions with the latest customer information from a <em>database table</em>. In other words, streams are everywhere, but databases are everywhere, too.
+ </p> + + <p> + Any stream processing technology must therefore provide <strong>first-class support for streams and tables</strong>. + Kafka's Streams API provides such functionality through its core abstractions for + <code class="interpreted-text" data-role="ref">streams <streams_concepts_kstream></code> and + <code class="interpreted-text" data-role="ref">tables <streams_concepts_ktable></code>, which we will talk about in a minute. + Now, an interesting observation is that there is actually a <strong>close relationship between streams and tables</strong>, + the so-called stream-table duality. + And Kafka exploits this duality in many ways: for example, to make your applications + <code class="interpreted-text" data-role="ref">elastic <streams_developer-guide_execution-scaling></code>, + to support <code class="interpreted-text" data-role="ref">fault-tolerant stateful processing <streams_developer-guide_state-store_fault-tolerance></code>, + or to run <code class="interpreted-text" data-role="ref">interactive queries <streams_concepts_interactive-queries></code> + against your application's latest processing results. And, beyond its internal usage, the Kafka Streams API + also allows developers to exploit this duality in their own applications. + </p> + + <p> + Before we discuss concepts such as <code class="interpreted-text" data-role="ref">aggregations <streams_concepts_aggregations></code> + in Kafka Streams we must first introduce <strong>tables</strong> in more detail, and talk about the aforementioned stream-table duality. + Essentially, this duality means that a stream can be viewed as a table, and a table can be viewed as a stream. 
+ </p> + <h3><a id="streams_state" href="#streams_state">States</a></h3> <p> diff --git a/docs/streams/developer-guide/dsl-api.html b/docs/streams/developer-guide/dsl-api.html index beb83a3..aa44dea 100644 --- a/docs/streams/developer-guide/dsl-api.html +++ b/docs/streams/developer-guide/dsl-api.html @@ -79,9 +79,9 @@ <h2><a class="toc-backref" href="#id7">Overview</a><a class="headerlink" href="#overview" title="Permalink to this headline"></a></h2> <p>In comparison to the <a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a>, only the DSL supports:</p> <ul class="simple"> - <li>Built-in abstractions for <a class="reference internal" href="../concepts.html#streams-concepts-duality"><span class="std std-ref">streams and tables</span></a> in the form of - <a class="reference internal" href="../concepts.html#streams-concepts-kstream"><span class="std std-ref">KStream</span></a>, <a class="reference internal" href="../concepts.html#streams-concepts-ktable"><span class="std std-ref">KTable</span></a>, and - <a class="reference internal" href="../concepts.html#streams-concepts-globalktable"><span class="std std-ref">GlobalKTable</span></a>. Having first-class support for streams and tables is crucial + <li>Built-in abstractions for <a class="reference internal" href="../core-concepts.html#streams_concepts_duality"><span class="std std-ref">streams and tables</span></a> in the form of + <a class="reference internal" href="#streams_concepts_kstream"><span class="std std-ref">KStream</span></a>, <a class="reference internal" href="#streams_concepts_ktable"><span class="std std-ref">KTable</span></a>, and + <a class="reference internal" href="#streams_concepts_globalktable"><span class="std std-ref">GlobalKTable</span></a>. 
Having first-class support for streams and tables is crucial because, in practice, most use cases require not just either streams or databases/tables, but a combination of both. For example, if your use case is to create a customer 360-degree view that is updated in real-time, what your application will be doing is transforming many input <em>streams</em> of customer-related events into an output <em>table</em> @@ -93,7 +93,7 @@ <a class="reference internal" href="#streams-developer-guide-dsl-joins"><span class="std std-ref">joins</span></a> (e.g. <code class="docutils literal"><span class="pre">leftJoin</span></code>), and <a class="reference internal" href="#streams-developer-guide-dsl-windowing"><span class="std std-ref">windowing</span></a> (e.g. <a class="reference internal" href="#windowing-session"><span class="std std-ref">session windows</span></a>).</li> </ul> - <p>With the DSL, you can define <a class="reference internal" href="../concepts.html#streams-concepts-processor-topology"><span class="std std-ref">processor topologies</span></a> (i.e., the logical + <p>With the DSL, you can define <a class="reference internal" href="../core-concepts.html#streams_topology"><span class="std std-ref">processor topologies</span></a> (i.e., the logical processing plan) in your application. The steps to accomplish this are:</p> <ol class="arabic simple"> <li>Specify <a class="reference internal" href="#streams-developer-guide-dsl-sources"><span class="std std-ref">one or more input streams that are read from Kafka topics</span></a>.</li> @@ -104,6 +104,113 @@ action). 
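The topology-building steps listed above (read input streams, apply transformations, write results) can be sketched without the Kafka Streams dependency as a small plain-Java model. This is only an illustration of the logical processing plan, not the actual DSL API; the class and method names (`TopologySketch`, `run`) are invented for this example, and Java lists stand in for Kafka topics.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java model of a DSL processing plan (NOT the Kafka Streams API):
// an input list stands in for the source topic, two stateless transformations
// form the logical plan, and the result list stands in for the sink topic.
public class TopologySketch {
    public static List<Map.Entry<String, Long>> run(List<Map.Entry<String, String>> inputTopic) {
        return inputTopic.stream()
                .filter(r -> !r.getValue().isEmpty())                          // step 2a: a stateless filter
                .map(r -> Map.entry(r.getKey(), (long) r.getValue().length())) // step 2b: a stateless map
                .collect(Collectors.toList());                                 // step 3: write to the output "topic"
    }

    public static void main(String[] args) {
        // Step 1: read records from the input "topic".
        List<Map.Entry<String, String>> inputTopic =
                List.of(Map.entry("alice", "hello"), Map.entry("bob", ""));
        System.out.println(run(inputTopic)); // only alice's record survives the filter
    }
}
```

The real DSL differs in that each step is declared against `StreamsBuilder`/`KStream` objects and executed continuously by the Streams runtime, but the shape of the plan (source, transformations, sink) is the same.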
A step-by-step guide for writing a stream processing application using the DSL is provided below.</p> <p>For a complete list of available API functionality, see also the <a href="../../../javadoc/org/apache/kafka/streams/package-summary.html">Streams</a> API docs.</p> </div> + + <div class="section" id="dsl-core-constructs-overview"> + <h4><a id="streams_concepts_kstream" href="#streams_concepts_kstream">KStream</a></h4> + + <p> + Only the <strong>Kafka Streams DSL</strong> has the notion of a <code>KStream</code>. + </p> + + <p> + A <strong>KStream</strong> is an abstraction of a <strong>record stream</strong>, where each data record represents a self-contained datum in the unbounded data set. Using the table analogy, data records in a record stream are always interpreted as an "INSERT" -- think: adding more entries to an append-only ledger -- because no record replaces an existing row with the same key. Examples are a credit card transaction, a page view event, or a server log entry. + </p> + + <p> + To illustrate, let's imagine the following two data records are being sent to the stream: + </p> + + <div class="sourcecode"> + <p>("alice", 1) --> ("alice", 3)</p> + </div> + + <p> + If your stream processing application were to sum the values per user, it would return <code>4</code> for <code>alice</code>. Why? Because the second data record would not be considered an update of the previous record. Compare this behavior of KStream to <code>KTable</code> below, + which would return <code>3</code> for <code>alice</code>. + </p> + + <h4><a id="streams_concepts_ktable" href="#streams_concepts_ktable">KTable</a></h4> + + <p> + Only the <strong>Kafka Streams DSL</strong> has the notion of a <code>KTable</code>. + </p> + + <p> + A <strong>KTable</strong> is an abstraction of a <strong>changelog stream</strong>, where each data record represents an update. 
More precisely, the value in a data record is interpreted as an "UPDATE" of the last value for the same record key, if any (if a corresponding key doesn't exist yet, the update will be considered an INSERT). Using the table analogy, a data record in a changelog stream is interpreted as an UPSERT aka INSERT/UPDATE because any existing r [...] + </p> + <p> + To illustrate, let's imagine the following two data records are being sent to the stream: + </p> + + <div class="sourcecode"> + <p> + ("alice", 1) --> ("alice", 3) + </p> + </div> + + <p> + If your stream processing application were to sum the values per user, it would return <code>3</code> for <code>alice</code>. Why? Because the second data record would be considered an update of the previous record. + </p> + + <p> + <strong>Effects of Kafka's log compaction:</strong> Another way of thinking about KStream and KTable is as follows: If you were to store a KTable into a Kafka topic, you'd probably want to enable Kafka's <a href="http://kafka.apache.org/documentation.html#compaction">log compaction</a> feature, e.g. to save storage space. + </p> + + <p> + However, it would not be safe to enable log compaction in the case of a KStream because, as soon as log compaction would begin purging older data records of the same key, it would break the semantics of the data. To pick up the illustration example again, you'd suddenly get a <code>3</code> for <code>alice</code> instead of a <code>4</code> because log compaction would have removed the <code>("alice", 1)</code> data record. Hence log compaction is perfectly safe [...] + + <p> + We have already seen an example of a changelog stream in the section <strong>streams_concepts_duality</strong>. Another example is change data capture (CDC) records in the changelog of a relational database, representing which row in a database table was inserted, updated, or deleted.
+ </p> + + <p> + KTable also provides an ability to look up <em>current</em> values of data records by keys. This table-lookup functionality is available through <strong>join operations</strong> (see also <strong>Joining</strong> in the Developer Guide) as well as through <strong>Interactive Queries</strong>. + </p> + + <h4><a id="streams_concepts_globalktable" href="#streams_concepts_globalktable">GlobalKTable</a></h4> + + <p>Only the <strong>Kafka Streams DSL</strong> has the notion of a <strong>GlobalKTable</strong>.</p> + + <p> + Like a <strong>KTable</strong>, a <strong>GlobalKTable</strong> is an abstraction of a <strong>changelog stream</strong>, where each data record represents an update. + </p> + + <p> + A GlobalKTable differs from a KTable in the data that they are being populated with, i.e. which data from the underlying Kafka topic is being read into the respective table. Slightly simplified, imagine you have an input topic with 5 partitions. In your application, you want to read this topic into a table. Also, you want to run your application across 5 application instances for <strong>maximum parallelism</strong>. + </p> + + <ul> + <li> + If you read the input topic into a <strong>KTable</strong>, then the "local" KTable instance of each application instance will be populated with data <strong>from only 1 partition</strong> of the topic's 5 partitions. + </li> + + <li> + If you read the input topic into a <strong>GlobalKTable</strong>, then the local GlobalKTable instance of each application instance will be populated with data <strong>from all partitions of the topic</strong>. + </li> + </ul> + + <p> + GlobalKTable provides the ability to look up <em>current</em> values of data records by keys. This table-lookup functionality is available through <code class="interpreted-text">join operations</code>. 
+ </p> + <p>Benefits of global tables:</p> + + <ul> + <li> + More convenient and/or efficient <strong>joins</strong>: Notably, global tables allow you to perform star joins, they support "foreign-key" lookups (i.e., you can look up data in the table not just by record key, but also by data in the record values), and they are more efficient when chaining multiple joins. Also, when joining against a global table, the input data does not need to be <strong>co-partitioned</strong>. + </li> + <li> + Can be used to "broadcast" information to all the running instances of your application. + </li> + </ul> + + <p>Downsides of global tables:</p> + <ul> + <li>Increased local storage consumption compared to the (partitioned) KTable because the entire topic is tracked.</li> + <li>Increased network and Kafka broker load compared to the (partitioned) KTable because the entire topic is read.</li> + </ul> + + </div> <div class="section" id="creating-source-streams-from-kafka"> <span id="streams-developer-guide-dsl-sources"></span><h2><a class="toc-backref" href="#id8">Creating source streams from Kafka</a><a class="headerlink" href="#creating-source-streams-from-kafka" title="Permalink to this headline"></a></h2> <p>You can easily read data from Kafka topics into your application. The following operations are supported.</p> @@ -123,8 +230,8 @@ <li><em>input topics</em> → KStream</li> </ul> </td> - <td><p class="first">Creates a <a class="reference internal" href="../concepts.html#streams-concepts-kstream"><span class="std std-ref">KStream</span></a> from the specified Kafka input topics and interprets the data - as a <a class="reference internal" href="../concepts.html#streams-concepts-kstream"><span class="std std-ref">record stream</span></a>.
+ <td><p class="first">Creates a <a class="reference internal" href="#streams_concepts_kstream"><span class="std std-ref">KStream</span></a> from the specified Kafka input topics and interprets the data + as a <a class="reference internal" href="#streams_concepts_kstream"><span class="std std-ref">record stream</span></a>. A <code class="docutils literal"><span class="pre">KStream</span></code> represents a <em>partitioned</em> record stream. <a class="reference external" href="../../../javadoc/org/apache/kafka/streams/StreamsBuilder.html#stream(java.lang.String)">(details)</a></p> <p>In the case of a KStream, the local KStream instance of every application instance will @@ -157,7 +264,7 @@ <li><em>input topic</em> → KTable</li> </ul> </td> - <td><p class="first">Reads the specified Kafka input topic into a <a class="reference internal" href="../concepts.html#streams-concepts-ktable"><span class="std std-ref">KTable</span></a>. The topic is + <td><p class="first">Reads the specified Kafka input topic into a <a class="reference internal" href="#streams_concepts_ktable"><span class="std std-ref">KTable</span></a>. The topic is interpreted as a changelog stream, where records with the same key are interpreted as UPSERT aka INSERT/UPDATE (when the record value is not <code class="docutils literal"><span class="pre">null</span></code>) or as DELETE (when the value is <code class="docutils literal"><span class="pre">null</span></code>) for that key. <a class="reference external" href="../../../javadoc/org/apache/kafka/streams/StreamsBuilder.html#table-java.lang.String(java.lang.String)">(details)</a></p> @@ -182,7 +289,7 @@ <li><em>input topic</em> → GlobalKTable</li> </ul> </td> - <td><p class="first">Reads the specified Kafka input topic into a <a class="reference internal" href="../concepts.html#streams-concepts-globalktable"><span class="std std-ref">GlobalKTable</span></a>. 
The topic is + <td><p class="first">Reads the specified Kafka input topic into a <a class="reference internal" href="#streams_concepts_globalktable"><span class="std std-ref">GlobalKTable</span></a>. The topic is interpreted as a changelog stream, where records with the same key are interpreted as UPSERT aka INSERT/UPDATE (when the record value is not <code class="docutils literal"><span class="pre">null</span></code>) or as DELETE (when the value is <code class="docutils literal"><span class="pre">null</span></code>) for that key. <a class="reference external" href="../../../javadoc/org/apache/kafka/streams/StreamsBuilder.html#globalTable-java.lang.String(java.lang.String)">(details)</a></p> @@ -225,7 +332,7 @@ <p>Some KStream transformations may generate one or more KStream objects, for example: - <code class="docutils literal"><span class="pre">filter</span></code> and <code class="docutils literal"><span class="pre">map</span></code> on a KStream will generate another KStream - <code class="docutils literal"><span class="pre">branch</span></code> on KStream can generate multiple KStreams</p> - <p>Some others may generate a KTable object, for example an aggregation of a KStream also yields a KTable. This allows Kafka Streams to continuously update the computed value upon arrivals of <a class="reference internal" href="../concepts.html#streams-concepts-aggregations"><span class="std std-ref">late records</span></a> after it + <p>Some others may generate a KTable object, for example an aggregation of a KStream also yields a KTable. This allows Kafka Streams to continuously update the computed value upon arrivals of <a class="reference internal" href="../core-concepts.html#streams_concepts_aggregations"><span class="std std-ref">late records</span></a> after it has already been produced to the downstream transformation operators.</p> <p>All KTable transformation operations can only generate another KTable. 
However, the Kafka Streams DSL does provide a special function that converts a KTable representation into a KStream. All of these transformation methods can be chained together to compose @@ -1461,7 +1568,7 @@ prices, inventory, customer information) in order to enrich a new data record (e.g. customer transaction) with context information. That is, scenarios where you need to perform table lookups at very large scale and with a low processing latency. Here, a popular pattern is to make the information in the databases available in Kafka through so-called - <em>change data capture</em> in combination with <a class="reference internal" href="../../connect/index.html#kafka-connect"><span class="std std-ref">Kafka’s Connect API</span></a>, and then implementing + <em>change data capture</em> in combination with <a class="reference internal" href="../../#connect"><span class="std std-ref">Kafka’s Connect API</span></a>, and then implementing applications that leverage the Streams API to perform very fast and efficient local joins of such tables and streams, rather than requiring the application to make a query to a remote database over the network for each record. 
In this example, the KTable concept in Kafka Streams would enable you to track the latest state @@ -1522,7 +1629,7 @@ <strong>It is the responsibility of the user to ensure data co-partitioning when joining</strong>.</p> <div class="admonition tip"> <p><b>Tip</b></p> - <p class="last">If possible, consider using <a class="reference internal" href="../concepts.html#streams-concepts-globalktable"><span class="std std-ref">global tables</span></a> (<code class="docutils literal"><span class="pre">GlobalKTable</span></code>) for joining because they do not require data co-partitioning.</p> + <p class="last">If possible, consider using <a class="reference internal" href="#streams_concepts_globalktable"><span class="std std-ref">global tables</span></a> (<code class="docutils literal"><span class="pre">GlobalKTable</span></code>) for joining because they do not require data co-partitioning.</p> </div> <p>The requirements for data co-partitioning are:</p> <ul class="simple"> @@ -1530,7 +1637,7 @@ <li>All applications that <em>write</em> to the input topics must have the <strong>same partitioning strategy</strong> so that records with the same key are delivered to the same partition number. In other words, the keyspace of the input data must be distributed across partitions in the same manner. - This means that, for example, applications that use Kafka’s <a class="reference internal" href="../../clients/index.html#kafka-clients"><span class="std std-ref">Java Producer API</span></a> must use the + This means that, for example, applications that use Kafka’s <a class="reference internal" href="../../#producerapi"><span class="std std-ref">Java Producer API</span></a> must use the same partitioner (cf.
the producer setting <code class="docutils literal"><span class="pre">"partitioner.class"</span></code> aka <code class="docutils literal"><span class="pre">ProducerConfig.PARTITIONER_CLASS_CONFIG</span></code>), and applications that use the Kafka’s Streams API must use the same <code class="docutils literal"><span class="pre">StreamPartitioner</span></code> for operations such as <code class="docutils literal"><span class="pre">KStream#to()</span></code>. The good news is that, if you happen to use the default partitioner-related settings across all @@ -1934,7 +2041,7 @@ <span id="streams-developer-guide-dsl-joins-ktable-ktable"></span><h5><a class="toc-backref" href="#id16">KTable-KTable Join</a><a class="headerlink" href="#ktable-ktable-join" title="Permalink to this headline"></a></h5> <p>KTable-KTable joins are always <em>non-windowed</em> joins. They are designed to be consistent with their counterparts in relational databases. The changelog streams of both KTables are materialized into local state stores to represent the - latest snapshot of their <a class="reference internal" href="../concepts.html#streams-concepts-ktable"><span class="std std-ref">table duals</span></a>. + latest snapshot of their <a class="reference internal" href="#streams_concepts_ktable"><span class="std std-ref">table duals</span></a>. 
The join result is a new KTable that represents the changelog stream of the join operation.</p> <p>Join output records are effectively created as follows, leveraging the user-supplied <code class="docutils literal"><span class="pre">ValueJoiner</span></code>:</p> <div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">KeyValue</span><span class="o"><</span><span class="n">K</span><span class="o">,</span> <span class="n">LV</span><span class="o">></span> <span class="n">leftRecord</span> <span class="o">=</span> <span class="o">...;</span> @@ -2499,13 +2606,13 @@ <div class="section" id="kstream-globalktable-join"> <span id="streams-developer-guide-dsl-joins-kstream-globalktable"></span><h5><a class="toc-backref" href="#id18">KStream-GlobalKTable Join</a><a class="headerlink" href="#kstream-globalktable-join" title="Permalink to this headline"></a></h5> <p>KStream-GlobalKTable joins are always <em>non-windowed</em> joins. They allow you to perform <em>table lookups</em> against a - <a class="reference internal" href="../concepts.html#streams-concepts-globalktable"><span class="std std-ref">GlobalKTable</span></a> (entire changelog stream) upon receiving a new record from the + <a class="reference internal" href="#streams_concepts_globalktable"><span class="std std-ref">GlobalKTable</span></a> (entire changelog stream) upon receiving a new record from the KStream (record stream). An example use case would be “star queries” or “star joins”, where you would enrich a stream of user activities (KStream) with the latest user profile information (GlobalKTable) and further context information (further GlobalKTables).</p> <p>At a high-level, KStream-GlobalKTable joins are very similar to <a class="reference internal" href="#streams-developer-guide-dsl-joins-kstream-ktable"><span class="std std-ref">KStream-KTable joins</span></a>. 
However, global tables provide you - with much more flexibility at the <a class="reference internal" href="../concepts.html#streams-concepts-globalktable"><span class="std std-ref">some expense</span></a> when compared to partitioned + with much more flexibility at <a class="reference internal" href="#streams_concepts_globalktable"><span class="std std-ref">some expense</span></a> when compared to partitioned tables:</p> <ul class="simple"> <li>They do not require <a class="reference internal" href="#streams-developer-guide-dsl-joins-co-partitioning"><span class="std std-ref">data co-partitioning</span></a>.</li> @@ -2671,7 +2778,7 @@ defined window boundary. In aggregating operations, a windowing state store is used to store the latest aggregation results per window. Old records in the state store are purged after the specified - <a class="reference internal" href="../concepts.html#streams-concepts-windowing"><span class="std std-ref">window retention period</span></a>. + <a class="reference internal" href="../core-concepts.html#streams_concepts_windowing"><span class="std std-ref">window retention period</span></a>. Kafka Streams guarantees to keep a window for at least this specified time; the default value is one day and can be changed via <code class="docutils literal"><span class="pre">Windows#until()</span></code> and <code class="docutils literal"><span class="pre">SessionWindows#until()</span></code>.</p> <p>The DSL supports the following types of windows:</p> @@ -2851,7 +2958,7 @@ t=5 (blue), which lead to a merge of sessions and an extension of a session, res <li><strong>Combining ease-of-use with full flexibility where it’s needed:</strong> Even though you generally prefer to use the expressiveness of the DSL, there are certain steps in your processing that require more flexibility and tinkering than the DSL provides.
For example, only the Processor API provides access to a - <a class="reference internal" href="../faq.html#streams-faq-processing-record-metadata"><span class="std std-ref">record’s metadata</span></a> such as its topic, partition, and offset information. + record’s metadata such as its topic, partition, and offset information. However, you don’t want to switch completely to the Processor API just because of that.</li> <li><strong>Migrating from other tools:</strong> You are migrating from other stream processing technologies that provide an imperative API, and migrating some of your legacy code to the Processor API was faster and/or easier than to @@ -2877,7 +2984,7 @@ t=5 (blue), which lead to a merge of sessions and an extension of a session, res <code class="docutils literal"><span class="pre">process()</span></code> allows you to leverage the <a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a> from the DSL. 
(<a class="reference external" href="../../../javadoc/org/apache/kafka/streams/kstream/KStream.html#process-org.apache.kafka.streams.processor.ProcessorSupplier-java.lang.String...-">details</a>)</p> <p>This is essentially equivalent to adding the <code class="docutils literal"><span class="pre">Processor</span></code> via <code class="docutils literal"><span class="pre">Topology#addProcessor()</span></code> to your - <a class="reference internal" href="../concepts.html#streams-concepts-processor-topology"><span class="std std-ref">processor topology</span></a>.</p> + <a class="reference internal" href="../core-concepts.html#streams_topology"><span class="std std-ref">processor topology</span></a>.</p> <p class="last">An example is available in the <a class="reference external" href="../../../javadoc/org/apache/kafka/streams/kstream/KStream.html#process-org.apache.kafka.streams.processor.ProcessorSupplier-java.lang.String...-">javadocs</a>.</p> </td> @@ -2897,7 +3004,7 @@ t=5 (blue), which lead to a merge of sessions and an extension of a session, res Applying a grouping or a join after <code class="docutils literal"><span class="pre">transform</span></code> will result in re-partitioning of the records. 
If possible use <code class="docutils literal"><span class="pre">transformValues</span></code> instead, which will not cause data re-partitioning.</p> <p><code class="docutils literal"><span class="pre">transform</span></code> is essentially equivalent to adding the <code class="docutils literal"><span class="pre">Transformer</span></code> via <code class="docutils literal"><span class="pre">Topology#addProcessor()</span></code> to your - <a class="reference internal" href="../concepts.html#streams-concepts-processor-topology"><span class="std std-ref">processor topology</span></a>.</p> + <a class="reference internal" href="../core-concepts.html#streams_topology"><span class="std std-ref">processor topology</span></a>.</p> <p class="last">An example is available in the <a class="reference external" href="../../../javadoc/org/apache/kafka/streams/kstream/KStream.html#transform-org.apache.kafka.streams.kstream.TransformerSupplier-java.lang.String...-">javadocs</a>. </p> @@ -2916,7 +3023,7 @@ t=5 (blue), which lead to a merge of sessions and an extension of a session, res The <code class="docutils literal"><span class="pre">ValueTransformer</span></code> may return <code class="docutils literal"><span class="pre">null</span></code> as the new value for a record.</p> <p><code class="docutils literal"><span class="pre">transformValues</span></code> is preferable to <code class="docutils literal"><span class="pre">transform</span></code> because it will not cause data re-partitioning.</p> <p><code class="docutils literal"><span class="pre">transformValues</span></code> is essentially equivalent to adding the <code class="docutils literal"><span class="pre">ValueTransformer</span></code> via <code class="docutils literal"><span class="pre">Topology#addProcessor()</span></code> to your - <a class="reference internal" href="../concepts.html#streams-concepts-processor-topology"><span class="std std-ref">processor topology</span></a>.</p> + <a class="reference internal" 
href="../core-concepts.html#streams_topology"><span class="std std-ref">processor topology</span></a>.</p> <p class="last">An example is available in the <a class="reference external" href="../../../javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-">javadocs</a>.</p> </td> diff --git a/docs/streams/developer-guide/processor-api.html b/docs/streams/developer-guide/processor-api.html index cb45cd9..22630b9 100644 --- a/docs/streams/developer-guide/processor-api.html +++ b/docs/streams/developer-guide/processor-api.html @@ -70,7 +70,7 @@ </div> <div class="section" id="defining-a-stream-processor"> <span id="streams-developer-guide-stream-processor"></span><h2><a class="toc-backref" href="#id2">Defining a Stream Processor</a><a class="headerlink" href="#defining-a-stream-processor" title="Permalink to this headline"></a></h2> - <p>A <a class="reference internal" href="../concepts.html#streams-concepts"><span class="std std-ref">stream processor</span></a> is a node in the processor topology that represents a single processing step. + <p>A <a class="reference internal" href="../core-concepts.html#streams_processor_node"><span class="std std-ref">stream processor</span></a> is a node in the processor topology that represents a single processing step. With the Processor API, you can define arbitrary stream processors that process one received record at a time, and connect these processors with their associated state stores to compose the processor topology.</p> <p>You can define a custom stream processor by implementing the <code class="docutils literal"><span class="pre">Processor</span></code> interface, which provides the <code class="docutils literal"><span class="pre">process()</span></code> API method.
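The Processor contract described above (initialize, handle one record at a time, forward downstream, close) can be modeled with a small self-contained sketch. This is plain Java with hypothetical names, not the real `org.apache.kafka.streams.processor.Processor` interface, which additionally takes a `ProcessorContext` and state stores:

```java
import java.util.ArrayList;
import java.util.List;

public class ProcessorSketch {
    // Hypothetical model of the Processor contract: init once, then one
    // process() call per received record, then close().
    interface SimpleProcessor<K, V> {
        void init(List<String> downstream);   // stand-in for ProcessorContext
        void process(K key, V value);         // invoked once per received record
        void close();
    }

    // A processor node that upper-cases values and forwards them downstream.
    static class UpperCaseProcessor implements SimpleProcessor<String, String> {
        private List<String> downstream;
        @Override public void init(List<String> downstream) { this.downstream = downstream; }
        @Override public void process(String key, String value) {
            downstream.add(key + ":" + value.toUpperCase()); // "forward" to the next node
        }
        @Override public void close() { }
    }

    static List<String> run(List<String[]> records) {
        List<String> out = new ArrayList<>();
        SimpleProcessor<String, String> p = new UpperCaseProcessor();
        p.init(out);
        for (String[] r : records) p.process(r[0], r[1]);
        p.close();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(new String[]{"k1", "hello"},
                                       new String[]{"k2", "streams"})));
        // [k1:HELLO, k2:STREAMS]
    }
}
```

In real Kafka Streams code, `downstream.add(...)` corresponds to `ProcessorContext#forward()`, and the processor is wired into the topology with `Topology#addProcessor()`.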
@@ -91,7 +91,7 @@ Note that <code class="docutils literal"><span class="pre">#forward()</span></code> also allows you to change the default behavior by passing a custom timestamp for the output record.</p> <p>Specifically, <code class="docutils literal"><span class="pre">ProcessorContext#schedule()</span></code> accepts a user-supplied <code class="docutils literal"><span class="pre">Punctuator</span></code> callback interface, which triggers its <code class="docutils literal"><span class="pre">punctuate()</span></code> API method periodically based on the <code class="docutils literal"><span class="pre">PunctuationType</span></code>. The <code class="docutils literal"><span class="pre">PunctuationType</span></code> determines what notion of time is used - for the punctuation scheduling: either <a class="reference internal" href="../concepts.html#streams-concepts-time"><span class="std std-ref">stream-time</span></a> or wall-clock-time (by default, stream-time + for the punctuation scheduling: either <a class="reference internal" href="../core-concepts.html#streams_time"><span class="std std-ref">stream-time</span></a> or wall-clock-time (by default, stream-time is configured to represent event-time via <code class="docutils literal"><span class="pre">TimestampExtractor</span></code>). When stream-time is used, <code class="docutils literal"><span class="pre">punctuate()</span></code> is triggered purely by data because stream-time is determined (and advanced forward) by the timestamps derived from the input data.
When there is no new input data arriving, stream-time is not advanced and thus <code class="docutils literal"><span class="pre">punctuate()</span></code> is not called.</p> diff --git a/docs/streams/upgrade-guide.html b/docs/streams/upgrade-guide.html index 34f66ce..5ec4103 100644 --- a/docs/streams/upgrade-guide.html +++ b/docs/streams/upgrade-guide.html @@ -130,7 +130,7 @@ put operation metrics would now be <code>kafka.streams:type=stream-rocksdb-window-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),rocksdb-window-state-id=([-.\w]+)</code>. Users need to update their metrics collecting and reporting systems for their time and session windowed stores accordingly. - For more details, please read the <a href="/{{version}}/documentation/ops.html#kafka_streams_store_monitoring">State Store Metrics</a> section. + For more details, please read the <a href="/{{version}}/documentation/#kafka_streams_store_monitoring">State Store Metrics</a> section. </p> <p>
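The stream-time punctuation semantics above can be sketched in plain Java (a simplified, hypothetical model with no Kafka dependency; the exact first-fire behavior of the real `ProcessorContext#schedule()` may differ): stream-time advances only from record timestamps, so `punctuate()` fires once per elapsed interval and never from the wall clock.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationSketch {
    // Hypothetical model of stream-time punctuation: stream-time advances only
    // when a record's timestamp moves it forward, and punctuate() fires once
    // per elapsed interval -- never because wall-clock time has passed.
    private final long intervalMs;
    private long streamTime = -1;
    private long lastPunctuate = -1;
    final List<Long> punctuations = new ArrayList<>();

    PunctuationSketch(long intervalMs) { this.intervalMs = intervalMs; }

    void onRecord(long recordTimestampMs) {
        streamTime = Math.max(streamTime, recordTimestampMs); // advance stream-time
        if (lastPunctuate < 0) lastPunctuate = streamTime;    // first record seeds the clock
        while (streamTime - lastPunctuate >= intervalMs) {    // fire once per elapsed interval
            lastPunctuate += intervalMs;
            punctuations.add(lastPunctuate);
        }
    }

    public static void main(String[] args) {
        PunctuationSketch p = new PunctuationSketch(10);
        p.onRecord(0);  // seeds stream-time; no interval has elapsed yet
        p.onRecord(7);  // stream-time = 7, still within the first interval
        p.onRecord(25); // stream-time jumps past t=10 and t=20: punctuate twice
        System.out.println(p.punctuations); // [10, 20]
        // With no further records, stream-time stays at 25 and punctuate()
        // is never called again, no matter how much wall-clock time passes.
    }
}
```

With `PunctuationType.WALL_CLOCK_TIME` the trigger would instead be the system clock, independent of whether any records arrive.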