This is an automated email from the ASF dual-hosted git repository. yuanmei pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit 511fa8c116e40b783f8a24d197c101fc9add9644 Author: Yuan Mei <[email protected]> AuthorDate: Sun May 8 23:34:27 2022 +0800 Rebuild website --- content/2022/05/06/async-sink-base.html | 441 ++++++++++++++++++++++++++ content/blog/feed.xml | 537 ++++++++++---------------------- content/blog/index.html | 36 ++- content/blog/page10/index.html | 36 ++- content/blog/page11/index.html | 38 ++- content/blog/page12/index.html | 40 ++- content/blog/page13/index.html | 38 ++- content/blog/page14/index.html | 36 ++- content/blog/page15/index.html | 37 ++- content/blog/page16/index.html | 38 ++- content/blog/page17/index.html | 44 +-- content/blog/page18/index.html | 45 ++- content/blog/page19/index.html | 25 ++ content/blog/page2/index.html | 38 ++- content/blog/page3/index.html | 40 ++- content/blog/page4/index.html | 40 ++- content/blog/page5/index.html | 38 ++- content/blog/page6/index.html | 36 ++- content/blog/page7/index.html | 38 ++- content/blog/page8/index.html | 40 ++- content/blog/page9/index.html | 38 ++- content/community.html | 6 + content/index.html | 6 +- content/zh/community.html | 6 + content/zh/index.html | 6 +- 25 files changed, 1094 insertions(+), 629 deletions(-) diff --git a/content/2022/05/06/async-sink-base.html b/content/2022/05/06/async-sink-base.html new file mode 100644 index 000000000..ca9a4742b --- /dev/null +++ b/content/2022/05/06/async-sink-base.html @@ -0,0 +1,441 @@ +<!DOCTYPE html> +<html lang="en"> + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <title>Apache Flink: The Generic Asynchronous Base Sink</title> + <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"> + <link rel="icon" href="/favicon.ico" type="image/x-icon"> + + <!-- Bootstrap --> + <link rel="stylesheet" 
href="/css/bootstrap.min.css"> + <link rel="stylesheet" href="/css/flink.css"> + <link rel="stylesheet" href="/css/syntax.css"> + + <!-- Blog RSS feed --> + <link href="/blog/feed.xml" rel="alternate" type="application/rss+xml" title="Apache Flink Blog: RSS feed" /> + + <!-- jQuery (necessary for Bootstrap's JavaScript plugins) --> + <!-- We need to load Jquery in the header for custom google analytics event tracking--> + <script src="/js/jquery.min.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!-- WARNING: Respond.js doesn't work if you view the page via file:// --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + <!-- Matomo --> + <script> + var _paq = window._paq = window._paq || []; + /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ + /* We explicitly disable cookie tracking to avoid privacy issues */ + _paq.push(['disableCookies']); + /* Measure a visit to flink.apache.org and nightlies.apache.org/flink as the same visit */ + _paq.push(["setDomains", ["*.flink.apache.org","*.nightlies.apache.org/flink"]]); + _paq.push(['trackPageView']); + _paq.push(['enableLinkTracking']); + (function() { + var u="//matomo.privacy.apache.org/"; + _paq.push(['setTrackerUrl', u+'matomo.php']); + _paq.push(['setSiteId', '1']); + var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; + g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); + })(); + </script> + <!-- End Matomo Code --> + </head> + <body> + + + <!-- Main content. --> + <div class="container"> + <div class="row"> + + + <div id="sidebar" class="col-sm-3"> + + +<!-- Top navbar. --> + <nav class="navbar navbar-default"> + <!-- The logo. 
--> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <div class="navbar-logo"> + <a href="/"> + <img alt="Apache Flink" src="/img/flink-header-logo.svg" width="147px" height="73px"> + </a> + </div> + </div><!-- /.navbar-header --> + + <!-- The navigation links. --> + <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1"> + <ul class="nav navbar-nav navbar-main"> + + <!-- First menu section explains visitors what Flink is --> + + <!-- What is Stream Processing? --> + <!-- + <li><a href="/streamprocessing1.html">What is Stream Processing?</a></li> + --> + + <!-- What is Flink? --> + <li><a href="/flink-architecture.html">What is Apache Flink?</a></li> + + + + <!-- Stateful Functions? --> + + <li><a href="https://nightlies.apache.org/flink/flink-statefun-docs-stable/">What is Stateful Functions?</a></li> + + <!-- Flink ML? 
--> + + <li><a href="https://nightlies.apache.org/flink/flink-ml-docs-stable/">What is Flink ML?</a></li> + + <!-- Use cases --> + <li><a href="/usecases.html">Use Cases</a></li> + + <!-- Powered by --> + <li><a href="/poweredby.html">Powered By</a></li> + + + + <!-- Second menu section aims to support Flink users --> + + <!-- Downloads --> + <li><a href="/downloads.html">Downloads</a></li> + + <!-- Getting Started --> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Getting Started<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="https://nightlies.apache.org/flink/flink-docs-release-1.15//docs/try-flink/local_installation/" target="_blank">With Flink <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/getting-started/project-setup.html" target="_blank">With Flink Stateful Functions <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-ml-docs-release-2.0/try-flink-ml/quick-start.html" target="_blank">With Flink ML <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-0.1/try-flink-kubernetes-operator/quick-start.html" target="_blank">With Flink Kubernetes Operator <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-table-store-docs-release-0.1/try-table-store/quick-start.html" target="_blank">With Flink Table Store <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="/training.html">Training Course</a></li> + </ul> + </li> + + <!-- Documentation --> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a> + <ul 
class="dropdown-menu"> + <li><a href="https://nightlies.apache.org/flink/flink-docs-release-1.15" target="_blank">Flink 1.15 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-docs-master" target="_blank">Flink Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2" target="_blank">Flink Stateful Functions 3.2 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-statefun-docs-master" target="_blank">Flink Stateful Functions Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-ml-docs-release-2.0" target="_blank">Flink ML 2.0 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-ml-docs-master" target="_blank">Flink ML Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-0.1" target="_blank">Flink Kubernetes Operator 0.1 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main" target="_blank">Flink Kubernetes Operator Main (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-table-store-docs-release-0.1" target="_blank">Flink Table Store 0.1 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a 
href="https://nightlies.apache.org/flink/flink-table-store-docs-master" target="_blank">Flink Table Store Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + </ul> + </li> + + <!-- getting help --> + <li><a href="/gettinghelp.html">Getting Help</a></li> + + <!-- Blog --> + <li><a href="/blog/"><b>Flink Blog</b></a></li> + + + <!-- Flink-packages --> + <li> + <a href="https://flink-packages.org" target="_blank">flink-packages.org <small><span class="glyphicon glyphicon-new-window"></span></small></a> + </li> + + + <!-- Third menu section aim to support community and contributors --> + + <!-- Community --> + <li><a href="/community.html">Community & Project Info</a></li> + + <!-- Roadmap --> + <li><a href="/roadmap.html">Roadmap</a></li> + + <!-- Contribute --> + <li><a href="/contributing/how-to-contribute.html">How to Contribute</a></li> + + + <!-- GitHub --> + <li> + <a href="https://github.com/apache/flink" target="_blank">Flink on GitHub <small><span class="glyphicon glyphicon-new-window"></span></small></a> + </li> + + + + <!-- Language Switcher --> + <li> + + + <a href="/zh/2022/05/06/async-sink-base.html">中文版</a> + + + </li> + + </ul> + + <style> + .smalllinks:link { + display: inline-block !important; background: none; padding-top: 0px; padding-bottom: 0px; padding-right: 0px; min-width: 75px; + } + </style> + + <ul class="nav navbar-nav navbar-bottom"> + <hr /> + + <!-- Twitter --> + <li><a href="https://twitter.com/apacheflink" target="_blank">@ApacheFlink <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <!-- Visualizer --> + <li class=" hidden-md hidden-sm"><a href="/visualizer/" target="_blank">Plan Visualizer <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <li > + <a href="/security.html">Flink Security</a> + </li> + + <hr /> + + <li><a href="https://apache.org" target="_blank">Apache Software Foundation <small><span class="glyphicon 
glyphicon-new-window"></span></small></a></li> + + <li> + + <a class="smalllinks" href="https://www.apache.org/licenses/" target="_blank">License</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/security/" target="_blank">Security</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Donate</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + </li> + + </ul> + </div><!-- /.navbar-collapse --> + </nav> + + </div> + <div class="col-sm-9"> + <div class="row-fluid"> + <div class="col-sm-12"> + <div class="row"> + <h1>The Generic Asynchronous Base Sink</h1> + <p><i></i></p> + + <article> + <p>06 May 2022 Zichen Liu </p> + +<p>Flink sinks share a lot of similar behavior. Most sinks batch records according to user-defined buffering hints, sign requests, write them to the destination, retry unsuccessful or throttled requests, and participate in checkpointing.</p> + +<p>This is why for Flink 1.15 we have decided to create the <a href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink"><code>AsyncSinkBase</code> (FLIP-171)</a>, an abstract sink with a number of common functionalities extracted.</p> + +<p>This is a base implementation for asynchronous sinks, which you should use whenever you need to implement a sink that doesn’t offer transactional capabilities. 
Adding support for a new destination now only requires a lightweight shim that implements the specific interfaces of the destination using a client that supports async requests.</p> + +<p>This common abstraction will reduce the effort required to maintain individual sinks that extend from this abstract sink, with bug fixes and improvements to the sink core benefiting all implementations that extend it. The design of <code>AsyncSinkBase</code> focuses on extensibility and a broad support of destinations. The core of the sink is kept generic and free of any connector-specific dependencies.</p> + +<p>The sink base is designed to participate in checkpointing to provide at-least-once semantics and can work directly with destinations that provide a client that supports asynchronous requests.</p> + +<p>In this post, we will go over the details of the AsyncSinkBase so that you can start using it to build your own concrete sink.</p> + +<div class="page-toc"> +<ul id="markdown-toc"> + <li><a href="#adding-the-base-sink-as-a-dependency" id="markdown-toc-adding-the-base-sink-as-a-dependency">Adding the base sink as a dependency</a></li> + <li><a href="#the-public-interfaces-of-asyncsinkbase" id="markdown-toc-the-public-interfaces-of-asyncsinkbase">The Public Interfaces of AsyncSinkBase</a> <ul> + <li><a href="#generic-types" id="markdown-toc-generic-types">Generic Types</a></li> + <li><a href="#element-converter-interface" id="markdown-toc-element-converter-interface">Element Converter Interface</a></li> + <li><a href="#sink-writer-interface" id="markdown-toc-sink-writer-interface">Sink Writer Interface</a></li> + <li><a href="#sink-interface" id="markdown-toc-sink-interface">Sink Interface</a></li> + </ul> + </li> + <li><a href="#metrics" id="markdown-toc-metrics">Metrics</a></li> + <li><a href="#sink-behavior" id="markdown-toc-sink-behavior">Sink Behavior</a></li> + <li><a href="#summary" id="markdown-toc-summary">Summary</a></li> +</ul> + +</div> + +<h1 
id="adding-the-base-sink-as-a-dependency">Adding the base sink as a dependency</h1> + +<p>In order to use the base sink, you will need to add the following dependency to your project. The example below follows the Maven syntax:</p> + +<div class="highlight"><pre><code class="language-xml"><span class="nt"><dependency></span> + <span class="nt"><groupId></span>org.apache.flink<span class="nt"></groupId></span> + <span class="nt"><artifactId></span>flink-connector-base<span class="nt"></artifactId></span> + <span class="nt"><version></span>${flink.version}<span class="nt"></version></span> +<span class="nt"></dependency></span></code></pre></div> + +<h1 id="the-public-interfaces-of-asyncsinkbase">The Public Interfaces of AsyncSinkBase</h1> + +<h2 id="generic-types">Generic Types</h2> + +<p><code><InputT></code> – type of elements in a DataStream that should be passed to the sink</p> + +<p><code><RequestEntryT></code> – type of a payload containing the element and additional metadata that is required to submit a single element to the destination</p> + +<h2 id="element-converter-interface">Element Converter Interface</h2> + +<p><a href="https://github.com/apache/flink/blob/release-1.15.0/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/ElementConverter.java">ElementConverter</a></p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">ElementConverter</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">RequestEntryT</span><span class="o">></span> <span class="kd">extends</span> <span class="n">Serializable</span> <span class="o">{</span> + <span class="n">RequestEntryT</span> <span class="nf">apply</span><span class="o">(</span><span class="n">InputT</span> <span class="n">element</span><span class="o">,</span> <span class="n">SinkWriter</span><span class="o">.</span><span 
class="na">Context</span> <span class="n">context</span><span class="o">);</span> +<span class="o">}</span></code></pre></div> +<p>The concrete sink implementation should provide a way to convert from an element in the DataStream to the payload type that contains all the additional metadata required to submit that element to the destination by the sink. Ideally, this would be encapsulated from the end user since it allows concrete sink implementers to adapt to changes in the destination API without breaking end user code.</p> + +<h2 id="sink-writer-interface">Sink Writer Interface</h2> + +<p><a href="https://github.com/apache/flink/blob/release-1.15.0/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java">AsyncSinkWriter</a></p> + +<p>There is a buffer in the sink writer that holds the request entries that have been sent to the sink but not yet written to the destination. An element of the buffer is a <code>RequestEntryWrapper<RequestEntryT></code> consisting of the <code>RequestEntryT</code> along with the size of that record.</p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">public</span> <span class="kd">abstract</span> <span class="kd">class</span> <span class="nc">AsyncSinkWriter</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">RequestEntryT</span> <span class="kd">extends</span> <span class="n">Serializable</span><span class="o">></span> + <span class="kd">implements</span> <span class="n">StatefulSink</span><span class="o">.</span><span class="na">StatefulSinkWriter</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">BufferedRequestState</span><span class="o"><</span><span class="n">RequestEntryT</span><span class="o">>></span> <span class="o">{</span> + <span class="c1">// ...</span> + <span class="kd">protected</span> <span class="kd">abstract</span> <span 
class="kt">void</span> <span class="nf">submitRequestEntries</span><span class="o">(</span> + <span class="n">List</span><span class="o"><</span><span class="n">RequestEntryT</span><span class="o">></span> <span class="n">requestEntries</span><span class="o">,</span> <span class="n">Consumer</span><span class="o"><</span><span class="n">List</span><span class="o"><</span><span class="n">RequestEntryT</span><span class="o">>></span> <span class="n">requestResult</span><span class="o">);</span> + <span class="c1">// ...</span> +<span class="o">}</span></code></pre></div> + +<p>We will submit the <code>requestEntries</code> asynchronously to the destination from here. Sink implementers should use the client libraries of the destination they intend to write to in order to perform this.</p> + +<p>Should any elements fail to be persisted, they will be requeued in the buffer for retry using <code>requestResult.accept(...list of failed entries...)</code>. However, retrying any element that is known to be faulty and consistently failing will result in that element being requeued forever, so a sensible strategy for determining what should be retried is highly recommended. If no errors were returned, we must indicate this with <code>requestResult.accept(Collections.emptyList())</code>.</p>
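<p>As a concrete illustration of this retry contract, the sketch below uses a simplified, self-contained stand-in for <code>submitRequestEntries</code> (plain Java, not the actual Flink interfaces): a hypothetical client reports which entries failed, and the writer hands exactly those entries back through the result consumer, or an empty list on full success.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified stand-in for the submitRequestEntries retry contract.
// The "writeBatch" client and its "bad-" failure convention are illustrative
// assumptions, not part of the real AsyncSinkWriter API.
public class RetryContractSketch {

    // Hypothetical destination client: entries prefixed "bad-" are rejected,
    // everything else is considered persisted.
    static List<String> writeBatch(List<String> entries) {
        List<String> failed = new ArrayList<>();
        for (String entry : entries) {
            if (entry.startsWith("bad-")) {
                failed.add(entry);
            }
        }
        return failed;
    }

    // Mirrors the shape of AsyncSinkWriter#submitRequestEntries: failed entries
    // go back through the consumer for requeueing; an empty list signals success.
    static void submitRequestEntries(List<String> requestEntries,
                                     Consumer<List<String>> requestResult) {
        requestResult.accept(writeBatch(requestEntries));
    }

    public static void main(String[] args) {
        List<String> requeued = new ArrayList<>();
        submitRequestEntries(List.of("a", "bad-b", "c"), requeued::addAll);
        System.out.println(requeued); // only "bad-b" is handed back for retry
    }
}
```

<p>In a real implementation the call to the destination would be asynchronous, and blindly accepting every failed entry back into the buffer is exactly the pitfall noted above: a permanently faulty record would be retried forever unless the implementer filters it out.</p>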
+ +<p>If, at any point, it is determined that a fatal error has occurred and that we should throw a runtime exception from the sink, we can call <code>getFatalExceptionCons().accept(...);</code> from anywhere in the concrete sink writer.</p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">public</span> <span class="kd">abstract</span> <span class="kd">class</span> <span class="nc">AsyncSinkWriter</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">RequestEntryT</span> <span class="kd">extends</span> <span class="n">Serializable</span><span class="o">></span> + <span class="kd">implements</span> <span class="n">StatefulSink</span><span class="o">.</span><span class="na">StatefulSinkWriter</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">BufferedRequestState</span><span class="o"><</span><span class="n">RequestEntryT</span><span class="o">>></span> <span class="o">{</span> + <span class="c1">// ...</span> + <span class="kd">protected</span> <span class="kd">abstract</span> <span class="kt">long</span> <span class="nf">getSizeInBytes</span><span class="o">(</span><span class="n">RequestEntryT</span> <span class="n">requestEntry</span><span class="o">);</span> + <span class="c1">// ...</span> +<span class="o">}</span></code></pre></div> +<p>The async sink has a concept of the size of the elements in the buffer. This allows users to specify a byte size threshold beyond which elements will be flushed. However, the sink implementer is best positioned to determine what the most sensible measure of size for each <code>RequestEntryT</code> is. 
If there is no way to determine the size of a record, then the value <code>0</code> may be returned, and the sink will not flush based on record size triggers.</p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">public</span> <span class="kd">abstract</span> <span class="kd">class</span> <span class="nc">AsyncSinkWriter</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">RequestEntryT</span> <span class="kd">extends</span> <span class="n">Serializable</span><span class="o">></span> + <span class="kd">implements</span> <span class="n">StatefulSink</span><span class="o">.</span><span class="na">StatefulSinkWriter</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">BufferedRequestState</span><span class="o"><</span><span class="n">RequestEntryT</span><span class="o">>></span> <span class="o">{</span> + <span class="c1">// ...</span> + <span class="kd">public</span> <span class="nf">AsyncSinkWriter</span><span class="o">(</span> + <span class="n">ElementConverter</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">RequestEntryT</span><span class="o">></span> <span class="n">elementConverter</span><span class="o">,</span> + <span class="n">Sink</span><span class="o">.</span><span class="na">InitContext</span> <span class="n">context</span><span class="o">,</span> + <span class="kt">int</span> <span class="n">maxBatchSize</span><span class="o">,</span> + <span class="kt">int</span> <span class="n">maxInFlightRequests</span><span class="o">,</span> + <span class="kt">int</span> <span class="n">maxBufferedRequests</span><span class="o">,</span> + <span class="kt">long</span> <span class="n">maxBatchSizeInBytes</span><span class="o">,</span> + <span class="kt">long</span> <span class="n">maxTimeInBufferMS</span><span class="o">,</span> + <span class="kt">long</span> <span 
class="n">maxRecordSizeInBytes</span><span class="o">)</span> <span class="o">{</span> <span class="cm">/* ... */</span> <span class="o">}</span> + <span class="c1">// ...</span> +<span class="o">}</span></code></pre></div> + +<p>By default, the method <code>snapshotState</code> returns all the elements in the buffer to be saved for snapshots. Any elements that were previously removed from the buffer are guaranteed to be persisted in the destination by a preceding call to <code>AsyncWriter#flush(true)</code>. +You may want to save additional state from the concrete sink. You can achieve this by overriding <code>snapshotState</code>, and restoring from the saved state in the constructor. You will receive the saved state by overriding <code>restoreWriter</code> in your concrete sink. In this method, you should construct a sink writer, passing in the recovered state.</p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">class</span> <span class="nc">MySinkWriter</span><span class="o"><</span><span class="n">InputT</span><span class="o">></span> <span class="kd">extends</span> <span class="n">AsyncSinkWriter</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">RequestEntryT</span><span class="o">></span> <span class="o">{</span> + + <span class="n">MySinkWriter</span><span class="o">(</span> + <span class="c1">// ... 
</span> + <span class="n">Collection</span><span class="o"><</span><span class="n">BufferedRequestState</span><span class="o"><</span><span class="n">Record</span><span class="o">>></span> <span class="n">initialStates</span><span class="o">)</span> <span class="o">{</span> + <span class="kd">super</span><span class="o">(</span> + <span class="c1">// ...</span> + <span class="n">initialStates</span><span class="o">);</span> + <span class="c1">// restore concrete sink state from initialStates</span> + <span class="o">}</span> + + <span class="nd">@Override</span> + <span class="kd">public</span> <span class="n">List</span><span class="o"><</span><span class="n">BufferedRequestState</span><span class="o"><</span><span class="n">RequestEntryT</span><span class="o">>></span> <span class="nf">snapshotState</span><span class="o">(</span><span class="kt">long</span> <span class="n">checkpointId</span><span class="o">)</span> <span class="o">{</span> + <span class="kd">super</span><span class="o">.</span><span class="na">snapshotState</span><span class="o">(</span><span class="n">checkpointId</span><span class="o">);</span> + <span class="c1">// ...</span> + <span class="o">}</span> + +<span class="o">}</span></code></pre></div> + +<h2 id="sink-interface">Sink Interface</h2> + +<p><a href="https://github.com/apache/flink/blob/release-1.15.0/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/AsyncSinkBase.java">AsyncSinkBase</a></p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">class</span> <span class="nc">MySink</span><span class="o"><</span><span class="n">InputT</span><span class="o">></span> <span class="kd">extends</span> <span class="n">AsyncSinkBase</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">RequestEntryT</span><span class="o">></span> <span class="o">{</span> + <span class="c1">// ...</span> + <span class="nd">@Override</span> + <span 
class="kd">public</span> <span class="n">StatefulSinkWriter</span><span class="o"><</span><span class="n">InputT</span><span class="o">,</span> <span class="n">BufferedRequestState</span><span class="o"><</span><span class="n">RequestEntryT</span><span class="o">>></span> <span class="nf">createWriter</span><span class="o">(</span><span class="n">InitContext</span> <span class="n">context</span><span class="o">)</span> <span class="o">{</span> + <span class="k">return</span> <span class="k">new</span> <span class="nf">MySinkWriter</span><span class="o">(</span><span class="n">context</span><span class="o">);</span> + <span class="o">}</span> + <span class="c1">// ...</span> +<span class="o">}</span></code></pre></div> +<p>AsyncSinkBase implementations return their own extension of the <code>AsyncSinkWriter</code> from <code>createWriter()</code>.</p> + +<p>At the time of writing, the <a href="https://github.com/apache/flink/tree/release-1.15.0/flink-connectors/flink-connector-aws-kinesis-streams">Kinesis Data Streams sink</a> and <a href="https://github.com/apache/flink/tree/release-1.15.0/flink-connectors/flink-connector-aws-kinesis-firehose">Kinesis Data Firehose sink</a> are using this base sink.</p> + +<h1 id="metrics">Metrics</h1> + +<p>Three metrics exist automatically when you implement a sink on this base (and thus should not be implemented yourself).</p> + +<ul> + <li>CurrentSendTime Gauge - returns the time, in milliseconds, that the most recent request to write records took to complete, whether successful or not.</li> + <li>NumBytesOut Counter - counts the total number of bytes the sink has tried to write to the destination, using the method <code>getSizeInBytes</code> to determine the size of each record. This will double count failures that may need to be retried.</li> + <li>NumRecordsOut Counter - similar to above, this counts the total number of records the sink has tried to write to the destination. 
This will double count failures that may need to be retried.</li> +</ul> + +<h1 id="sink-behavior">Sink Behavior</h1> + +<p>There are six sink configuration settings that control the buffering, flushing, and retry behavior of the sink.</p> + +<ul> + <li><code>int maxBatchSize</code> - maximum number of elements that may be passed in the list to submitRequestEntries to be written downstream.</li> + <li><code>int maxInFlightRequests</code> - maximum number of uncompleted calls to submitRequestEntries that the SinkWriter will allow at any given point. Once this point has been reached, writes and callbacks to add elements to the buffer may block until one or more requests to submitRequestEntries completes.</li> + <li><code>int maxBufferedRequests</code> - maximum buffer length. Callbacks to add elements to the buffer and calls to write will block if this length has been reached and will only unblock once elements from the buffer have been removed for flushing.</li> + <li><code>long maxBatchSizeInBytes</code> - a flush will be attempted if the most recent call to write introduces an element to the buffer such that the total size of the buffer is greater than or equal to this threshold value.</li> + <li><code>long maxTimeInBufferMS</code> - maximum amount of time an element may remain in the buffer. In most cases, elements are flushed as a result of the batch size (in bytes or number) being reached or during a snapshot. However, there are scenarios where an element may otherwise remain in the buffer for a long period of time, or even forever. To mitigate this, a timer is constantly active while the buffer is not empty, flushing it every maxTimeInBufferMS milliseconds.</li> + <li><code>long maxRecordSizeInBytes</code> - maximum size in bytes allowed for a single record, as determined by <code>getSizeInBytes()</code>.</li> +</ul> + +<p>Destinations typically have a defined throughput limit and will begin throttling or rejecting requests once that limit is approached. 
We employ <a href="https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease">Additive Increase Multiplicative Decrease (AIMD)</a> as a strategy for selecting the optimal batch size.</p> + +<h1 id="summary">Summary</h1> + +<p>The AsyncSinkBase is a new abstraction that makes creating and maintaining async sinks easier. It is available in Flink 1.15, and we hope that you will try it out and give us feedback on it.</p> + + </article> + </div> + + <div class="row"> + <div id="disqus_thread"></div> + <script type="text/javascript"> + /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ + var disqus_shortname = 'stratosphere-eu'; // required: replace example with your forum shortname + + /* * * DON'T EDIT BELOW THIS LINE * * */ + (function() { + var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; + dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; + (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); + })(); + </script> + </div> + </div> +</div> + </div> + </div> + + <hr /> + + <div class="row"> + <div class="footer text-center col-sm-12"> + <p>Copyright © 2014-2022 <a href="http://apache.org">The Apache Software Foundation</a>. 
All Rights Reserved.</p> + <p>Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.</p> + <p><a href="https://privacy.apache.org/policies/privacy-policy-public.html">Privacy Policy</a> · <a href="/blog/feed.xml">RSS feed</a></p> + </div> + </div> + </div><!-- /.container --> + + <!-- Include all compiled plugins (below), or include individual files as needed --> + <script src="/js/jquery.matchHeight-min.js"></script> + <script src="/js/bootstrap.min.js"></script> + <script src="/js/codetabs.js"></script> + <script src="/js/stickysidebar.js"></script> + </body> +</html> diff --git a/content/blog/feed.xml b/content/blog/feed.xml index 2d6c93b85..b20a012f1 100644 --- a/content/blog/feed.xml +++ b/content/blog/feed.xml @@ -6,6 +6,179 @@ <link>https://flink.apache.org/blog</link> <atom:link href="https://flink.apache.org/blog/feed.xml" rel="self" type="application/rss+xml" /> +<item> +<title>The Generic Asynchronous Base Sink</title> +<description><p>Flink sinks share a lot of similar behavior. Most sinks batch records according to user-defined buffering hints, sign requests, write them to the destination, retry unsuccessful or throttled requests, and participate in checkpointing.</p> + +<p>This is why for Flink 1.15 we have decided to create the <a href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink"><code>AsyncSinkBase</code> (FLIP-171)</a>, an abstract sink with a number of common functionalities extracted.</p> + +<p>This is a base implementation for asynchronous sinks, which you should use whenever you need to implement a sink that doesn’t offer transactional capabilities. 
Adding support for a new destination now only requires a lightweight shim that implements the specific interfaces of the destination using a client that supports async requests.</p> + +<p>This common abstraction will reduce the effort required to maintain individual sinks that extend from this abstract sink, with bug fixes and improvements to the sink core benefiting all implementations that extend it. The design of <code>AsyncSinkBase</code> focuses on extensibility and a broad support of destinations. The core of the sink is kept generic and free of any connector-specific dependencies.</p> + +<p>The sink base is designed to participate in checkpointing to provide at-least-once semantics and can work directly with destinations that provide a client that supports asynchronous requests.</p> + +<p>In this post, we will go over the details of the AsyncSinkBase so that you can start using it to build your own concrete sink.</p> + +<div class="page-toc"> +<ul id="markdown-toc"> + <li><a href="#adding-the-base-sink-as-a-dependency" id="markdown-toc-adding-the-base-sink-as-a-dependency">Adding the base sink as a dependency</a></li> + <li><a href="#the-public-interfaces-of-asyncsinkbase" id="markdown-toc-the-public-interfaces-of-asyncsinkbase">The Public Interfaces of AsyncSinkBase</a> <ul> + <li><a href="#generic-types" id="markdown-toc-generic-types">Generic Types</a></li> + <li><a href="#element-converter-interface" id="markdown-toc-element-converter-interface">Element Converter Interface</a></li> + <li><a href="#sink-writer-interface" id="markdown-toc-sink-writer-interface">Sink Writer Interface</a></li> + <li><a href="#sink-interface" id="markdown-toc-sink-interface">Sink Interface</a></li> + </ul> + </li> + <li><a href="#metrics" id="markdown-toc-metrics">Metrics</a></li> + <li><a href="#sink-behavior" id="markdown-toc-sink-behavior">Sink Behavior</a></li> + <li><a href="#summary" id="markdown-toc-summary">Summary</a></li> +</ul> + +</div> + +<h1 
id="adding-the-base-sink-as-a-dependency">Adding the base sink as a dependency</h1> + +<p>In order to use the base sink, you will need to add the following dependency to your project. The example below follows the Maven syntax:</p> + +<div class="highlight"><pre><code class="language-xml"><span class="nt">&lt;dependency&gt;</span> + <span class="nt">&lt;groupId&gt;</span>org.apache.flink<span class="nt">&lt;/groupId&gt;</span> + <span class="nt">&lt;artifactId&gt;</span>flink-connector-base<span class="nt">&lt;/artifactId&gt;</span> + <span class="nt">&lt;version&gt;</span>${flink.version}<span class="nt">&lt;/version&gt;</span> +<span class="nt">&lt;/dependency&gt;</span></code></pre></div> + +<h1 id="the-public-interfaces-of-asyncsinkbase">The Public Interfaces of AsyncSinkBase</h1> + +<h2 id="generic-types">Generic Types</h2> + +<p><code>&lt;InputT&gt;</code> – type of elements in a DataStream that should be passed to the sink</p> + +<p><code>&lt;RequestEntryT&gt;</code> – type of a payload containing the element and additional metadata that is required to submit a single element to the destination</p> + +<h2 id="element-converter-interface">Element Converter Interface</h2> + +<p><a href="https://github.com/apache/flink/blob/release-1.15.0/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/ElementConverter.java">ElementConverter</a></p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">ElementConverter</span><span class="o">&lt;</span><span class="n">InputT</span><span class="o">,</span> <span class="n">RequestEntryT</span><span class=&qu [...] 
+ <span class="n">RequestEntryT</span> <span class="nf">apply</span><span class="o">(</span><span class="n">InputT</span> <span class="n">element</span><span class="o">,</span> <span class="n">SinkWriter</span><span class="o">.</span><span class="na">Context</span> <span class="n&quo [...] +<span class="o">}</span></code></pre></div> +<p>The concrete sink implementation should provide a way to convert from an element in the DataStream to the payload type that contains all the additional metadata required to submit that element to the destination by the sink. Ideally, this would be encapsulated from the end user since it allows concrete sink implementers to adapt to changes in the destination API without breaking end user code.</p> + +<h2 id="sink-writer-interface">Sink Writer Interface</h2> + +<p><a href="https://github.com/apache/flink/blob/release-1.15.0/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java">AsyncSinkWriter</a></p> + +<p>There is a buffer in the sink writer that holds the request entries that have been sent to the sink but not yet written to the destination. An element of the buffer is a <code>RequestEntryWrapper&lt;RequestEntryT&gt;</code> consisting of the <code>RequestEntryT</code> along with the size of that record.</p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">public</span> <span class="kd">abstract</span> <span class="kd">class</span> <span class="nc">AsyncSinkWriter</span><span class="o">&lt;</span><span class="n">InputT</span><span class="o">,</span> <span class="n&quo [...] + <span class="kd">implements</span> <span class="n">StatefulSink</span><span class="o">.</span><span class="na">StatefulSinkWriter</span><span class="o">&lt;</span><span class="n">InputT</span><span class="o">,</span> <span class="n">BufferedRequestState</span><span class="o">&lt;< [...] 
+ <span class="c1">// ...</span> + <span class="kd">protected</span> <span class="kd">abstract</span> <span class="kt">void</span> <span class="nf">submitRequestEntries</span><span class="o">(</span> + <span class="n">List</span><span class="o">&lt;</span><span class="n">RequestEntryT</span><span class="o">&gt;</span> <span class="n">requestEntries</span><span class="o">,</span> <span class="n">Consumer</span><span class="o">&lt;</span><span class="n">List</span><s [...] + <span class="c1">// ...</span> +<span class="o">}</span></code></pre></div> + +<p>We will submit the <code>requestEntries</code> asynchronously to the destination from here. Sink implementers should use the client libraries of the destination they intend to write to in order to perform this.</p> + +<p>Should any elements fail to be persisted, they will be requeued in the buffer for retry using <code>requestResult.accept(...list of failed entries...)</code>. However, retrying any element that is known to be faulty and consistently failing will result in that element being requeued forever; therefore, a sensible strategy for determining what should be retried is highly recommended. If no errors were returned, we must indicate this with <code>requestResult [...] + +<p>If at any point it is determined that a fatal error has occurred and that we should throw a runtime exception from the sink, we can call <code>getFatalExceptionCons().accept(...);</code> from anywhere in the concrete sink writer.</p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">public</span> <span class="kd">abstract</span> <span class="kd">class</span> <span class="nc">AsyncSinkWriter</span><span class="o">&lt;</span><span class="n">InputT</span><span class="o">,</span> <span class="n&quo [...]
+ <span class="kd">implements</span> <span class="n">StatefulSink</span><span class="o">.</span><span class="na">StatefulSinkWriter</span><span class="o">&lt;</span><span class="n">InputT</span><span class="o">,</span> <span class="n">BufferedRequestState</span><span class="o">&lt;< [...] + <span class="c1">// ...</span> + <span class="kd">protected</span> <span class="kd">abstract</span> <span class="kt">long</span> <span class="nf">getSizeInBytes</span><span class="o">(</span><span class="n">RequestEntryT</span> <span class="n">requestEntry</span><span class="o">);</span> + <span class="c1">// ...</span> +<span class="o">}</span></code></pre></div> +<p>The async sink has a concept of the size of the elements in the buffer. This allows users to specify a byte size threshold beyond which elements will be flushed. However, the sink implementer is best positioned to determine the most sensible measure of size for each <code>RequestEntryT</code>. If there is no way to determine the size of a record, then the value <code>0</code> may be returned, and the sink will not flush based on record size triggers.</p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">public</span> <span class="kd">abstract</span> <span class="kd">class</span> <span class="nc">AsyncSinkWriter</span><span class="o">&lt;</span><span class="n">InputT</span><span class="o">,</span> <span class="n&quo [...]
+ <span class="c1">// ...</span> + <span class="kd">public</span> <span class="nf">AsyncSinkWriter</span><span class="o">(</span> + <span class="n">ElementConverter</span><span class="o">&lt;</span><span class="n">InputT</span><span class="o">,</span> <span class="n">RequestEntryT</span><span class="o">&gt;</span> <span class="n">elementConverter</span><span class="o">,</span> + <span class="n">Sink</span><span class="o">.</span><span class="na">InitContext</span> <span class="n">context</span><span class="o">,</span> + <span class="kt">int</span> <span class="n">maxBatchSize</span><span class="o">,</span> + <span class="kt">int</span> <span class="n">maxInFlightRequests</span><span class="o">,</span> + <span class="kt">int</span> <span class="n">maxBufferedRequests</span><span class="o">,</span> + <span class="kt">long</span> <span class="n">maxBatchSizeInBytes</span><span class="o">,</span> + <span class="kt">long</span> <span class="n">maxTimeInBufferMS</span><span class="o">,</span> + <span class="kt">long</span> <span class="n">maxRecordSizeInBytes</span><span class="o">)</span> <span class="o">{</span> <span class="cm">/* ... */</span> <span class="o">}</span> + <span class="c1">// ...</span> +<span class="o">}</span></code></pre></div> + +<p>By default, the method <code>snapshotState</code> returns all the elements in the buffer to be saved for snapshots. Any elements that were previously removed from the buffer are guaranteed to be persisted in the destination by a preceding call to <code>AsyncWriter#flush(true)</code>. +You may want to save additional state from the concrete sink. You can achieve this by overriding <code>snapshotState</code>, and restoring from the saved state in the constructor. You will receive the saved state by overriding <code>restoreWriter</code> in your concrete sink. 
In this method, you should construct a sink writer, passing in the recovered state.</p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">class</span> <span class="nc">MySinkWriter</span><span class="o">&lt;</span><span class="n">InputT</span><span class="o">&gt;</span> <span class="kd">extends</span> <span class="n">AsyncSinkWriter</span><span class=& [...] + + <span class="n">MySinkWriter</span><span class="o">(</span> + <span class="c1">// ... </span> + <span class="n">Collection</span><span class="o">&lt;</span><span class="n">BufferedRequestState</span><span class="o">&lt;</span><span class="n">Record</span><span class="o">&gt;&gt;</span> <span class="n">initialStates</span><span class="o">)</span> <span class="o">{</ [...] + <span class="kd">super</span><span class="o">(</span> + <span class="c1">// ...</span> + <span class="n">initialStates</span><span class="o">);</span> + <span class="c1">// restore concrete sink state from initialStates</span> + <span class="o">}</span> + + <span class="nd">@Override</span> + <span class="kd">public</span> <span class="n">List</span><span class="o">&lt;</span><span class="n">BufferedRequestState</span><span class="o">&lt;</span><span class="n">RequestEntryT</span><span class="o">&gt;&gt;</span> <span class="nf">snapshotState</span><span class="o">(< [...] 
+ <span class="kd">super</span><span class="o">.</span><span class="na">snapshotState</span><span class="o">(</span><span class="n">checkpointId</span><span class="o">);</span> + <span class="c1">// ...</span> + <span class="o">}</span> + +<span class="o">}</span></code></pre></div> + +<h2 id="sink-interface">Sink Interface</h2> + +<p><a href="https://github.com/apache/flink/blob/release-1.15.0/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/AsyncSinkBase.java">AsyncSinkBase</a></p> + +<div class="highlight"><pre><code class="language-java"><span class="kd">class</span> <span class="nc">MySink</span><span class="o">&lt;</span><span class="n">InputT</span><span class="o">&gt;</span> <span class="kd">extends</span> <span class="n">AsyncSinkBase</span><span class="o&q [...] + <span class="c1">// ...</span> + <span class="nd">@Override</span> + <span class="kd">public</span> <span class="n">StatefulSinkWriter</span><span class="o">&lt;</span><span class="n">InputT</span><span class="o">,</span> <span class="n">BufferedRequestState</span><span class="o">&lt;</span><span class="n">RequestEntryT</span><span class="o">&gt;& [...] 
+ <span class="k">return</span> <span class="k">new</span> <span class="nf">MySinkWriter</span><span class="o">(</span><span class="n">context</span><span class="o">);</span> + <span class="o">}</span> + <span class="c1">// ...</span> +<span class="o">}</span></code></pre></div> +<p>AsyncSinkBase implementations return their own extension of the <code>AsyncSinkWriter</code> from <code>createWriter()</code>.</p> + +<p>At the time of writing, the <a href="https://github.com/apache/flink/tree/release-1.15.0/flink-connectors/flink-connector-aws-kinesis-streams">Kinesis Data Streams sink</a> and <a href="https://github.com/apache/flink/tree/release-1.15.0/flink-connectors/flink-connector-aws-kinesis-firehose">Kinesis Data Firehose sink</a> are using this base sink.</p> + +<h1 id="metrics">Metrics</h1> + +<p>There are three metrics that exist automatically when you implement sinks (and thus do not need to be implemented by sink authors).</p> + +<ul> + <li>CurrentSendTime Gauge - returns the amount of time in milliseconds it took for the most recent request to write records to complete, whether successful or not.</li> + <li>NumBytesOut Counter - counts the total number of bytes the sink has tried to write to the destination, using the method <code>getSizeInBytes</code> to determine the size of each record. This will double count failures that may need to be retried.</li> + <li>NumRecordsOut Counter - similar to the above, this counts the total number of records the sink has tried to write to the destination.
This will double count failures that may need to be retried.</li> +</ul> + +<h1 id="sink-behavior">Sink Behavior</h1> + +<p>There are six sink configuration settings that control the buffering, flushing, and retry behavior of the sink.</p> + +<ul> + <li><code>int maxBatchSize</code> - maximum number of elements that may be passed in the list to submitRequestEntries to be written downstream.</li> + <li><code>int maxInFlightRequests</code> - maximum number of uncompleted calls to submitRequestEntries that the SinkWriter will allow at any given point. Once this limit has been reached, writes and callbacks to add elements to the buffer may block until one or more requests to submitRequestEntries complete.</li> + <li><code>int maxBufferedRequests</code> - maximum buffer length. Callbacks to add elements to the buffer and calls to write will block if this length has been reached and will only unblock if elements from the buffer have been removed for flushing.</li> + <li><code>long maxBatchSizeInBytes</code> - a flush will be attempted if the most recent call to write introduces an element to the buffer such that the total size of the buffer is greater than or equal to this threshold value.</li> + <li><code>long maxTimeInBufferMS</code> - maximum amount of time an element may remain in the buffer. In most cases elements are flushed as a result of the batch size (in bytes or number) being reached or during a snapshot. However, there are scenarios where an element may remain in the buffer indefinitely or for a long period of time. To mitigate this, a timer is constantly active in the buffer: while the buffer is not empty, it will flush every maxTimeInBufferMS mi [...] + <li><code>long maxRecordSizeInBytes</code> - maximum size in bytes allowed for a single record, as determined by <code>getSizeInBytes()</code>.</li> +</ul> + +<p>Destinations typically have a defined throughput limit and will begin throttling or rejecting requests once this limit is approached.
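</p>

<p>One common way to adapt the request batch size to such a limit is additive increase/multiplicative decrease. The following is a hedged, illustrative sketch only; the class name and constants are hypothetical and are not taken from Flink's implementation.</p>

```java
// Illustrative sketch (not Flink code) of additive-increase/multiplicative-decrease
// (AIMD) for adapting the batch size to destination throughput.
final class AimdBatchSize {
    private static final int INCREASE = 10;      // additive step on success
    private static final double DECREASE = 0.5;  // multiplicative cut on throttling
    private int batchSize = 100;                 // illustrative starting size
    private final int maxBatchSize;

    AimdBatchSize(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    // Grow the batch size slowly on success; halve it when the destination throttles.
    int update(boolean throttled) {
        batchSize = throttled
                ? Math.max(1, (int) (batchSize * DECREASE))
                : Math.min(maxBatchSize, batchSize + INCREASE);
        return batchSize;
    }
}
```

<p>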
We employ <a href="https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease">Additive Increase Multiplicative Decrease (AIMD)</a> as a strategy for selecting the optimal batch size.</p> + +<h1 id="summary">Summary</h1> + +<p>The AsyncSinkBase is a new abstraction that makes creating and maintaining async sinks easier. This will be available in Flink 1.15 and we hope that you will try it out and give us feedback on it.</p> +</description> +<pubDate>Fri, 06 May 2022 18:00:00 +0200</pubDate> +<link>https://flink.apache.org/2022/05/06/async-sink-base.html</link> +<guid isPermaLink="true">/2022/05/06/async-sink-base.html</guid> +</item> + <item> <title>Exploring the thread mode in PyFlink</title> <description><p>PyFlink was introduced in Flink 1.9 which purpose is to bring the power of Flink to Python users and allow Python users to develop Flink jobs in Python language. @@ -20313,369 +20486,5 @@ The website implements a streaming application that detects a pattern on the str <guid isPermaLink="true">/2019/06/26/broadcast-state.html</guid> </item> -<item> -<title>A Deep-Dive into Flink's Network Stack</title> -<description><style type="text/css"> -.tg {border-collapse:collapse;border-spacing:0;} -.tg td{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;} -.tg th{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;} -.tg .tg-wide{padding:10px 30px;} -.tg .tg-top{vertical-align:top} -.tg .tg-center{text-align:center;vertical-align:center} -</style> - -<p>Flink’s network stack is one of the core components that make up the <code>flink-runtime</code> module and sit at the heart of every Flink job. It connects individual work units (subtasks) from all TaskManagers. This is where your streamed-in data flows through and it is therefore crucial to the performance of your Flink job for both the throughput as well as latency you observe. 
In contrast to the coordination channels between TaskManagers and JobManagers which are [...] - -<p>This blog post is the first in a series of posts about the network stack. In the sections below, we will first have a high-level look at what abstractions are exposed to the stream operators and then go into detail on the physical implementation and various optimisations Flink did. We will briefly present the result of these optimisations and Flink’s trade-off between throughput and latency. Future blog posts in this series will elaborate more on monitoring and metrics, tuning p [...] - -<div class="page-toc"> -<ul id="markdown-toc"> - <li><a href="#logical-view" id="markdown-toc-logical-view">Logical View</a></li> - <li><a href="#physical-transport" id="markdown-toc-physical-transport">Physical Transport</a> <ul> - <li><a href="#inflicting-backpressure-1" id="markdown-toc-inflicting-backpressure-1">Inflicting Backpressure (1)</a></li> - </ul> - </li> - <li><a href="#credit-based-flow-control" id="markdown-toc-credit-based-flow-control">Credit-based Flow Control</a> <ul> - <li><a href="#inflicting-backpressure-2" id="markdown-toc-inflicting-backpressure-2">Inflicting Backpressure (2)</a></li> - <li><a href="#what-do-we-gain-where-is-the-catch" id="markdown-toc-what-do-we-gain-where-is-the-catch">What do we Gain? Where is the Catch?</a></li> - </ul> - </li> - <li><a href="#writing-records-into-network-buffers-and-reading-them-again" id="markdown-toc-writing-records-into-network-buffers-and-reading-them-again">Writing Records into Network Buffers and Reading them again</a> <ul> - <li><a href="#flushing-buffers-to-netty" id="markdown-toc-flushing-buffers-to-netty">Flushing Buffers to Netty</a></li> - <li><a href="#buffer-builder--buffer-consumer" id="markdown-toc-buffer-builder--buffer-consumer">Buffer Builder &amp; Buffer Consumer</a></li> - </ul> - </li> - <li><a href="#latency-vs-throughput" id="markdown-toc-latency-vs-throughput">Latency vs. 
Throughput</a></li> - <li><a href="#conclusion" id="markdown-toc-conclusion">Conclusion</a></li> -</ul> - -</div> - -<h2 id="logical-view">Logical View</h2> - -<p>Flink’s network stack provides the following logical view to the subtasks when communicating with each other, for example during a network shuffle as required by a <code>keyBy()</code>.</p> - -<center> -<img src="/img/blog/2019-06-05-network-stack/flink-network-stack1.png" width="400px" alt="Logical View on Flink's Network Stack" /> -</center> -<p><br /></p> - -<p>It abstracts over the different settings of the following three concepts:</p> - -<ul> - <li>Subtask output type (<code>ResultPartitionType</code>): - <ul> - <li><strong>pipelined (bounded or unbounded):</strong> - Sending data downstream as soon as it is produced, potentially one-by-one, either as a bounded or unbounded stream of records.</li> - <li><strong>blocking:</strong> - Sending data downstream only when the full result was produced.</li> - </ul> - </li> - <li>Scheduling type: - <ul> - <li><strong>all at once (eager):</strong> - Deploy all subtasks of the job at the same time (for streaming applications).</li> - <li><strong>next stage on first output (lazy):</strong> - Deploy downstream tasks as soon as any of their producers generated output.</li> - <li><strong>next stage on complete output:</strong> - Deploy downstream tasks when any or all of their producers have generated their full output set.</li> - </ul> - </li> - <li>Transport: - <ul> - <li><strong>high throughput:</strong> - Instead of sending each record one-by-one, Flink buffers a bunch of records into its network buffers and sends them altogether. 
This reduces the costs per record and leads to higher throughput.</li> - <li><strong>low latency via buffer timeout:</strong> - By reducing the timeout of sending an incompletely filled buffer, you may sacrifice throughput for latency.</li> - </ul> - </li> -</ul> - -<p>We will have a look at the throughput and low latency optimisations in the sections below which look at the physical layers of the network stack. For this part, let us elaborate a bit more on the output and scheduling types. First of all, it is important to know that the subtask output type and the scheduling type are closely intertwined making only specific combinations of the two valid.</p> - -<p>Pipelined result partitions are streaming-style outputs which need a live target subtask to send data to. The target can be scheduled before results are produced or at first output. Batch jobs produce bounded result partitions while streaming jobs produce unbounded results.</p> - -<p>Batch jobs may also produce results in a blocking fashion, depending on the operator and connection pattern that is used. In that case, the complete result must be produced first before the receiving task can be scheduled. This allows batch jobs to work more efficiently and with lower resource usage.</p> - -<p>The following table summarises the valid combinations: -<br /></p> -<center> -<table class="tg"> - <tr> - <th>Output Type</th> - <th>Scheduling Type</th> - <th>Applies to…</th> - </tr> - <tr> - <td rowspan="2">pipelined, unbounded</td> - <td>all at once</td> - <td>Streaming jobs</td> - </tr> - <tr> - <td>next stage on first output</td> - <td>n/a¹</td> - </tr> - <tr> - <td rowspan="2">pipelined, bounded</td> - <td>all at once</td> - <td>n/a²</td> - </tr> - <tr> - <td>next stage on first output</td> - <td>Batch jobs</td> - </tr> - <tr> - <td>blocking</td> - <td>next stage on complete output</td> - <td>Batch jobs</td> - </tr> -</table> -</center> -<p><br /></p> - -<p><sup>1</sup> Currently not used by Flink. 
<br /> -<sup>2</sup> This may become applicable to streaming jobs once the <a href="/roadmap.html#batch-and-streaming-unification">Batch/Streaming unification</a> is done.</p> - -<p><br /> -Additionally, for subtasks with more than one input, scheduling start in two ways: after <em>all</em> or after <em>any</em> input producers to have produced a record/their complete dataset. For tuning the output types and scheduling decisions in batch jobs, please have a look at <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setExecutionMode-org.apache.flink.api.common.ExecutionM [...] - -<p><br /></p> - -<h2 id="physical-transport">Physical Transport</h2> - -<p>In order to understand the physical data connections, please recall that, in Flink, different tasks may share the same slot via <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/dev/stream/operators/#task-chaining-and-resource-groups">slot sharing groups</a>. TaskManagers may also provide more than one slot to allow multiple subtasks of the same task to be scheduled onto the same TaskManager.</p> - -<p>For the example pictured below, we will assume a parallelism of 4 and a deployment with two task managers offering 2 slots each. TaskManager 1 executes subtasks A.1, A.2, B.1, and B.2 and TaskManager 2 executes subtasks A.3, A.4, B.3, and B.4. 
In a shuffle-type connection between task A and task B, for example from a <code>keyBy()</code>, there are 2x4 logical connections to handle on each TaskManager, some of which are local, some remote: -<br /></p> - -<center> -<table class="tg"> - <tr> - <th></th> - <th class="tg-wide">B.1</th> - <th class="tg-wide">B.2</th> - <th class="tg-wide">B.3</th> - <th class="tg-wide">B.4</th> - </tr> - <tr> - <th class="tg-wide">A.1</th> - <td class="tg-center" colspan="2" rowspan="2">local</td> - <td class="tg-center" colspan="2" rowspan="2">remote</td> - </tr> - <tr> - <th class="tg-wide">A.2</th> - </tr> - <tr> - <th class="tg-wide">A.3</th> - <td class="tg-center" colspan="2" rowspan="2">remote</td> - <td class="tg-center" colspan="2" rowspan="2">local</td> - </tr> - <tr> - <th class="tg-wide">A.4</th> - </tr> -</table> -</center> - -<p><br /></p> - -<p>Each (remote) network connection between different tasks will get its own TCP channel in Flink’s network stack. However, if different subtasks of the same task are scheduled onto the same TaskManager, their network connections towards the same TaskManagers will be multiplexed and share a single TCP channel for reduced resource usage. In our example, this would apply to A.1 → B.3, A.1 → B.4, as well as A.2 → B.3, and A.2 → B.4 as pictured below: -<br /></p> - -<center> -<img src="/img/blog/2019-06-05-network-stack/flink-network-stack2.png" width="700px" alt="Physical-transport-Flink's Network Stack" /> -</center> -<p><br /></p> - -<p>The results of each subtask are called <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/partition/ResultPartition.html">ResultPartition</a>, each split into separate <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/partition/ResultSubpartition.html">ResultSubpartitions</a> — one for each logical channel. At this [...] 
- -<div class="highlight"><pre><code>#channels * buffers-per-channel + floating-buffers-per-gate -</code></pre></div> - -<p>The total number of buffers on a single TaskManager usually does not need configuration. See the <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/ops/config.html#configuring-the-network-buffers">Configuring the Network Buffers</a> documentation for details on how to do so if needed.</p> - -<h3 id="inflicting-backpressure-1">Inflicting Backpressure (1)</h3> - -<p>Whenever a subtask’s sending buffer pool is exhausted — buffers reside in either a result subpartition’s buffer queue or inside the lower, Netty-backed network stack — the producer is blocked, cannot continue, and experiences backpressure. The receiver works in a similar fashion: any incoming Netty buffer in the lower network stack needs to be made available to Flink via a network buffer. If there is no network buffer available in the appropriate subtask’s buffer pool, Flink wil [...] - -<p><br /></p> - -<center> -<img src="/img/blog/2019-06-05-network-stack/flink-network-stack3.png" width="700px" alt="Physical-transport-backpressure-Flink's Network Stack" /> -</center> -<p><br /></p> - -<p>To prevent this situation from even happening, Flink 1.5 introduced its own flow control mechanism.</p> - -<p><br /></p> - -<h2 id="credit-based-flow-control">Credit-based Flow Control</h2> - -<p>Credit-based flow control makes sure that whatever is “on the wire” will have capacity at the receiver to handle. It is based on the availability of network buffers as a natural extension of the mechanisms Flink had before. Instead of only having a shared local buffer pool, each remote input channel now has its own set of <strong>exclusive buffers</strong>. Conversely, buffers in the local buffer pool are called <strong>floating buffers</strong> as they w [...] - -<p>Receivers will announce the availability of buffers as <strong>credits</strong> to the sender (1 buffer = 1 credit). 
Each result subpartition will keep track of its <strong>channel credits</strong>. Buffers are only forwarded to the lower network stack if credit is available and each sent buffer reduces the credit score by one. In addition to the buffers, we also send information about the current <strong>backlog</strong> size which specifies [...] -<br /></p> - -<center> -<img src="/img/blog/2019-06-05-network-stack/flink-network-stack4.png" width="700px" alt="Physical-transport-credit-flow-Flink's Network Stack" /> -</center> -<p><br /></p> - -<p>Credit-based flow control will use <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel">buffers-per-channel</a> to specify how many buffers are exclusive (mandatory) and <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-floating-buffers-per-gate">floating-buffers-per-gate</a> for the local buffer poo [...] -<br /></p> - -<p><sup>3</sup>If there are not enough buffers available, each buffer pool will get the same share of the globally available ones (± 1).</p> - -<h3 id="inflicting-backpressure-2">Inflicting Backpressure (2)</h3> - -<p>As opposed to the receiver’s backpressure mechanisms without flow control, credits provide a more direct control: If a receiver cannot keep up, its available credits will eventually hit 0 and stop the sender from forwarding buffers to the lower network stack. There is backpressure on this logical channel only and there is no need to block reading from a multiplexed TCP channel. Other receivers are therefore not affected in processing available buffers.</p> - -<h3 id="what-do-we-gain-where-is-the-catch">What do we Gain? 
Where is the Catch?</h3> - -<p><img align="right" src="/img/blog/2019-06-05-network-stack/flink-network-stack5.png" width="300" height="200" alt="Physical-transport-credit-flow-checkpoints-Flink's Network Stack" /></p> - -<p>Since, with flow control, a channel in a multiplex cannot block another of its logical channels, the overall resource utilisation should increase. In addition, by having full control over how much data is “on the wire”, we are also able to improve <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/internals/stream_checkpointing.html#checkpointing">checkpoint alignments</a>: without flow control, it would take a while for the channel to fill [...] - -<p>However, the additional announce messages from the receiver may come at some additional cost, especially in setups using SSL-encrypted channels. Also, a single input channel cannot make use of all buffers in the buffer pool because exclusive buffers are not shared. Nor can it start sending as much data as is available right away, so during ramp-up (if you are producing data faster than announcing credits in return) it may take longer to send data through. While thi [...] - -<p>There is one more thing you may notice when using credit-based flow control: since we buffer less data between the sender and receiver, you may experience backpressure earlier. This is, however, desired and you do not really get any advantage from buffering more data. If you want to buffer more but keep flow control, you could consider increasing the number of floating buffers via <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/ops/config.html#taskmanage [...] 
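The credit mechanics described above can be condensed into a small simulation. The class and method names below are hypothetical, chosen only to illustrate the send-while-credit-remains loop and the piggy-backed backlog size; this is not Flink's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Minimal sketch of credit-based flow control; hypothetical names, not Flink's classes. */
public class CreditFlowSketch {

    static class Sender {
        int credits = 0;                                   // announced by the receiver
        final Queue<String> pending = new ArrayDeque<>();  // finished buffers waiting for credit
        final Queue<String> wire = new ArrayDeque<>();     // buffers handed to the network stack

        void enqueue(String buffer) { pending.add(buffer); trySend(); }

        void addCredits(int n) { credits += n; trySend(); }

        // Forward buffers to the lower network stack only while credit is available;
        // each sent buffer consumes exactly one credit (1 buffer = 1 credit).
        void trySend() {
            while (credits > 0 && !pending.isEmpty()) {
                wire.add(pending.poll());
                credits--;
            }
        }

        // In Flink, the current backlog size is piggy-backed onto every buffer sent.
        int backlog() { return pending.size(); }
    }

    public static void main(String[] args) {
        Sender sender = new Sender();
        sender.enqueue("b1");
        sender.enqueue("b2");
        sender.enqueue("b3");
        sender.addCredits(2);   // receiver announces two free buffers
        // Two buffers go out; the third waits until more credit arrives.
        System.out.println(sender.wire.size() + " sent, backlog " + sender.backlog());
    }
}
```

When credits hit 0 the sender simply stops forwarding, which is exactly the per-logical-channel backpressure described above: other channels multiplexed onto the same TCP connection are unaffected.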
- -<p><br /></p> - -<center> -<table class="tg"> - <tr> - <th>Advantages</th> - <th>Disadvantages</th> - </tr> - <tr> - <td class="tg-top"> - • better resource utilisation with data skew in multiplexed connections <br /><br /> - • improved checkpoint alignment<br /><br /> - • reduced memory use (less data in lower network layers)</td> - <td class="tg-top"> - • additional credit-announce messages<br /><br /> - • additional backlog-announce messages (piggy-backed with buffer messages, almost no overhead)<br /><br /> - • potential round-trip latency</td> - </tr> - <tr> - <td class="tg-center" colspan="2">• backpressure appears earlier</td> - </tr> -</table> -</center> -<p><br /></p> - -<div class="alert alert-info"> - <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span> -If you need to turn off credit-based flow control, you can add this to your <code>flink-conf.yaml</code>:</p> - - <p><code>taskmanager.network.credit-model: false</code></p> - - <p>This parameter, however, is deprecated and will eventually be removed along with the non-credit-based flow control code.</p> -</div> - -<p><br /></p> - -<h2 id="writing-records-into-network-buffers-and-reading-them-again">Writing Records into Network Buffers and Reading them again</h2> - -<p>The following picture extends the slightly more high-level view from above with further details of the network stack and its surrounding components, from the collection of a record in your sending operator to the receiving operator getting it: -<br /></p> - -<center> -<img src="/img/blog/2019-06-05-network-stack/flink-network-stack6.png" width="700px" alt="Physical-transport-complete-Flink's Network Stack" /> -</center> -<p><br /></p> - -<p>After creating a record and passing it along, for example via <code>Collector#collect()</code>, it is given to the <a 
href="https://nightlies.apache.org/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.html">RecordWriter</a> which serialises the record from a Java object into a sequence of bytes which eventually ends up in a network buffer that is handed along as described above. The RecordWriter [...] - -<p>On the receiver’s side, the lower network stack (Netty) writes received buffers into the appropriate input channels. The (stream) task’s thread eventually reads from these queues and tries to deserialise the accumulated bytes into Java objects with the help of the <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/reader/RecordReader.html">RecordReader</a> and going through the <a href [...] -<br /></p> - -<h3 id="flushing-buffers-to-netty">Flushing Buffers to Netty</h3> - -<p>In the picture above, the credit-based flow control mechanics actually sit inside the “Netty Server” (and “Netty Client”) components, and the buffer the RecordWriter is writing to is always added to the result subpartition in an empty state and then gradually filled with (serialised) records. But when does Netty actually get the buffer? Obviously, it cannot take bytes whenever they become available since that would not only add substantial costs due to cross-thread communication [...] - -<p>In Flink, there are three situations that make a buffer available for consumption by the Netty server:</p> - -<ul> - <li>a buffer becomes full when writing a record to it, or<br /></li> - <li>the buffer timeout hits, or<br /></li> - <li>a special event such as a checkpoint barrier is sent.<br /> -<br /></li> -</ul> - -<h4 id="flush-after-buffer-full">Flush after Buffer Full</h4> - -<p>The RecordWriter works with a local serialisation buffer for the current record and will gradually write these bytes to one or more network buffers sitting at the appropriate result subpartition queue. 
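That "gradually write these bytes to one or more network buffers" step can be sketched as follows. This is an illustrative toy under the assumption of fixed-size buffers, with a made-up method name; it is not Flink's BufferBuilder:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of one record spanning several fixed-size network buffers; not Flink's code. */
public class SpanningWriteSketch {

    /** Copy one serialised record into as many fixed-size buffers as needed. */
    static List<byte[]> writeRecord(byte[] serialised, int bufferSize) {
        List<byte[]> buffers = new ArrayList<>();
        for (int pos = 0; pos < serialised.length; pos += bufferSize) {
            int len = Math.min(bufferSize, serialised.length - pos);
            byte[] networkBuffer = new byte[len];
            System.arraycopy(serialised, pos, networkBuffer, 0, len);
            buffers.add(networkBuffer); // in Flink this lands in the result subpartition queue
        }
        return buffers;
    }

    public static void main(String[] args) {
        // A 10-byte record with 4-byte buffers spans three buffers (4 + 4 + 2).
        System.out.println(writeRecord(new byte[10], 4).size());
    }
}
```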
Although a RecordWriter can work on multiple subpartitions, each subpartition has only one RecordWriter writing data to it. The Netty server, on the other hand, reads from multiple result subpartitions and multiplexes the appropriate ones into a single channel as described a [...] -<br /></p> - -<center> -<img src="/img/blog/2019-06-05-network-stack/flink-network-stack7.png" width="500px" alt="Record-writer-to-network-Flink's Network Stack" /> -</center> -<p><br /></p> - -<p><sup>4</sup>We can assume it already got the notification if there are more finished buffers in the queue. -<br /></p> - -<h4 id="flush-after-buffer-timeout">Flush after Buffer Timeout</h4> - -<p>In order to support low-latency use cases, we cannot rely only on buffers being full to send data downstream. There may be cases where a certain communication channel does not have many records flowing through it, which would unnecessarily increase the latency of the few records you actually have. Therefore, a periodic process will flush whatever data is available down the stack: the output flusher. The periodic interval can be configured via <a href="https://nightlies. [...] -<br /></p> - -<center> -<img src="/img/blog/2019-06-05-network-stack/flink-network-stack8.png" width="500px" alt="Record-writer-to-network-with-flusher-Flink's Network Stack" /> -</center> -<p><br /></p> - -<p><sup>5</sup>Strictly speaking, the output flusher does not give any guarantees - it only sends a notification to Netty, which can pick it up at will and as capacity allows. This also means that the output flusher has no effect if the channel is backpressured. -<br /></p> - -<h4 id="flush-after-special-event">Flush after special event</h4> - -<p>Some special events also trigger immediate flushes when sent through the RecordWriter. The most important ones are checkpoint barriers or end-of-partition events which obviously should go quickly and not wait for the output flusher to kick in. 
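The three flush conditions above can be summarised in a single predicate. This is a hedged sketch with made-up names and parameters, not the actual Flink code path:

```java
/** Sketch of the three flush triggers; hypothetical code, not Flink's implementation. */
public class FlushTriggerSketch {

    /** A buffer becomes consumable by the Netty server if any one of the conditions holds. */
    static boolean shouldFlush(int bytesWritten, int bufferCapacity,
                               long msSinceLastFlush, long timeoutMs, boolean specialEvent) {
        return bytesWritten >= bufferCapacity   // 1. the buffer is full
            || msSinceLastFlush >= timeoutMs    // 2. the (configurable) buffer timeout hit
            || specialEvent;                    // 3. e.g. a checkpoint barrier was written
    }

    public static void main(String[] args) {
        System.out.println(shouldFlush(32768, 32768, 10, 100, false)); // true: buffer full
        System.out.println(shouldFlush(128, 32768, 150, 100, false));  // true: timeout elapsed
        System.out.println(shouldFlush(128, 32768, 10, 100, true));    // true: special event
        System.out.println(shouldFlush(128, 32768, 10, 100, false));   // false: keep accumulating
    }
}
```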
-<br /></p> - -<h4 id="further-remarks">Further remarks</h4> - -<p>In contrast to Flink &lt; 1.5, please note that (a) network buffers are now placed in the subpartition queues directly and (b) we are not closing the buffer on each flush. This gives us a few advantages:</p> - -<ul> - <li>less synchronisation overhead (output flusher and RecordWriter are independent)</li> - <li>in high-load scenarios where Netty is the bottleneck (either through backpressure or directly), we can still accumulate data in incomplete buffers</li> - <li>significant reduction of Netty notifications</li> -</ul> - -<p>However, you may notice an increased CPU use and TCP packet rate during low load scenarios. This is because, with the changes, Flink will use any <em>available</em> CPU cycles to try to maintain the desired latency. Once the load increases, this will self-adjust by buffers filling up more. High load scenarios are not affected and even get a better throughput because of the reduced synchronisation overhead. -<br /></p> - -<h3 id="buffer-builder--buffer-consumer">Buffer Builder &amp; Buffer Consumer</h3> - -<p>If you want to dig deeper into how the producer-consumer mechanics are implemented in Flink, please take a closer look at the <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferBuilder.html">BufferBuilder</a> and <a href="https://nightlies.apache.org/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferConsumer.html">BufferConsumer< [...] - -<p><br /></p> - -<h2 id="latency-vs-throughput">Latency vs. Throughput</h2> - -<p>Network buffers were introduced to get higher resource utilisation and higher throughput at the cost of having some records wait in buffers a little longer. 
Although an upper limit to this wait time can be given via the buffer timeout, you may be curious to find out more about the trade-off between these two dimensions: latency and throughput, as, obviously, you cannot get both. The following plot shows various values for the buffer timeout starting at 0 (flush with every record [...] -<br /></p> - -<center> -<img src="/img/blog/2019-06-05-network-stack/flink-network-stack9.png" width="650px" alt="Network-buffertimeout-Flink's Network Stack" /> -</center> -<p><br /></p> - -<p>As you can see, with Flink 1.5+, even very low buffer timeouts such as 1ms (for low-latency scenarios) provide a maximum throughput as high as 75% of that achieved with the default timeout, where more data is buffered before being sent over the wire.</p> - -<p><br /></p> - -<h2 id="conclusion">Conclusion</h2> - -<p>Now you know about result partitions, the different network connections and scheduling types for both batch and streaming. You also know about credit-based flow control and how the network stack works internally, in order to reason about network-related tuning parameters and about certain job behaviours. Future blog posts in this series will build upon this knowledge and go into more operational details including relevant metrics to look at, further network stack tuning, and com [...] 
- -</description> -<pubDate>Wed, 05 Jun 2019 10:45:00 +0200</pubDate> -<link>https://flink.apache.org/2019/06/05/flink-network-stack.html</link> -<guid isPermaLink="true">/2019/06/05/flink-network-stack.html</guid> -</item> - </channel> </rss> diff --git a/content/blog/index.html b/content/blog/index.html index ec5cb1d31..601620a6b 100644 --- a/content/blog/index.html +++ b/content/blog/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></h2> + + <p>06 May 2022 + Zichen Liu </p> + + <p>An overview of the new AsyncBaseSink and how to use it for building your own concrete sink</p> + + <p><a href="/2022/05/06/async-sink-base.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></h2> @@ -360,19 +373,6 @@ This new release brings various improvements to the StateFun runtime, a leaner w <hr> - <article> - <h2 class="blog-title"><a href="/news/2022/01/17/release-1.14.3.html">Apache Flink 1.14.3 Release Announcement</a></h2> - - <p>17 Jan 2022 - Thomas Weise (<a href="https://twitter.com/thweise">@thweise</a>) & Martijn Visser (<a href="https://twitter.com/martijnvisser82">@martijnvisser82</a>)</p> - - <p>The Apache Flink community released the second bugfix version of the Apache Flink 1.14 series.</p> - - <p><a href="/news/2022/01/17/release-1.14.3.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -405,6 +405,16 @@ This new release brings various improvements to the StateFun runtime, a leaner w <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page10/index.html 
b/content/blog/page10/index.html index d1c1c8a8c..56cf8e663 100644 --- a/content/blog/page10/index.html +++ b/content/blog/page10/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2019/12/09/flink-kubernetes-kudo.html">Running Apache Flink on Kubernetes with KUDO</a></h2> + + <p>09 Dec 2019 + Gerred Dillon </p> + + <p>A common use case for Apache Flink is streaming data analytics together with Apache Kafka, which provides a pub/sub model and durability for data streams. In this post, we demonstrate how to orchestrate Flink and Kafka with KUDO.</p> + + <p><a href="/news/2019/12/09/flink-kubernetes-kudo.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2019/11/25/query-pulsar-streams-using-apache-flink.html">How to query Pulsar Streams using Apache Flink</a></h2> @@ -358,19 +371,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/2019/06/05/flink-network-stack.html">A Deep-Dive into Flink's Network Stack</a></h2> - - <p>05 Jun 2019 - Nico Kruber </p> - - <p>Flink’s network stack is one of the core components that make up Apache Flink's runtime module sitting at the core of every Flink job. 
In this post, which is the first in a series of posts about the network stack, we look at the abstractions exposed to the stream operators and detail their physical implementation and various optimisations in Apache Flink.</p> - - <p><a href="/2019/06/05/flink-network-stack.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -403,6 +403,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page11/index.html b/content/blog/page11/index.html index 366c8b982..94d9b5ff1 100644 --- a/content/blog/page11/index.html +++ b/content/blog/page11/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/2019/06/05/flink-network-stack.html">A Deep-Dive into Flink's Network Stack</a></h2> + + <p>05 Jun 2019 + Nico Kruber </p> + + <p>Flink’s network stack is one of the core components that make up Apache Flink's runtime module sitting at the core of every Flink job. 
In this post, which is the first in a series of posts about the network stack, we look at the abstractions exposed to the stream operators and detail their physical implementation and various optimisations in Apache Flink.</p> + + <p><a href="/2019/06/05/flink-network-stack.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/2019/05/19/state-ttl.html">State TTL in Flink 1.8.0: How to Automatically Cleanup Application State in Apache Flink</a></h2> @@ -359,21 +372,6 @@ for more details.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2019/02/15/release-1.7.2.html">Apache Flink 1.7.2 Released</a></h2> - - <p>15 Feb 2019 - </p> - - <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.7 series.</p> - -</p> - - <p><a href="/news/2019/02/15/release-1.7.2.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -406,6 +404,16 @@ for more details.</p> <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page12/index.html b/content/blog/page12/index.html index 20cb6295e..c04e8fd90 100644 --- a/content/blog/page12/index.html +++ b/content/blog/page12/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2019/02/15/release-1.7.2.html">Apache Flink 1.7.2 Released</a></h2> + + <p>15 Feb 2019 + </p> + + <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.7 series.</p> + +</p> + + <p><a href="/news/2019/02/15/release-1.7.2.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2019/02/13/unified-batch-streaming-blink.html">Batch as a Special Case of Streaming and Alibaba's contribution of 
Blink</a></h2> @@ -367,21 +382,6 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa <hr> - <article> - <h2 class="blog-title"><a href="/news/2018/08/21/release-1.5.3.html">Apache Flink 1.5.3 Released</a></h2> - - <p>21 Aug 2018 - </p> - - <p><p>The Apache Flink community released the third bugfix version of the Apache Flink 1.5 series.</p> - -</p> - - <p><a href="/news/2018/08/21/release-1.5.3.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -414,6 +414,16 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page13/index.html b/content/blog/page13/index.html index f71295840..50b0e23f6 100644 --- a/content/blog/page13/index.html +++ b/content/blog/page13/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2018/08/21/release-1.5.3.html">Apache Flink 1.5.3 Released</a></h2> + + <p>21 Aug 2018 + </p> + + <p><p>The Apache Flink community released the third bugfix version of the Apache Flink 1.5 series.</p> + +</p> + + <p><a href="/news/2018/08/21/release-1.5.3.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2018/08/09/release-1.6.0.html">Apache Flink 1.6.0 Release Announcement</a></h2> @@ -363,19 +378,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2017/12/21/2017-year-in-review.html">Apache Flink in 2017: Year in Review</a></h2> - - <p>21 Dec 2017 - Chris Ward (<a href="https://twitter.com/chrischinch">@chrischinch</a>) & Mike Winters (<a href="https://twitter.com/wints">@wints</a>)</p> - - <p>As 2017 comes to a close, let's take a moment to look back on the Flink 
community's great work during the past year.</p> - - <p><a href="/news/2017/12/21/2017-year-in-review.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -408,6 +410,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page14/index.html b/content/blog/page14/index.html index 7a363eef7..0215cada1 100644 --- a/content/blog/page14/index.html +++ b/content/blog/page14/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2017/12/21/2017-year-in-review.html">Apache Flink in 2017: Year in Review</a></h2> + + <p>21 Dec 2017 + Chris Ward (<a href="https://twitter.com/chrischinch">@chrischinch</a>) & Mike Winters (<a href="https://twitter.com/wints">@wints</a>)</p> + + <p>As 2017 comes to a close, let's take a moment to look back on the Flink community's great work during the past year.</p> + + <p><a href="/news/2017/12/21/2017-year-in-review.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2017/12/12/release-1.4.0.html">Apache Flink 1.4.0 Release Announcement</a></h2> @@ -369,19 +382,6 @@ what’s coming in Flink 1.4.0 as well as a preview of what the Flink community <hr> - <article> - <h2 class="blog-title"><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></h2> - - <p>29 Mar 2017 by Timo Walther (<a href="https://twitter.com/">@twalthr</a>) - </p> - - <p><p>Broadening the user base and unifying batch & streaming with relational APIs</p></p> - - <p><a href="/news/2017/03/29/table-sql-api-update.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -414,6 +414,16 @@ what’s coming in Flink 
1.4.0 as well as a preview of what the Flink community <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page15/index.html b/content/blog/page15/index.html index e5782b18f..8f4fe6378 100644 --- a/content/blog/page15/index.html +++ b/content/blog/page15/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></h2> + + <p>29 Mar 2017 by Timo Walther (<a href="https://twitter.com/">@twalthr</a>) + </p> + + <p><p>Broadening the user base and unifying batch & streaming with relational APIs</p></p> + + <p><a href="/news/2017/03/29/table-sql-api-update.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2017/03/23/release-1.1.5.html">Apache Flink 1.1.5 Released</a></h2> @@ -363,20 +376,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2016/05/24/stream-sql.html">Stream Processing for Everyone with SQL and Apache Flink</a></h2> - - <p>24 May 2016 by Fabian Hueske (<a href="https://twitter.com/">@fhueske</a>) - </p> - - <p><p>About six months ago, the Apache Flink community started an effort to add a SQL interface for stream data analysis. SQL is <i>the</i> standard language to access and process data. Everybody who occasionally analyzes data is familiar with SQL. Consequently, a SQL interface for stream data processing will make this technology accessible to a much wider audience. Moreover, SQL support for streaming data will also enable new use cases such as interactive and ad-hoc stream analysi [...] 
-<p>In this blog post, we report on the current status, architectural design, and future plans of the Apache Flink community to implement support for SQL as a language for analyzing data streams.</p></p> - - <p><a href="/news/2016/05/24/stream-sql.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -409,6 +408,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page16/index.html b/content/blog/page16/index.html index 84d581427..a1bd853b0 100644 --- a/content/blog/page16/index.html +++ b/content/blog/page16/index.html @@ -232,6 +232,20 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2016/05/24/stream-sql.html">Stream Processing for Everyone with SQL and Apache Flink</a></h2> + + <p>24 May 2016 by Fabian Hueske (<a href="https://twitter.com/">@fhueske</a>) + </p> + + <p><p>About six months ago, the Apache Flink community started an effort to add a SQL interface for stream data analysis. SQL is <i>the</i> standard language to access and process data. Everybody who occasionally analyzes data is familiar with SQL. Consequently, a SQL interface for stream data processing will make this technology accessible to a much wider audience. Moreover, SQL support for streaming data will also enable new use cases such as interactive and ad-hoc stream analysi [...] 
+<p>In this blog post, we report on the current status, architectural design, and future plans of the Apache Flink community to implement support for SQL as a language for analyzing data streams.</p></p> + + <p><a href="/news/2016/05/24/stream-sql.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2016/05/11/release-1.0.3.html">Flink 1.0.3 Released</a></h2> @@ -361,20 +375,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2015/12/04/Introducing-windows.html">Introducing Stream Windows in Apache Flink</a></h2> - - <p>04 Dec 2015 by Fabian Hueske (<a href="https://twitter.com/">@fhueske</a>) - </p> - - <p><p>The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in the mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/”exactly-once” processing). This shift and the new terminology can be quite confusing for people being new to the space of stream processing. Apache F [...] 
-<p>In this blog post, we discuss the concept of windows for stream processing, present Flink's built-in windows, and explain its support for custom windowing semantics.</p></p> - - <p><a href="/news/2015/12/04/Introducing-windows.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -407,6 +407,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page17/index.html b/content/blog/page17/index.html index 7d067495e..08b643a28 100644 --- a/content/blog/page17/index.html +++ b/content/blog/page17/index.html @@ -232,6 +232,20 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2015/12/04/Introducing-windows.html">Introducing Stream Windows in Apache Flink</a></h2> + + <p>04 Dec 2015 by Fabian Hueske (<a href="https://twitter.com/">@fhueske</a>) + </p> + + <p><p>The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in the mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/”exactly-once” processing). This shift and the new terminology can be quite confusing for people being new to the space of stream processing. Apache F [...] 
+<p>In this blog post, we discuss the concept of windows for stream processing, present Flink's built-in windows, and explain its support for custom windowing semantics.</p></p> + + <p><a href="/news/2015/12/04/Introducing-windows.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2015/11/27/release-0.10.1.html">Flink 0.10.1 released</a></h2> @@ -370,26 +384,6 @@ vertex-centric or gather-sum-apply to Flink dataflows.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2015/04/13/release-0.9.0-milestone1.html">Announcing Flink 0.9.0-milestone1 preview release</a></h2> - - <p>13 Apr 2015 - </p> - - <p><p>The Apache Flink community is pleased to announce the availability of -the 0.9.0-milestone-1 release. The release is a preview of the -upcoming 0.9.0 release. It contains many new features which will be -available in the upcoming 0.9 release. Interested users are encouraged -to try it out and give feedback. As the version number indicates, this -release is a preview release that contains known issues.</p> - -</p> - - <p><a href="/news/2015/04/13/release-0.9.0-milestone1.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -422,6 +416,16 @@ release is a preview release that contains known issues.</p> <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page18/index.html b/content/blog/page18/index.html index 1354a5e9e..b2457bdd0 100644 --- a/content/blog/page18/index.html +++ b/content/blog/page18/index.html @@ -232,6 +232,26 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2015/04/13/release-0.9.0-milestone1.html">Announcing Flink 0.9.0-milestone1 preview release</a></h2> + + <p>13 Apr 2015 + </p> + + <p><p>The Apache 
Flink community is pleased to announce the availability of +the 0.9.0-milestone-1 release. The release is a preview of the +upcoming 0.9.0 release. It contains many new features which will be +available in the upcoming 0.9 release. Interested users are encouraged +to try it out and give feedback. As the version number indicates, this +release is a preview release that contains known issues.</p> + +</p> + + <p><a href="/news/2015/04/13/release-0.9.0-milestone1.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2015/04/07/march-in-flink.html">March 2015 in the Flink community</a></h2> @@ -372,21 +392,6 @@ and offers a new API including definition of flexible windows.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2014/10/03/upcoming_events.html">Upcoming Events</a></h2> - - <p>03 Oct 2014 - </p> - - <p><p>We are happy to announce several upcoming Flink events both in Europe and the US. Starting with a <strong>Flink hackathon in Stockholm</strong> (Oct 8-9) and a talk about Flink at the <strong>Stockholm Hadoop User Group</strong> (Oct 8). This is followed by the very first <strong>Flink Meetup in Berlin</strong> (Oct 15). In the US, there will be two Flink Meetup talks: the first one at the <strong>Pasadena Big Data User Group</strong> (Oct 29) and the second one at <strong>Si [...] 
- -</p> - - <p><a href="/news/2014/10/03/upcoming_events.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -419,6 +424,16 @@ and offers a new API including definition of flexible windows.</p> <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page19/index.html b/content/blog/page19/index.html index 3663edd5a..423ff87f7 100644 --- a/content/blog/page19/index.html +++ b/content/blog/page19/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2014/10/03/upcoming_events.html">Upcoming Events</a></h2> + + <p>03 Oct 2014 + </p> + + <p><p>We are happy to announce several upcoming Flink events both in Europe and the US. Starting with a <strong>Flink hackathon in Stockholm</strong> (Oct 8-9) and a talk about Flink at the <strong>Stockholm Hadoop User Group</strong> (Oct 8). This is followed by the very first <strong>Flink Meetup in Berlin</strong> (Oct 15). In the US, there will be two Flink Meetup talks: the first one at the <strong>Pasadena Big Data User Group</strong> (Oct 29) and the second one at <strong>Si [...] 
+ +</p> + + <p><a href="/news/2014/10/03/upcoming_events.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2014/09/26/release-0.6.1.html">Apache Flink 0.6.1 available</a></h2> @@ -297,6 +312,16 @@ academic and open source project that Flink originates from.</p> <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page2/index.html b/content/blog/page2/index.html index 9ceecd2aa..91106a53e 100644 --- a/content/blog/page2/index.html +++ b/content/blog/page2/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2022/01/17/release-1.14.3.html">Apache Flink 1.14.3 Release Announcement</a></h2> + + <p>17 Jan 2022 + Thomas Weise (<a href="https://twitter.com/thweise">@thweise</a>) & Martijn Visser (<a href="https://twitter.com/martijnvisser82">@martijnvisser82</a>)</p> + + <p>The Apache Flink community released the second bugfix version of the Apache Flink 1.14 series.</p> + + <p><a href="/news/2022/01/17/release-1.14.3.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2022/01/07/release-ml-2.0.0.html">Apache Flink ML 2.0.0 Release Announcement</a></h2> @@ -353,21 +366,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2021/10/19/release-1.13.3.html">Apache Flink 1.13.3 Released</a></h2> - - <p>19 Oct 2021 - Chesnay Schepler </p> - - <p><p>The Apache Flink community released the third bugfix version of the Apache Flink 1.13 series.</p> - -</p> - - <p><a href="/news/2021/10/19/release-1.13.3.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -400,6 +398,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic 
Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page3/index.html b/content/blog/page3/index.html index 7350838a5..44844c578 100644 --- a/content/blog/page3/index.html +++ b/content/blog/page3/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2021/10/19/release-1.13.3.html">Apache Flink 1.13.3 Released</a></h2> + + <p>19 Oct 2021 + Chesnay Schepler </p> + + <p><p>The Apache Flink community released the third bugfix version of the Apache Flink 1.13 series.</p> + +</p> + + <p><a href="/news/2021/10/19/release-1.13.3.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2021/09/29/release-1.14.0.html">Apache Flink 1.14.0 Release Announcement</a></h2> @@ -371,21 +386,6 @@ This new release brings various improvements to the StateFun runtime, a leaner w <hr> - <article> - <h2 class="blog-title"><a href="/news/2021/05/28/release-1.13.1.html">Apache Flink 1.13.1 Released</a></h2> - - <p>28 May 2021 - Dawid Wysakowicz (<a href="https://twitter.com/dwysakowicz">@dwysakowicz</a>)</p> - - <p><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.13 series.</p> - -</p> - - <p><a href="/news/2021/05/28/release-1.13.1.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -418,6 +418,16 @@ This new release brings various improvements to the StateFun runtime, a leaner w <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page4/index.html b/content/blog/page4/index.html index 21404487a..79208888c 100644 --- a/content/blog/page4/index.html +++ 
b/content/blog/page4/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2021/05/28/release-1.13.1.html">Apache Flink 1.13.1 Released</a></h2> + + <p>28 May 2021 + Dawid Wysakowicz (<a href="https://twitter.com/dwysakowicz">@dwysakowicz</a>)</p> + + <p><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.13 series.</p> + +</p> + + <p><a href="/news/2021/05/28/release-1.13.1.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2021/05/21/release-1.12.4.html">Apache Flink 1.12.4 Released</a></h2> @@ -361,21 +376,6 @@ to develop scalable, consistent, and elastic distributed applications.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2021/01/19/release-1.12.1.html">Apache Flink 1.12.1 Released</a></h2> - - <p>19 Jan 2021 - Xintong Song </p> - - <p><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.12 series.</p> - -</p> - - <p><a href="/news/2021/01/19/release-1.12.1.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -408,6 +408,16 @@ to develop scalable, consistent, and elastic distributed applications.</p> <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page5/index.html b/content/blog/page5/index.html index c911cde55..4d50ab2c6 100644 --- a/content/blog/page5/index.html +++ b/content/blog/page5/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2021/01/19/release-1.12.1.html">Apache Flink 1.12.1 Released</a></h2> + + <p>19 Jan 2021 + Xintong Song </p> + + <p><p>The Apache Flink community released the first bugfix version of the Apache Flink 
1.12 series.</p> + +</p> + + <p><a href="/news/2021/01/19/release-1.12.1.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/2021/01/18/rocksdb.html">Using RocksDB State Backend in Apache Flink: When and How</a></h2> @@ -355,19 +370,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2020/10/13/stateful-serverless-internals.html">Stateful Functions Internals: Behind the scenes of Stateful Serverless</a></h2> - - <p>13 Oct 2020 - Tzu-Li (Gordon) Tai (<a href="https://twitter.com/tzulitai">@tzulitai</a>)</p> - - <p>This blog post dives deep into the internals of the StateFun runtime, taking a look at how it enables consistent and fault-tolerant stateful serverless applications.</p> - - <p><a href="/news/2020/10/13/stateful-serverless-internals.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -400,6 +402,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page6/index.html b/content/blog/page6/index.html index d1104d89a..ff5ae2ed2 100644 --- a/content/blog/page6/index.html +++ b/content/blog/page6/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2020/10/13/stateful-serverless-internals.html">Stateful Functions Internals: Behind the scenes of Stateful Serverless</a></h2> + + <p>13 Oct 2020 + Tzu-Li (Gordon) Tai (<a href="https://twitter.com/tzulitai">@tzulitai</a>)</p> + + <p>This blog post dives deep into the internals of the StateFun runtime, taking a look at how it enables consistent and fault-tolerant stateful serverless applications.</p> + + <p><a href="/news/2020/10/13/stateful-serverless-internals.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 
class="blog-title"><a href="/news/2020/09/28/release-statefun-2.2.0.html">Stateful Functions 2.2.0 Release Announcement</a></h2> @@ -361,19 +374,6 @@ as well as increased observability for operational purposes.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2020/07/30/demo-fraud-detection-3.html">Advanced Flink Application Patterns Vol.3: Custom Window Processing</a></h2> - - <p>30 Jul 2020 - Alexander Fedulov (<a href="https://twitter.com/alex_fedulov">@alex_fedulov</a>)</p> - - <p>In this series of blog posts you will learn about powerful Flink patterns for building streaming applications.</p> - - <p><a href="/news/2020/07/30/demo-fraud-detection-3.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -406,6 +406,16 @@ as well as increased observability for operational purposes.</p> <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page7/index.html b/content/blog/page7/index.html index 82760aef7..96ddd5837 100644 --- a/content/blog/page7/index.html +++ b/content/blog/page7/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2020/07/30/demo-fraud-detection-3.html">Advanced Flink Application Patterns Vol.3: Custom Window Processing</a></h2> + + <p>30 Jul 2020 + Alexander Fedulov (<a href="https://twitter.com/alex_fedulov">@alex_fedulov</a>)</p> + + <p>In this series of blog posts you will learn about powerful Flink patterns for building streaming applications.</p> + + <p><a href="/news/2020/07/30/demo-fraud-detection-3.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming 
Application</a></h2> @@ -367,21 +380,6 @@ and provide a tutorial for running Streaming ETL with Flink on Zeppelin.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2020/06/09/release-statefun-2.1.0.html">Stateful Functions 2.1.0 Release Announcement</a></h2> - - <p>09 Jun 2020 - Marta Paes (<a href="https://twitter.com/morsapaes">@morsapaes</a>)</p> - - <p><p>The Apache Flink community is happy to announce the release of Stateful Functions (StateFun) 2.1.0! This release introduces new features around state expiration and performance improvements for co-located deployments, as well as other important changes that improve the stability and testability of the project. As the community around StateFun grows, the release cycle will follow this pattern of smaller and more frequent releases to incorporate user feedback and allow for fast [...] - -</p> - - <p><a href="/news/2020/06/09/release-statefun-2.1.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -414,6 +412,16 @@ and provide a tutorial for running Streaming ETL with Flink on Zeppelin.</p> <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page8/index.html b/content/blog/page8/index.html index 98737523d..f40787813 100644 --- a/content/blog/page8/index.html +++ b/content/blog/page8/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2020/06/09/release-statefun-2.1.0.html">Stateful Functions 2.1.0 Release Announcement</a></h2> + + <p>09 Jun 2020 + Marta Paes (<a href="https://twitter.com/morsapaes">@morsapaes</a>)</p> + + <p><p>The Apache Flink community is happy to announce the release of Stateful Functions (StateFun) 2.1.0! 
This release introduces new features around state expiration and performance improvements for co-located deployments, as well as other important changes that improve the stability and testability of the project. As the community around StateFun grows, the release cycle will follow this pattern of smaller and more frequent releases to incorporate user feedback and allow for fast [...] + +</p> + + <p><a href="/news/2020/06/09/release-statefun-2.1.0.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2020/05/12/release-1.10.1.html">Apache Flink 1.10.1 Released</a></h2> @@ -356,21 +371,6 @@ This release marks a big milestone: Stateful Functions 2.0 is not only an API up <hr> - <article> - <h2 class="blog-title"><a href="/features/2020/03/27/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></h2> - - <p>27 Mar 2020 - Bowen Li (<a href="https://twitter.com/Bowen__Li">@Bowen__Li</a>)</p> - - <p><p>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.</p> - -</p> - - <p><a href="/features/2020/03/27/flink-for-data-warehouse.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -403,6 +403,16 @@ This release marks a big milestone: Stateful Functions 2.0 is not only an API up <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/blog/page9/index.html b/content/blog/page9/index.html index 62b2727a4..084d7d5c8 100644 --- a/content/blog/page9/index.html +++ b/content/blog/page9/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a 
href="/features/2020/03/27/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></h2> + + <p>27 Mar 2020 + Bowen Li (<a href="https://twitter.com/Bowen__Li">@Bowen__Li</a>)</p> + + <p><p>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.</p> + +</p> + + <p><a href="/features/2020/03/27/flink-for-data-warehouse.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></h2> @@ -355,19 +370,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2019/12/09/flink-kubernetes-kudo.html">Running Apache Flink on Kubernetes with KUDO</a></h2> - - <p>09 Dec 2019 - Gerred Dillon </p> - - <p>A common use case for Apache Flink is streaming data analytics together with Apache Kafka, which provides a pub/sub model and durability for data streams. 
In this post, we demonstrate how to orchestrate Flink and Kafka with KUDO.</p> - - <p><a href="/news/2019/12/09/flink-kubernetes-kudo.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -400,6 +402,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></li> + + + + + + + + + <li><a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></li> diff --git a/content/community.html b/content/community.html index 047793833..cc949a878 100644 --- a/content/community.html +++ b/content/community.html @@ -650,6 +650,12 @@ <td class="text-center">Committer</td> <td class="text-center">andra</td> </tr> + <tr> + <td class="text-center"><img src="https://avatars.githubusercontent.com/u/11065508?v=4" class="committer-avatar" /></td> + <td class="text-center">Yuan Mei</td> + <td class="text-center">PMC, Committer</td> + <td class="text-center">yuanmei</td> + </tr> <tr> <td class="text-center"><img src="https://avatars0.githubusercontent.com/u/89049?s=50" class="committer-avatar" /></td> <td class="text-center">Robert Metzger</td> diff --git a/content/index.html b/content/index.html index 34bca2a34..daad84afe 100644 --- a/content/index.html +++ b/content/index.html @@ -397,6 +397,9 @@ <dl> + <dt> <a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></dt> + <dd>An overview of the new AsyncBaseSink and how to use it for building your own concrete sink</dd> + <dt> <a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></dt> <dd>Flink 1.15 introduced a new Runtime Execution Mode named 'thread' mode in PyFlink. 
This post explains how it works and when to use it.</dd> @@ -416,9 +419,6 @@ exciting changes.</p> <dd><p>The Apache Flink Community is pleased to announce the preview release of the Apache Flink Kubernetes Operator (0.1.0)</p> </dd> - - <dt> <a href="/news/2022/03/11/release-1.14.4.html">Apache Flink 1.14.4 Release Announcement</a></dt> - <dd>The Apache Flink Community is please to announce another bug fix release for Flink 1.14.</dd> </dl> diff --git a/content/zh/community.html b/content/zh/community.html index 9383ba09c..b7f1976e4 100644 --- a/content/zh/community.html +++ b/content/zh/community.html @@ -641,6 +641,12 @@ <td class="text-center">Committer</td> <td class="text-center">andra</td> </tr> + <tr> + <td class="text-center"><img src="https://avatars.githubusercontent.com/u/11065508?v=4" class="committer-avatar" /></td> + <td class="text-center">Yuan Mei</td> + <td class="text-center">PMC, Committer</td> + <td class="text-center">yuanmei</td> + </tr> <tr> <td class="text-center"><img src="https://avatars0.githubusercontent.com/u/89049?s=50" class="committer-avatar" /></td> <td class="text-center">Robert Metzger</td> diff --git a/content/zh/index.html b/content/zh/index.html index a6de526f8..7b3fda615 100644 --- a/content/zh/index.html +++ b/content/zh/index.html @@ -394,6 +394,9 @@ <dl> + <dt> <a href="/2022/05/06/async-sink-base.html">The Generic Asynchronous Base Sink</a></dt> + <dd>An overview of the new AsyncBaseSink and how to use it for building your own concrete sink</dd> + <dt> <a href="/2022/05/06/pyflink-1.15-thread-mode.html">Exploring the thread mode in PyFlink</a></dt> <dd>Flink 1.15 introduced a new Runtime Execution Mode named 'thread' mode in PyFlink. 
This post explains how it works and when to use it.</dd> @@ -413,9 +416,6 @@ exciting changes.</p> <dd><p>The Apache Flink Community is pleased to announce the preview release of the Apache Flink Kubernetes Operator (0.1.0)</p> </dd> - - <dt> <a href="/news/2022/03/11/release-1.14.4.html">Apache Flink 1.14.4 Release Announcement</a></dt> - <dd>The Apache Flink Community is please to announce another bug fix release for Flink 1.14.</dd> </dl>
