This is an automated email from the ASF dual-hosted git repository. zhuzh pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit 799bbfc171c2aad944f9c19f15081f6c5924b1c9 Author: Zhu Zhu <[email protected]> AuthorDate: Fri Jun 17 14:21:23 2022 +0800 Rebuild website --- content/2022/06/17/adaptive-batch-scheduler.html | 515 +++++++++++++++++++++ content/blog/feed.xml | 338 ++++++++++---- content/blog/index.html | 38 +- content/blog/page10/index.html | 40 +- content/blog/page11/index.html | 40 +- content/blog/page12/index.html | 40 +- content/blog/page13/index.html | 40 +- content/blog/page14/index.html | 38 +- content/blog/page15/index.html | 36 +- content/blog/page16/index.html | 36 +- content/blog/page17/index.html | 42 +- content/blog/page18/index.html | 48 +- content/blog/page19/index.html | 29 ++ content/blog/page2/index.html | 40 +- content/blog/page3/index.html | 41 +- content/blog/page4/index.html | 41 +- content/blog/page5/index.html | 40 +- content/blog/page6/index.html | 38 +- content/blog/page7/index.html | 38 +- content/blog/page8/index.html | 40 +- content/blog/page9/index.html | 40 +- .../1-overall-structure.png | Bin 0 -> 159198 bytes .../2-dynamic-graph.png | Bin 0 -> 237074 bytes .../3-static-graph-subpartition-mapping.png | Bin 0 -> 65781 bytes .../4-dynamic-graph-subpartition-mapping.png | Bin 0 -> 224171 bytes .../5-auto-rebalance.png | Bin 0 -> 97770 bytes .../parallelism-formula.png | Bin 0 -> 70215 bytes .../range-formula.png | Bin 0 -> 50121 bytes content/index.html | 9 +- content/zh/index.html | 9 +- 30 files changed, 1242 insertions(+), 374 deletions(-) diff --git a/content/2022/06/17/adaptive-batch-scheduler.html b/content/2022/06/17/adaptive-batch-scheduler.html new file mode 100644 index 000000000..47dafa354 --- /dev/null +++ b/content/2022/06/17/adaptive-batch-scheduler.html @@ -0,0 +1,515 @@ +<!DOCTYPE html> +<html lang="en"> + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <title>Apache Flink: Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</title> + <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"> + <link rel="icon" href="/favicon.ico" type="image/x-icon"> + + <!-- Bootstrap --> + <link rel="stylesheet" href="/css/bootstrap.min.css"> + <link rel="stylesheet" href="/css/flink.css"> + <link rel="stylesheet" href="/css/syntax.css"> + + <!-- Blog RSS feed --> + <link href="/blog/feed.xml" rel="alternate" type="application/rss+xml" title="Apache Flink Blog: RSS feed" /> + + <!-- jQuery (necessary for Bootstrap's JavaScript plugins) --> + <!-- We need to load Jquery in the header for custom google analytics event tracking--> + <script src="/js/jquery.min.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!-- WARNING: Respond.js doesn't work if you view the page via file:// --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + <!-- Matomo --> + <script> + var _paq = window._paq = window._paq || []; + /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ + /* We explicitly disable cookie tracking to avoid privacy issues */ + _paq.push(['disableCookies']); + /* Measure a visit to flink.apache.org and nightlies.apache.org/flink as the same visit */ + _paq.push(["setDomains", ["*.flink.apache.org","*.nightlies.apache.org/flink"]]); + _paq.push(['trackPageView']); + _paq.push(['enableLinkTracking']); + (function() { + var u="//matomo.privacy.apache.org/"; + _paq.push(['setTrackerUrl', u+'matomo.php']); + _paq.push(['setSiteId', '1']); + var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; + g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); + })(); + </script> + <!-- End Matomo Code --> + </head> + <body> + + + <!-- Main content. --> + <div class="container"> + <div class="row"> + + + <div id="sidebar" class="col-sm-3"> + + +<!-- Top navbar. --> + <nav class="navbar navbar-default"> + <!-- The logo. --> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <div class="navbar-logo"> + <a href="/"> + <img alt="Apache Flink" src="/img/flink-header-logo.svg" width="147px" height="73px"> + </a> + </div> + </div><!-- /.navbar-header --> + + <!-- The navigation links. --> + <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1"> + <ul class="nav navbar-nav navbar-main"> + + <!-- First menu section explains visitors what Flink is --> + + <!-- What is Stream Processing? --> + <!-- + <li><a href="/streamprocessing1.html">What is Stream Processing?</a></li> + --> + + <!-- What is Flink? --> + <li><a href="/flink-architecture.html">What is Apache Flink?</a></li> + + + + <!-- Stateful Functions? --> + + <li><a href="https://nightlies.apache.org/flink/flink-statefun-docs-stable/">What is Stateful Functions?</a></li> + + <!-- Flink ML? --> + + <li><a href="https://nightlies.apache.org/flink/flink-ml-docs-stable/">What is Flink ML?</a></li> + + <!-- Use cases --> + <li><a href="/usecases.html">Use Cases</a></li> + + <!-- Powered by --> + <li><a href="/poweredby.html">Powered By</a></li> + + + + <!-- Second menu section aims to support Flink users --> + + <!-- Downloads --> + <li><a href="/downloads.html">Downloads</a></li> + + <!-- Getting Started --> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Getting Started<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="https://nightlies.apache.org/flink/flink-docs-release-1.15//docs/try-flink/local_installation/" target="_blank">With Flink <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/getting-started/project-setup.html" target="_blank">With Flink Stateful Functions <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-ml-docs-release-2.0/try-flink-ml/quick-start.html" target="_blank">With Flink ML <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.0/try-flink-kubernetes-operator/quick-start.html" target="_blank">With Flink Kubernetes Operator <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-table-store-docs-release-0.1/try-table-store/quick-start.html" target="_blank">With Flink Table Store <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="/training.html">Training Course</a></li> + </ul> + </li> + + <!-- Documentation --> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="https://nightlies.apache.org/flink/flink-docs-release-1.15" target="_blank">Flink 1.15 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-docs-master" target="_blank">Flink Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2" target="_blank">Flink Stateful Functions 3.2 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-statefun-docs-master" target="_blank">Flink Stateful Functions Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-ml-docs-release-2.0" target="_blank">Flink ML 2.0 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-ml-docs-master" target="_blank">Flink ML Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.0" target="_blank">Flink Kubernetes Operator 1.0 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main" target="_blank">Flink Kubernetes Operator Main (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-table-store-docs-release-0.1" target="_blank">Flink Table Store 0.1 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://nightlies.apache.org/flink/flink-table-store-docs-master" target="_blank">Flink Table Store Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + </ul> + </li> + + <!-- getting help --> + <li><a href="/gettinghelp.html">Getting Help</a></li> + + <!-- Blog --> + <li><a href="/blog/"><b>Flink Blog</b></a></li> + + + <!-- Flink-packages --> + <li> + <a href="https://flink-packages.org" target="_blank">flink-packages.org <small><span class="glyphicon glyphicon-new-window"></span></small></a> + </li> + + + <!-- Third menu section aim to support community and contributors --> + + <!-- Community --> + <li><a href="/community.html">Community & Project Info</a></li> + + <!-- Roadmap --> + <li><a href="/roadmap.html">Roadmap</a></li> + + <!-- Contribute --> + <li><a href="/contributing/how-to-contribute.html">How to Contribute</a></li> + + + <!-- GitHub --> + <li> + <a href="https://github.com/apache/flink" target="_blank">Flink on GitHub <small><span class="glyphicon glyphicon-new-window"></span></small></a> + </li> + + + + <!-- Language Switcher --> + <li> + + + <a href="/zh/2022/06/17/adaptive-batch-scheduler.html">中文版</a> + + + </li> + + </ul> + + <style> + .smalllinks:link { + display: inline-block !important; background: none; padding-top: 0px; padding-bottom: 0px; padding-right: 0px; min-width: 75px; + } + </style> + + <ul class="nav navbar-nav navbar-bottom"> + <hr /> + + <!-- Twitter --> + <li><a href="https://twitter.com/apacheflink" target="_blank">@ApacheFlink <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <!-- Visualizer --> + <li class=" hidden-md hidden-sm"><a href="/visualizer/" target="_blank">Plan Visualizer <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <li > + <a href="/security.html">Flink Security</a> + </li> + + <hr /> + + <li><a href="https://apache.org" target="_blank">Apache Software Foundation <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <li> + + <a class="smalllinks" href="https://www.apache.org/licenses/" target="_blank">License</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/security/" target="_blank">Security</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Donate</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + </li> + + </ul> + </div><!-- /.navbar-collapse --> + </nav> + + </div> + <div class="col-sm-9"> + <div class="row-fluid"> + <div class="col-sm-12"> + <div class="row"> + <h1>Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</h1> + <p><i></i></p> + + <article> + <p>17 Jun 2022 Lijie Wang & Zhu Zhu </p> + +<div class="page-toc"> +<ul id="markdown-toc"> + <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li> + <li><a href="#get-started" id="markdown-toc-get-started">Get Started</a> <ul> + <li><a href="#configure-to-use-adaptive-batch-scheduler" id="markdown-toc-configure-to-use-adaptive-batch-scheduler">Configure to use adaptive batch scheduler</a></li> + <li><a href="#set-the-parallelism-of-operators-to--1" id="markdown-toc-set-the-parallelism-of-operators-to--1">Set the parallelism of operators to -1</a></li> + </ul> + </li> + <li><a href="#implementation-details" id="markdown-toc-implementation-details">Implementation Details</a> <ul> + <li><a href="#collect-sizes-of-consumed-datasets" id="markdown-toc-collect-sizes-of-consumed-datasets">Collect sizes of consumed datasets</a></li> + <li><a href="#decide-proper-parallelisms-of-job-vertices" id="markdown-toc-decide-proper-parallelisms-of-job-vertices">Decide proper parallelisms of job vertices</a> <ul> + <li><a href="#limit-the-maximum-ratio-of-broadcast-bytes" id="markdown-toc-limit-the-maximum-ratio-of-broadcast-bytes">Limit the maximum ratio of broadcast bytes</a></li> + <li><a href="#normalize-the-parallelism-to-the-closest-power-of-2" id="markdown-toc-normalize-the-parallelism-to-the-closest-power-of-2">Normalize the parallelism to the closest power of 2</a></li> + </ul> + </li> + <li><a href="#build-up-execution-graph-dynamically" id="markdown-toc-build-up-execution-graph-dynamically">Build up execution graph dynamically</a> <ul> + <li><a href="#create-execution-vertices-and-execution-edges-lazily" id="markdown-toc-create-execution-vertices-and-execution-edges-lazily">Create execution vertices and execution edges lazily</a></li> + <li><a href="#flexible-subpartition-mapping" id="markdown-toc-flexible-subpartition-mapping">Flexible subpartition mapping</a></li> + </ul> + </li> + <li><a href="#update-and-schedule-the-dynamic-execution-graph" id="markdown-toc-update-and-schedule-the-dynamic-execution-graph">Update and schedule the dynamic execution graph</a></li> + </ul> + </li> + <li><a href="#future-improvement" id="markdown-toc-future-improvement">Future improvement</a> <ul> + <li><a href="#auto-rebalancing-of-workloads" id="markdown-toc-auto-rebalancing-of-workloads">Auto-rebalancing of workloads</a></li> + </ul> + </li> +</ul> + +</div> + +<h1 id="introduction">Introduction</h1> + +<p>Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling.</p> + +<p>To decide a proper parallelism, one needs to know how much data each operator needs to process. However, It can be hard to predict data volume to be processed by a job because it can be different everyday. And it can be harder or even impossible (due to complex operators or UDFs) to predict data volume to be processed by each operator.</p> + +<p>To solve this problem, we introduced the adaptive batch scheduler in Flink 1.15. The adaptive batch scheduler can automatically decide parallelism of an operator according to the size of its consumed datasets. Here are the benefits the adaptive batch scheduler can bring:</p> + +<ol> + <li>Batch job users can be relieved from parallelism tuning.</li> + <li>Parallelism tuning is fine grained considering different operators. This is particularly beneficial for SQL jobs which can only be set with a global parallelism previously.</li> + <li>Parallelism tuning can better fit consumed datasets which have a varying volume size every day.</li> +</ol> + +<h1 id="get-started">Get Started</h1> + +<p>To automatically decide parallelism of operators, you need to:</p> + +<ol> + <li>Configure to use adaptive batch scheduler.</li> + <li>Set the parallelism of operators to -1.</li> +</ol> + +<h2 id="configure-to-use-adaptive-batch-scheduler">Configure to use adaptive batch scheduler</h2> + +<p>To use adaptive batch scheduler, you need to set configurations as below:</p> + +<ul> + <li>Set <code>jobmanager.scheduler: AdaptiveBatch</code>.</li> + <li>Leave the <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#execution-batch-shuffle-mode">execution.batch-shuffle-mode</a> unset or explicitly set it to <code>ALL-EXCHANGES-BLOCKING</code> (default value). Currently, the adaptive batch scheduler only supports batch jobs whose shuffle mode is <code>ALL-EXCHANGES-BLOCKING</code>.</li> +</ul> + +<p>In addition, there are several related configuration options to control the upper bounds and lower bounds of tuned parallelisms, to specify expected data volume to process by each operator, and to specify the default parallelism of sources. More details can be found in the <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/elastic_scaling/#configure-to-use-adaptive-batch-scheduler">feature documentation page</a>.</p> + +<h2 id="set-the-parallelism-of-operators-to--1">Set the parallelism of operators to -1</h2> + +<p>The adaptive batch scheduler only automatically decides parallelism of operators whose parallelism is not set (which means the parallelism is -1). To leave parallelism unset, you should configure as follows:</p> + +<ul> + <li>Set <code>parallelism.default: -1</code> for all jobs.</li> + <li>Set <code>table.exec.resource.default-parallelism: -1</code> for SQL jobs.</li> + <li>Don’t call <code>setParallelism()</code> for operators in DataStream/DataSet jobs.</li> + <li>Don’t call <code>setParallelism()</code> on <code>StreamExecutionEnvironment/ExecutionEnvironment</code> in DataStream/DataSet jobs.</li> +</ul> + +<h1 id="implementation-details">Implementation Details</h1> + +<p>In this section, we will elaborate the details of the implementation. Before that, we need to briefly introduce some concepts involved:</p> + +<ul> + <li><a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/JobVertex.java">JobVertex</a> and <a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/JobGraph.java">JobGraph</a>: A job vertex is an operator chain formed by chaining several operators together for better performance. The job graph is a data flow consisting of job vertices.</li> + <li><a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java">ExecutionVertex</a> and <a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java">ExecutionGraph</a>: An execution vertex represents a parallel subtask of a job vertex, which will eventually be instantiated as a physical task. For example, a job v [...] +</ul> + +<p>More details about the above concepts can be found in the <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/internals/job_scheduling/#jobmanager-data-structures">Flink documentation</a>. Note that the adaptive batch scheduler decides the parallelism of operators by deciding the parallelism of the job vertices which consist of these operators. To automatically decide parallelism of job vertices, we introduced the following changes:</p> + +<ol> + <li>Enabled the scheduler to collect sizes of finished datasets.</li> + <li>Introduced a new component <a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptivebatch/VertexParallelismDecider.java">VertexParallelismDecider</a> to compute proper parallelisms of job vertices according to the sizes of their consumed results.</li> + <li>Enabled to dynamically build up execution graph to allow the parallelisms of job vertices to be decided lazily. The execution graph starts with an empty execution topology and then gradually attaches the vertices during job execution.</li> + <li>Introduced the adaptive batch scheduler to update and schedule the dynamic execution graph.</li> +</ol> + +<p>The details will be introduced in the following sections.</p> + +<center> +<br /> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/1-overall-structure.png" width="60%" /> +<br /> +Fig. 1 - The overall structure of automatically deciding parallelism +</center> + +<p><br /></p> + +<h2 id="collect-sizes-of-consumed-datasets">Collect sizes of consumed datasets</h2> + +<p>The adaptive batch scheduler decides the parallelism of vertices by the size of input results, so the scheduler needs to know the sizes of result partitions produced by tasks. We introduced a numBytesProduced counter to record the size of each produced result partition, the accumulated result of the counter will be sent to the scheduler when tasks finish.</p> + +<h2 id="decide-proper-parallelisms-of-job-vertices">Decide proper parallelisms of job vertices</h2> + +<p>We introduced a new component <a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptivebatch/VertexParallelismDecider.java">VertexParallelismDecider</a> to compute proper parallelisms of job vertices according to the sizes of their consumed results. The computation algorithm is as follows:</p> + +<p>Suppose</p> + +<ul> + <li><strong><em>V</em></strong> is the bytes of data the user expects to be processed by each task.</li> + <li><strong><em>totalBytes<sub>non-broadcast</sub></em></strong> is the sum of the non-broadcast result sizes consumed by this job vertex.</li> + <li><strong><em>totalBytes<sub>broadcast</sub></em></strong> is the sum of the broadcast result sizes consumed by this job vertex.</li> + <li><strong><em>maxBroadcastRatio</em></strong> is the maximum ratio of broadcast bytes that affects the parallelism calculation.</li> + <li><strong><em>normalize(</em></strong>x<strong><em>)</em></strong> is a function that round <strong><em>x</em></strong> to the closest power of 2.</li> +</ul> + +<p>then the parallelism of this job vertex <strong><em>P</em></strong> will be:</p> +<center> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/parallelism-formula.png" width="60%" /> +</center> + +<p>Note that we introduced two special treatment in the above formula :</p> + +<ul> + <li><a href="#limit-the-maximum-ratio-of-broadcast-bytes">Limit the maximum ratio of broadcast bytes</a></li> + <li><a href="#normalize-the-parallelism-to-the-closest-power-of-2">Normalize the parallelism to the closest power of 2</a></li> +</ul> + +<p>However, the above formula cannot be used to decide the parallelism of the source vertices, because the source vertices have no input. To solve it, we introduced the configuration option <code>jobmanager.adaptive-batch-scheduler.default-source-parallelism</code> to allow users to manually configure the parallelism of source vertices. Note that not all data sources need this option, because some data sources can automatically infer parallelism (For example, HiveTableSource, see <a href [...] + +<h3 id="limit-the-maximum-ratio-of-broadcast-bytes">Limit the maximum ratio of broadcast bytes</h3> +<p>As you can see, we limit the maximum ratio of broadcast bytes that affects the parallelism calculation to <strong><em>maxBroadcastRatio</em></strong>. That is, the non-broadcast bytes processed by each task is at least <strong><em>(1-maxBroadcastRatio) * V</em></strong>. If not so,when the total broadcast bytes is close to <strong><em>V</em></strong>, even if the total non-broadcast bytes is very small, it may cause a large parallelism, which is unnecessary and may lead to resource wa [...] + +<p>Generally, the broadcast dataset is usually relatively small against the other co-processed datasets, so we set the maximum ratio to 0.5 by default. The value is hard coded in the first version, and we may make it configurable later.</p> + +<h3 id="normalize-the-parallelism-to-the-closest-power-of-2">Normalize the parallelism to the closest power of 2</h3> +<p>The normalize is to avoid introducing data skew. To better understand this section, we suggest you read the <a href="#flexible-subpartition-mapping">Flexible subpartition mapping</a> section first.</p> + +<p>Taking Fig. 4 (b) as example, A1/A2 produces 4 subpartitions, and the decided parallelism of B is 3. In this case, B1 will consume 1 subpartition, B2 will consume 1 subpartition, and B3 will consume 2 subpartitions. We assume that subpartitions have the same amount of data, which means B3 will consume twice the data of other tasks, data skew is introduced due to the subpartition mapping.</p> + +<p>To solve this problem, we need to make the subpartitions evenly consumed by downstream tasks, which means the number of subpartitions should be a multiple of the number of downstream tasks. For simplicity, we require the user-specified max parallelism to be 2<sup>N</sup>, and then adjust the calculated parallelism to a closest 2<sup>M</sup> (M <= N), so that we can guarantee that subpartitions will be evenly consumed by downstream tasks.</p> + +<p>Note that this is a temporary solution, the ultimate solution would be the <a href="#auto-rebalancing-of-workloads">Auto-rebalancing of workloads</a>, which may come soon.</p> + +<h2 id="build-up-execution-graph-dynamically">Build up execution graph dynamically</h2> +<p>Before the adaptive batch scheduler was introduced to Flink, the execution graph was fully built in a static way before starting scheduling. To allow parallelisms of job vertices to be decided lazily, the execution graph must be able to be built up dynamically.</p> + +<h3 id="create-execution-vertices-and-execution-edges-lazily">Create execution vertices and execution edges lazily</h3> +<p>A dynamic execution graph means that a Flink job starts with an empty execution topology, and then gradually attaches vertices during job execution, as shown in Fig. 2.</p> + +<p>The execution topology consists of execution vertices and execution edges. The execution vertices will be created and attached to the execution topology only when:</p> + +<ul> + <li>The parallelism of the corresponding job vertex is decided.</li> + <li>All upstream execution vertices are already attached.</li> +</ul> + +<p>The parallelism of the job vertex needs to be decided first so that Flink knows how many execution vertices should be created. Upstream execution vertices need to be attached first so that Flink can connect the newly created execution vertices to the upstream vertices with execution edges.</p> + +<center> +<br /> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/2-dynamic-graph.png" width="90%" /> +<br /> +Fig. 2 - Build up execution graph dynamically +</center> + +<p><br /></p> + +<h3 id="flexible-subpartition-mapping">Flexible subpartition mapping</h3> +<p>Before the adaptive batch scheduler was introduced to Flink, when deploying a task, Flink needed to know the parallelism of its consumer job vertex. This is because consumer vertex parallelism is used to decide the number of subpartitions produced by each upstream task. The reason behind that is, for one result partition, different subpartitions serve different consumer execution vertices. More specifically, one consumer execution vertex only consumes data from subpartition with the s [...] + +<p>Taking Fig. 3 as example, parallelism of the consumer B is 2, so the result partition produced by A1/A2 should contain 2 subpartitions, the subpartition with index 0 serves B1, and the subpartition with index 1 serves B2.</p> + +<center> +<br /> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/3-static-graph-subpartition-mapping.png" width="30%" /> +<br /> +Fig. 3 - How subpartitions serve consumer execution vertices with static execution graph +</center> + +<p><br /></p> + +<p>But obviously, this doesn’t work for dynamic graphs, because when a job vertex is deployed, the parallelism of its consumer job vertices may not have been decided yet. To enable Flink to work in this case, we need a way to allow a job vertex to run without knowing the parallelism of its consumer job vertices.</p> + +<p>To achieve this goal, we can set the number of subpartitions to be the max parallelism of the consumer job vertex. Then when the consumer execution vertices are deployed, they should be assigned with a subpartition range to consume. Suppose N is the number of consumer execution vertices and P is the number of subpartitions. For the kth consumer execution vertex, the consumed subpartition range should be:</p> + +<center> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/range-formula.png" width="55%" /> +</center> + +<p>Taking Fig. 4 as example, the max parallelism of B is 4, so A1/A2 have 4 subpartitions. And then if the decided parallelism of B is 2, then the subpartitions mapping will be Fig. 4 (a), if the decided parallelism of B is 3, then the subpartitions mapping will be Fig. 4 (b).</p> + +<center> +<br /> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/4-dynamic-graph-subpartition-mapping.png" width="75%" /> +<br /> +Fig. 4 - How subpartitions serve consumer execution vertices with dynamic graph +</center> + +<p><br /></p> + +<h2 id="update-and-schedule-the-dynamic-execution-graph">Update and schedule the dynamic execution graph</h2> +<p>The adaptive batch scheduler scheduling is similar to the default scheduler, the only difference is that an empty dynamic execution graph will be generated initially and vertices will be attached later. Before handling any scheduling event, the scheduler will try deciding the parallelisms of job vertices, and then initialize them to generate execution vertices, connecting execution edges, and update the execution graph.</p> + +<p>The scheduler will try to decide the parallelism of all job vertices before handling each scheduling event, and the parallelism decision will be made for each job vertex in topological order:</p> + +<ul> + <li>For source vertices, the parallelism should have been decided before starting scheduling.</li> + <li>For non-source vertices, the parallelism can be decided only when all its consumed results are fully produced.</li> +</ul> + +<p>After deciding the parallelism, the scheduler will try to initialize the job vertices in topological order. A job vertex that can be initialized should meet the following conditions:</p> + +<ul> + <li>The parallelism of the job vertex has been decided and the job vertex has not been initialized yet.</li> + <li>All upstream job vertices have been initialized.</li> +</ul> + +<h1 id="future-improvement">Future improvement</h1> + +<h2 id="auto-rebalancing-of-workloads">Auto-rebalancing of workloads</h2> + +<p>When running batch jobs, data skew may occur (a task needs to process much larger data than other tasks), which leads to long-tail tasks and further slows down the finish of jobs. Users usually hope that the system can automatically solve this problem. +One typical data skew case is that some subpartitions have a significantly larger amount of data than others. This case can be solved by finer grained subpartitions and auto-rebalancing of workload. The work of the adaptive batch scheduler can be considered as the first step towards it, because the requirements of auto-rebalancing are similar to adaptive batch scheduler, they both need the support of dynamic graphs and the collection of result partitions size. +Based on the implementation of adaptive batch scheduler, we can solve the above problem by increasing max parallelism (for finer grained subpartitions) and simply changing the subpartition range division algorithm (for auto-rebalancing). In the current design, the subpartition range is divided according to the number of subpartitions, we can change it to divide according to the amount of data in subpartitions, so that the amount of data within each subpartition range can be approximately [...] + +<center> +<br /> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/5-auto-rebalance.png" width="75%" /> +<br /> +Fig. 5 - Auto-rebalance with finer grained subpartitions +</center> + + </article> + </div> + + <div class="row"> + <div id="disqus_thread"></div> + <script type="text/javascript"> + /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ + var disqus_shortname = 'stratosphere-eu'; // required: replace example with your forum shortname + + /* * * DON'T EDIT BELOW THIS LINE * * */ + (function() { + var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; + dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; + (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); + })(); + </script> + </div> + </div> +</div> + </div> + </div> + + <hr /> + + <div class="row"> + <div class="footer text-center col-sm-12"> + <p>Copyright © 2014-2022 <a href="http://apache.org">The Apache Software Foundation</a>. All Rights Reserved.</p> + <p>Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.</p> + <p><a href="https://privacy.apache.org/policies/privacy-policy-public.html">Privacy Policy</a> · <a href="/blog/feed.xml">RSS feed</a></p> + </div> + </div> + </div><!-- /.container --> + + <!-- Include all compiled plugins (below), or include individual files as needed --> + <script src="/js/jquery.matchHeight-min.js"></script> + <script src="/js/bootstrap.min.js"></script> + <script src="/js/codetabs.js"></script> + <script src="/js/stickysidebar.js"></script> + </body> +</html> diff --git a/content/blog/feed.xml b/content/blog/feed.xml index 9c70dc08b..510447f05 100644 --- a/content/blog/feed.xml +++ b/content/blog/feed.xml @@ -6,6 +6,253 @@ <link>https://flink.apache.org/blog</link> <atom:link href="https://flink.apache.org/blog/feed.xml" rel="self" type="application/rss+xml" /> +<item> +<title>Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</title> +<description><div class="page-toc"> +<ul id="markdown-toc"> + <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li> + <li><a href="#get-started" id="markdown-toc-get-started">Get Started</a> <ul> + <li><a href="#configure-to-use-adaptive-batch-scheduler" id="markdown-toc-configure-to-use-adaptive-batch-scheduler">Configure to use adaptive batch scheduler</a></li> + <li><a href="#set-the-parallelism-of-operators-to--1" id="markdown-toc-set-the-parallelism-of-operators-to--1">Set the parallelism of operators to -1</a></li> + </ul> + </li> + <li><a href="#implementation-details" id="markdown-toc-implementation-details">Implementation Details</a> <ul> + <li><a href="#collect-sizes-of-consumed-datasets" id="markdown-toc-collect-sizes-of-consumed-datasets">Collect sizes of consumed datasets</a></li> + <li><a href="#decide-proper-parallelisms-of-job-vertices" id="markdown-toc-decide-proper-parallelisms-of-job-vertices">Decide proper parallelisms of job vertices</a> <ul> + <li><a href="#limit-the-maximum-ratio-of-broadcast-bytes" id="markdown-toc-limit-the-maximum-ratio-of-broadcast-bytes">Limit the maximum ratio of broadcast bytes</a></li> + <li><a href="#normalize-the-parallelism-to-the-closest-power-of-2" id="markdown-toc-normalize-the-parallelism-to-the-closest-power-of-2">Normalize the parallelism to the closest power of 2</a></li> + </ul> + </li> + <li><a href="#build-up-execution-graph-dynamically" id="markdown-toc-build-up-execution-graph-dynamically">Build up execution graph dynamically</a> <ul> + <li><a href="#create-execution-vertices-and-execution-edges-lazily" id="markdown-toc-create-execution-vertices-and-execution-edges-lazily">Create execution vertices and execution edges lazily</a></li> + <li><a href="#flexible-subpartition-mapping" id="markdown-toc-flexible-subpartition-mapping">Flexible subpartition mapping</a></li> + </ul> + </li> + <li><a href="#update-and-schedule-the-dynamic-execution-graph" id="markdown-toc-update-and-schedule-the-dynamic-execution-graph">Update and schedule the dynamic execution graph</a></li> + </ul> + </li> + <li><a href="#future-improvement" id="markdown-toc-future-improvement">Future improvement</a> <ul> + <li><a href="#auto-rebalancing-of-workloads" id="markdown-toc-auto-rebalancing-of-workloads">Auto-rebalancing of workloads</a></li> + </ul> + </li> +</ul> + +</div> + +<h1 id="introduction">Introduction</h1> + +<p>Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling.</p> + +<p>To decide a proper parallelism, one needs to know how much data each operator needs to process. However, It can be hard to predict data volume to be processed by a job because it can be different everyday. And it can be harder or even impossible (due to complex operators or UDFs) to predict data volume to be processed by each operator.</p> + +<p>To solve this problem, we introduced the adaptive batch scheduler in Flink 1.15. The adaptive batch scheduler can automatically decide parallelism of an operator according to the size of its consumed datasets. Here are the benefits the adaptive batch scheduler can bring:</p> + +<ol> + <li>Batch job users can be relieved from parallelism tuning.</li> + <li>Parallelism tuning is fine grained considering different operators. This is particularly beneficial for SQL jobs which can only be set with a global parallelism previously.</li> + <li>Parallelism tuning can better fit consumed datasets which have a varying volume size every day.</li> +</ol> + +<h1 id="get-started">Get Started</h1> + +<p>To automatically decide parallelism of operators, you need to:</p> + +<ol> + <li>Configure to use adaptive batch scheduler.</li> + <li>Set the parallelism of operators to -1.</li> +</ol> + +<h2 id="configure-to-use-adaptive-batch-scheduler">Configure to use adaptive batch scheduler</h2> + +<p>To use adaptive batch scheduler, you need to set configurations as below:</p> + +<ul> + <li>Set <code>jobmanager.scheduler: AdaptiveBatch</code>.</li> + <li>Leave the <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#execution-batch-shuffle-mode">execution.batch-shuffle-mode</a> unset or explicitly set it to <code>ALL-EXCHANGES-BLOCKING</code> (default value). Currently, the adaptive batch scheduler only supports batch jobs whose shuffle mode is <code>ALL-EXCHANGES-BLOCKING</code>.</li> +</ul> + +<p>In addition, there are several related configuration options to control the upper bounds and lower bounds of tuned parallelisms, to specify expected data volume to process by each operator, and to specify the default parallelism of sources. More details can be found in the <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/elastic_scaling/#configure-to-use-adaptive-batch-scheduler">feature documentation page</a>.</p> + +<h2 id="set-the-parallelism-of-operators-to--1">Set the parallelism of operators to -1</h2> + +<p>The adaptive batch scheduler only automatically decides parallelism of operators whose parallelism is not set (which means the parallelism is -1). To leave parallelism unset, you should configure as follows:</p> + +<ul> + <li>Set <code>parallelism.default: -1</code> for all jobs.</li> + <li>Set <code>table.exec.resource.default-parallelism: -1</code> for SQL jobs.</li> + <li>Don’t call <code>setParallelism()</code> for operators in DataStream/DataSet jobs.</li> + <li>Don’t call <code>setParallelism()</code> on <code>StreamExecutionEnvironment/ExecutionEnvironment</code> in DataStream/DataSet jobs.</li> +</ul> + +<h1 id="implementation-details">Implementation Details</h1> + +<p>In this section, we will elaborate the details of the implementation. Before that, we need to briefly introduce some concepts involved:</p> + +<ul> + <li><a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/JobVertex.java">JobVertex</a> and <a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/JobGraph.java">JobGraph</a>: A job vertex is an operator chain formed by chaining several operators together for better performance. The job graph is a data flo [...] + <li><a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java">ExecutionVertex</a> and <a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java">ExecutionGraph</a>: An execution vertex represents a parallel subtask of a job vertex, which will eventually be ins [...] +</ul> + +<p>More details about the above concepts can be found in the <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/internals/job_scheduling/#jobmanager-data-structures">Flink documentation</a>. Note that the adaptive batch scheduler decides the parallelism of operators by deciding the parallelism of the job vertices which consist of these operators. To automatically decide parallelism of job vertices, we introduced the following changes:</p> + +<ol> + <li>Enabled the scheduler to collect sizes of finished datasets.</li> + <li>Introduced a new component <a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptivebatch/VertexParallelismDecider.java">VertexParallelismDecider</a> to compute proper parallelisms of job vertices according to the sizes of their consumed results.</li> + <li>Enabled to dynamically build up execution graph to allow the parallelisms of job vertices to be decided lazily. The execution graph starts with an empty execution topology and then gradually attaches the vertices during job execution.</li> + <li>Introduced the adaptive batch scheduler to update and schedule the dynamic execution graph.</li> +</ol> + +<p>The details will be introduced in the following sections.</p> + +<center> +<br /> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/1-overall-structure.png" width="60%" /> +<br /> +Fig. 1 - The overall structure of automatically deciding parallelism +</center> + +<p><br /></p> + +<h2 id="collect-sizes-of-consumed-datasets">Collect sizes of consumed datasets</h2> + +<p>The adaptive batch scheduler decides the parallelism of vertices by the size of input results, so the scheduler needs to know the sizes of result partitions produced by tasks. We introduced a numBytesProduced counter to record the size of each produced result partition, the accumulated result of the counter will be sent to the scheduler when tasks finish.</p> + +<h2 id="decide-proper-parallelisms-of-job-vertices">Decide proper parallelisms of job vertices</h2> + +<p>We introduced a new component <a href="https://github.com/apache/flink/blob/release-1.15/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptivebatch/VertexParallelismDecider.java">VertexParallelismDecider</a> to compute proper parallelisms of job vertices according to the sizes of their consumed results. The computation algorithm is as follows:</p> + +<p>Suppose</p> + +<ul> + <li><strong><em>V</em></strong> is the bytes of data the user expects to be processed by each task.</li> + <li><strong><em>totalBytes<sub>non-broadcast</sub></em></strong> is the sum of the non-broadcast result sizes consumed by this job vertex.</li> + <li><strong><em>totalBytes<sub>broadcast</sub></em></strong> is the sum of the broadcast result sizes consumed by this job vertex.</li> + <li><strong><em>maxBroadcastRatio</em></strong> is the maximum ratio of broadcast bytes that affects the parallelism calculation.</li> + <li><strong><em>normalize(</em></strong>x<strong><em>)</em></strong> is a function that round <strong><em>x</em></strong> to the closest power of 2.</li> +</ul> + +<p>then the parallelism of this job vertex <strong><em>P</em></strong> will be:</p> +<center> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/parallelism-formula.png" width="60%" /> +</center> + +<p>Note that we introduced two special treatment in the above formula :</p> + +<ul> + <li><a href="#limit-the-maximum-ratio-of-broadcast-bytes">Limit the maximum ratio of broadcast bytes</a></li> + <li><a href="#normalize-the-parallelism-to-the-closest-power-of-2">Normalize the parallelism to the closest power of 2</a></li> +</ul> + +<p>However, the above formula cannot be used to decide the parallelism of the source vertices, because the source vertices have no input. To solve it, we introduced the configuration option <code>jobmanager.adaptive-batch-scheduler.default-source-parallelism</code> to allow users to manually configure the parallelism of source vertices. Note that not all data sources need this option, because some data sources can automatically infer parallelism (For example, HiveTableS [...] + +<h3 id="limit-the-maximum-ratio-of-broadcast-bytes">Limit the maximum ratio of broadcast bytes</h3> +<p>As you can see, we limit the maximum ratio of broadcast bytes that affects the parallelism calculation to <strong><em>maxBroadcastRatio</em></strong>. That is, the non-broadcast bytes processed by each task is at least <strong><em>(1-maxBroadcastRatio) * V</em></strong>. If not so,when the total broadcast bytes is close to <strong><em>V</em></strong>, even if the total non-broadcast bytes is very small, it m [...] + +<p>Generally, the broadcast dataset is usually relatively small against the other co-processed datasets, so we set the maximum ratio to 0.5 by default. The value is hard coded in the first version, and we may make it configurable later.</p> + +<h3 id="normalize-the-parallelism-to-the-closest-power-of-2">Normalize the parallelism to the closest power of 2</h3> +<p>The normalize is to avoid introducing data skew. To better understand this section, we suggest you read the <a href="#flexible-subpartition-mapping">Flexible subpartition mapping</a> section first.</p> + +<p>Taking Fig. 4 (b) as example, A1/A2 produces 4 subpartitions, and the decided parallelism of B is 3. In this case, B1 will consume 1 subpartition, B2 will consume 1 subpartition, and B3 will consume 2 subpartitions. We assume that subpartitions have the same amount of data, which means B3 will consume twice the data of other tasks, data skew is introduced due to the subpartition mapping.</p> + +<p>To solve this problem, we need to make the subpartitions evenly consumed by downstream tasks, which means the number of subpartitions should be a multiple of the number of downstream tasks. For simplicity, we require the user-specified max parallelism to be 2<sup>N</sup>, and then adjust the calculated parallelism to a closest 2<sup>M</sup> (M &lt;= N), so that we can guarantee that subpartitions will be evenly consumed by downstream tasks.</p> + +<p>Note that this is a temporary solution, the ultimate solution would be the <a href="#auto-rebalancing-of-workloads">Auto-rebalancing of workloads</a>, which may come soon.</p> + +<h2 id="build-up-execution-graph-dynamically">Build up execution graph dynamically</h2> +<p>Before the adaptive batch scheduler was introduced to Flink, the execution graph was fully built in a static way before starting scheduling. To allow parallelisms of job vertices to be decided lazily, the execution graph must be able to be built up dynamically.</p> + +<h3 id="create-execution-vertices-and-execution-edges-lazily">Create execution vertices and execution edges lazily</h3> +<p>A dynamic execution graph means that a Flink job starts with an empty execution topology, and then gradually attaches vertices during job execution, as shown in Fig. 2.</p> + +<p>The execution topology consists of execution vertices and execution edges. The execution vertices will be created and attached to the execution topology only when:</p> + +<ul> + <li>The parallelism of the corresponding job vertex is decided.</li> + <li>All upstream execution vertices are already attached.</li> +</ul> + +<p>The parallelism of the job vertex needs to be decided first so that Flink knows how many execution vertices should be created. Upstream execution vertices need to be attached first so that Flink can connect the newly created execution vertices to the upstream vertices with execution edges.</p> + +<center> +<br /> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/2-dynamic-graph.png" width="90%" /> +<br /> +Fig. 2 - Build up execution graph dynamically +</center> + +<p><br /></p> + +<h3 id="flexible-subpartition-mapping">Flexible subpartition mapping</h3> +<p>Before the adaptive batch scheduler was introduced to Flink, when deploying a task, Flink needed to know the parallelism of its consumer job vertex. This is because consumer vertex parallelism is used to decide the number of subpartitions produced by each upstream task. The reason behind that is, for one result partition, different subpartitions serve different consumer execution vertices. More specifically, one consumer execution vertex only consumes data from subpartition with [...] + +<p>Taking Fig. 3 as example, parallelism of the consumer B is 2, so the result partition produced by A1/A2 should contain 2 subpartitions, the subpartition with index 0 serves B1, and the subpartition with index 1 serves B2.</p> + +<center> +<br /> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/3-static-graph-subpartition-mapping.png" width="30%" /> +<br /> +Fig. 3 - How subpartitions serve consumer execution vertices with static execution graph +</center> + +<p><br /></p> + +<p>But obviously, this doesn’t work for dynamic graphs, because when a job vertex is deployed, the parallelism of its consumer job vertices may not have been decided yet. To enable Flink to work in this case, we need a way to allow a job vertex to run without knowing the parallelism of its consumer job vertices.</p> + +<p>To achieve this goal, we can set the number of subpartitions to be the max parallelism of the consumer job vertex. Then when the consumer execution vertices are deployed, they should be assigned with a subpartition range to consume. Suppose N is the number of consumer execution vertices and P is the number of subpartitions. For the kth consumer execution vertex, the consumed subpartition range should be:</p> + +<center> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/range-formula.png" width="55%" /> +</center> + +<p>Taking Fig. 4 as example, the max parallelism of B is 4, so A1/A2 have 4 subpartitions. And then if the decided parallelism of B is 2, then the subpartitions mapping will be Fig. 4 (a), if the decided parallelism of B is 3, then the subpartitions mapping will be Fig. 4 (b).</p> + +<center> +<br /> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/4-dynamic-graph-subpartition-mapping.png" width="75%" /> +<br /> +Fig. 4 - How subpartitions serve consumer execution vertices with dynamic graph +</center> + +<p><br /></p> + +<h2 id="update-and-schedule-the-dynamic-execution-graph">Update and schedule the dynamic execution graph</h2> +<p>The adaptive batch scheduler scheduling is similar to the default scheduler, the only difference is that an empty dynamic execution graph will be generated initially and vertices will be attached later. Before handling any scheduling event, the scheduler will try deciding the parallelisms of job vertices, and then initialize them to generate execution vertices, connecting execution edges, and update the execution graph.</p> + +<p>The scheduler will try to decide the parallelism of all job vertices before handling each scheduling event, and the parallelism decision will be made for each job vertex in topological order:</p> + +<ul> + <li>For source vertices, the parallelism should have been decided before starting scheduling.</li> + <li>For non-source vertices, the parallelism can be decided only when all its consumed results are fully produced.</li> +</ul> + +<p>After deciding the parallelism, the scheduler will try to initialize the job vertices in topological order. A job vertex that can be initialized should meet the following conditions:</p> + +<ul> + <li>The parallelism of the job vertex has been decided and the job vertex has not been initialized yet.</li> + <li>All upstream job vertices have been initialized.</li> +</ul> + +<h1 id="future-improvement">Future improvement</h1> + +<h2 id="auto-rebalancing-of-workloads">Auto-rebalancing of workloads</h2> + +<p>When running batch jobs, data skew may occur (a task needs to process much larger data than other tasks), which leads to long-tail tasks and further slows down the finish of jobs. Users usually hope that the system can automatically solve this problem. +One typical data skew case is that some subpartitions have a significantly larger amount of data than others. This case can be solved by finer grained subpartitions and auto-rebalancing of workload. The work of the adaptive batch scheduler can be considered as the first step towards it, because the requirements of auto-rebalancing are similar to adaptive batch scheduler, they both need the support of dynamic graphs and the collection of result partitions size. +Based on the implementation of adaptive batch scheduler, we can solve the above problem by increasing max parallelism (for finer grained subpartitions) and simply changing the subpartition range division algorithm (for auto-rebalancing). In the current design, the subpartition range is divided according to the number of subpartitions, we can change it to divide according to the amount of data in subpartitions, so that the amount of data within each subpartition range can be approximately [...] + +<center> +<br /> +<img src="/img/blog/2022-06-17-adaptive-batch-scheduler/5-auto-rebalance.png" width="75%" /> +<br /> +Fig. 5 - Auto-rebalance with finer grained subpartitions +</center> +</description> +<pubDate>Fri, 17 Jun 2022 02:00:00 +0200</pubDate> +<link>https://flink.apache.org/2022/06/17/adaptive-batch-scheduler.html</link> +<guid isPermaLink="true">/2022/06/17/adaptive-batch-scheduler.html</guid> +</item> + <item> <title>Apache Flink Kubernetes Operator 1.0.0 Release Announcement</title> <description><p>In the last two months since our <a href="https://flink.apache.org/news/2022/04/03/release-kubernetes-operator-0.1.0.html">initial preview release</a> the community has been hard at work to stabilize and improve the core Flink Kubernetes Operator logic. @@ -19915,96 +20162,5 @@ Transaction{timestamp=1563395831000, origin=1, target='3', amount=160}} <guid isPermaLink="true">/feature/2019/09/13/state-processor-api.html</guid> </item> -<item> -<title>Apache Flink 1.8.2 Released</title> -<description><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.8 series.</p> - -<p>This release includes 23 fixes and minor improvements for Flink 1.8.1. The list below includes a detailed list of all fixes and improvements.</p> - -<p>We highly recommend all users to upgrade to Flink 1.8.2.</p> - -<p>Updated Maven dependencies:</p> - -<div class="highlight"><pre><code class="language-xml"><span class="nt">&lt;dependency&gt;</span> - <span class="nt">&lt;groupId&gt;</span>org.apache.flink<span class="nt">&lt;/groupId&gt;</span> - <span class="nt">&lt;artifactId&gt;</span>flink-java<span class="nt">&lt;/artifactId&gt;</span> - <span class="nt">&lt;version&gt;</span>1.8.2<span class="nt">&lt;/version&gt;</span> -<span class="nt">&lt;/dependency&gt;</span> -<span class="nt">&lt;dependency&gt;</span> - <span class="nt">&lt;groupId&gt;</span>org.apache.flink<span class="nt">&lt;/groupId&gt;</span> - <span class="nt">&lt;artifactId&gt;</span>flink-streaming-java_2.11<span class="nt">&lt;/artifactId&gt;</span> - <span class="nt">&lt;version&gt;</span>1.8.2<span class="nt">&lt;/version&gt;</span> -<span class="nt">&lt;/dependency&gt;</span> -<span class="nt">&lt;dependency&gt;</span> - <span class="nt">&lt;groupId&gt;</span>org.apache.flink<span class="nt">&lt;/groupId&gt;</span> - <span class="nt">&lt;artifactId&gt;</span>flink-clients_2.11<span class="nt">&lt;/artifactId&gt;</span> - <span class="nt">&lt;version&gt;</span>1.8.2<span class="nt">&lt;/version&gt;</span> -<span class="nt">&lt;/dependency&gt;</span></code></pre></div> - -<p>You can find the binaries on the updated <a href="/downloads.html">Downloads page</a>.</p> - -<p>List of resolved issues:</p> - -<h2> Bug -</h2> -<ul> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13941">FLINK-13941</a>] - Prevent data-loss by not cleaning up small part files from S3. -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-9526">FLINK-9526</a>] - BucketingSink end-to-end test failed on Travis -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-10368">FLINK-10368</a>] - &#39;Kerberized YARN on Docker test&#39; unstable -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-12319">FLINK-12319</a>] - StackOverFlowError in cep.nfa.sharedbuffer.SharedBuffer -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-12736">FLINK-12736</a>] - ResourceManager may release TM with allocated slots -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-12889">FLINK-12889</a>] - Job keeps in FAILING state -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13059">FLINK-13059</a>] - Cassandra Connector leaks Semaphore on Exception; hangs on close -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13159">FLINK-13159</a>] - java.lang.ClassNotFoundException when restore job -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13367">FLINK-13367</a>] - Make ClosureCleaner detect writeReplace serialization override -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13369">FLINK-13369</a>] - Recursive closure cleaner ends up with stackOverflow in case of circular dependency -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13394">FLINK-13394</a>] - Use fallback unsafe secure MapR in nightly.sh -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13484">FLINK-13484</a>] - ConnectedComponents end-to-end test instable with NoResourceAvailableException -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13499">FLINK-13499</a>] - Remove dependency on MapR artifact repository -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13508">FLINK-13508</a>] - CommonTestUtils#waitUntilCondition() may attempt to sleep with negative time -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13586">FLINK-13586</a>] - Method ClosureCleaner.clean broke backward compatibility between 1.8.0 and 1.8.1 -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13761">FLINK-13761</a>] - `SplitStream` should be deprecated because `SplitJavaStream` is deprecated -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13789">FLINK-13789</a>] - Transactional Id Generation fails due to user code impacting formatting string -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13806">FLINK-13806</a>] - Metric Fetcher floods the JM log with errors when TM is lost -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13807">FLINK-13807</a>] - Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8 -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13897">FLINK-13897</a>] - OSS FS NOTICE file is placed in wrong directory -</li> -</ul> - -<h2> Improvement -</h2> -<ul> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-12578">FLINK-12578</a>] - Use secure URLs for Maven repositories -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-12741">FLINK-12741</a>] - Update docs about Kafka producer fault tolerance guarantees -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-12749">FLINK-12749</a>] - Add Flink Operations Playground documentation -</li> -</ul> -</description> -<pubDate>Wed, 11 Sep 2019 14:00:00 +0200</pubDate> -<link>https://flink.apache.org/news/2019/09/11/release-1.8.2.html</link> -<guid isPermaLink="true">/news/2019/09/11/release-1.8.2.html</guid> -</item> - </channel> </rss> diff --git a/content/blog/index.html b/content/blog/index.html index 0bd6f4e0f..6adbe661a 100644 --- a/content/blog/index.html +++ b/content/blog/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></h2> + + <p>17 Jun 2022 + Lijie Wang & Zhu Zhu </p> + + <p>To automatically decide parallelism of Flink batch jobs, we introduced adaptive batch scheduler in Flink 1.15. In this post, we'll take a close look at the design & implementation details.</p> + + <p><a href="/2022/06/17/adaptive-batch-scheduler.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></h2> @@ -361,21 +374,6 @@ exciting changes.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2022/04/03/release-kubernetes-operator-0.1.0.html">Apache Flink Kubernetes Operator 0.1.0 Release Announcement</a></h2> - - <p>03 Apr 2022 - Gyula Fora (<a href="https://twitter.com/GyulaFora">@GyulaFora</a>)</p> - - <p><p>The Apache Flink Community is pleased to announce the preview release of the Apache Flink Kubernetes Operator (0.1.0)</p> - -</p> - - <p><a href="/news/2022/04/03/release-kubernetes-operator-0.1.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -408,6 +406,16 @@ exciting changes.</p> <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page10/index.html b/content/blog/page10/index.html index cb22271da..d2d9c5033 100644 --- a/content/blog/page10/index.html +++ b/content/blog/page10/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2020/02/11/release-1.10.0.html">Apache Flink 1.10.0 Release Announcement</a></h2> + + <p>11 Feb 2020 + Marta Paes (<a href="https://twitter.com/morsapaes">@morsapaes</a>)</p> + + <p><p>The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0! As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink).</p> + +</p> + + <p><a href="/news/2020/02/11/release-1.10.0.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html">A Guide for Unit Testing in Apache Flink</a></h2> @@ -355,21 +370,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2019/09/11/release-1.8.2.html">Apache Flink 1.8.2 Released</a></h2> - - <p>11 Sep 2019 - Jark Wu (<a href="https://twitter.com/JarkWu">@JarkWu</a>)</p> - - <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.8 series.</p> - -</p> - - <p><a href="/news/2019/09/11/release-1.8.2.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -402,6 +402,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page11/index.html b/content/blog/page11/index.html index fc94089b3..5fb45039c 100644 --- a/content/blog/page11/index.html +++ b/content/blog/page11/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2019/09/11/release-1.8.2.html">Apache Flink 1.8.2 Released</a></h2> + + <p>11 Sep 2019 + Jark Wu (<a href="https://twitter.com/JarkWu">@JarkWu</a>)</p> + + <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.8 series.</p> + +</p> + + <p><a href="/news/2019/09/11/release-1.8.2.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2019/09/10/community-update.html">Flink Community Update - September'19</a></h2> @@ -354,21 +369,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2019/04/17/sod.html">Apache Flink's Application to Season of Docs</a></h2> - - <p>17 Apr 2019 - Konstantin Knauf (<a href="https://twitter.com/snntrable">@snntrable</a>)</p> - - <p><p>The Apache Flink community is happy to announce its application to the first edition of <a href="https://developers.google.com/season-of-docs/">Season of Docs</a> by Google. The program is bringing together Open Source projects and technical writers to raise awareness for and improve documentation of Open Source projects. While the community is continuously looking for new contributors to collaborate on our documentation, we would like to take this chance to work with one or [...] - -</p> - - <p><a href="/news/2019/04/17/sod.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -401,6 +401,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page12/index.html b/content/blog/page12/index.html index bb557c1bb..1e9680e68 100644 --- a/content/blog/page12/index.html +++ b/content/blog/page12/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2019/04/17/sod.html">Apache Flink's Application to Season of Docs</a></h2> + + <p>17 Apr 2019 + Konstantin Knauf (<a href="https://twitter.com/snntrable">@snntrable</a>)</p> + + <p><p>The Apache Flink community is happy to announce its application to the first edition of <a href="https://developers.google.com/season-of-docs/">Season of Docs</a> by Google. The program is bringing together Open Source projects and technical writers to raise awareness for and improve documentation of Open Source projects. While the community is continuously looking for new contributors to collaborate on our documentation, we would like to take this chance to work with one or [...] + +</p> + + <p><a href="/news/2019/04/17/sod.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2019/04/09/release-1.8.0.html">Apache Flink 1.8.0 Release Announcement</a></h2> @@ -363,21 +378,6 @@ for more details.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2018/12/21/release-1.7.1.html">Apache Flink 1.7.1 Released</a></h2> - - <p>21 Dec 2018 - </p> - - <p><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.7 series.</p> - -</p> - - <p><a href="/news/2018/12/21/release-1.7.1.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -410,6 +410,16 @@ for more details.</p> <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page13/index.html b/content/blog/page13/index.html index 2e7148acd..c31edef1f 100644 --- a/content/blog/page13/index.html +++ b/content/blog/page13/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2018/12/21/release-1.7.1.html">Apache Flink 1.7.1 Released</a></h2> + + <p>21 Dec 2018 + </p> + + <p><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.7 series.</p> + +</p> + + <p><a href="/news/2018/12/21/release-1.7.1.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2018/11/30/release-1.7.0.html">Apache Flink 1.7.0 Release Announcement</a></h2> @@ -369,21 +384,6 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa <hr> - <article> - <h2 class="blog-title"><a href="/news/2018/05/25/release-1.5.0.html">Apache Flink 1.5.0 Release Announcement</a></h2> - - <p>25 May 2018 - Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p> - - <p><p>The Apache Flink community is thrilled to announce the 1.5.0 release. Over the past 5 months, the Flink community has been working hard to resolve more than 780 issues. Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12341764&projectId=12315522">complete changelog</a> for more detail.</p> - -</p> - - <p><a href="/news/2018/05/25/release-1.5.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -416,6 +416,16 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page14/index.html b/content/blog/page14/index.html index 557e6f05a..6763bff1a 100644 --- a/content/blog/page14/index.html +++ b/content/blog/page14/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2018/05/25/release-1.5.0.html">Apache Flink 1.5.0 Release Announcement</a></h2> + + <p>25 May 2018 + Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p> + + <p><p>The Apache Flink community is thrilled to announce the 1.5.0 release. Over the past 5 months, the Flink community has been working hard to resolve more than 780 issues. Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12341764&projectId=12315522">complete changelog</a> for more detail.</p> + +</p> + + <p><a href="/news/2018/05/25/release-1.5.0.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2018/03/15/release-1.3.3.html">Apache Flink 1.3.3 Released</a></h2> @@ -366,19 +381,6 @@ what’s coming in Flink 1.4.0 as well as a preview of what the Flink community <hr> - <article> - <h2 class="blog-title"><a href="/features/2017/07/04/flink-rescalable-state.html">A Deep Dive into Rescalable State in Apache Flink</a></h2> - - <p>04 Jul 2017 by Stefan Richter (<a href="https://twitter.com/">@StefanRRichter</a>) - </p> - - <p><p>A primer on stateful stream processing and an in-depth walkthrough of rescalable state in Apache Flink.</p></p> - - <p><a href="/features/2017/07/04/flink-rescalable-state.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -411,6 +413,16 @@ what’s coming in Flink 1.4.0 as well as a preview of what the Flink community <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page15/index.html b/content/blog/page15/index.html index 75cf7b3d8..d7daf35fe 100644 --- a/content/blog/page15/index.html +++ b/content/blog/page15/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/features/2017/07/04/flink-rescalable-state.html">A Deep Dive into Rescalable State in Apache Flink</a></h2> + + <p>04 Jul 2017 by Stefan Richter (<a href="https://twitter.com/">@StefanRRichter</a>) + </p> + + <p><p>A primer on stateful stream processing and an in-depth walkthrough of rescalable state in Apache Flink.</p></p> + + <p><a href="/features/2017/07/04/flink-rescalable-state.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2017/06/23/release-1.3.1.html">Apache Flink 1.3.1 Released</a></h2> @@ -362,19 +375,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2016/12/19/2016-year-in-review.html">Apache Flink in 2016: Year in Review</a></h2> - - <p>19 Dec 2016 by Mike Winters - </p> - - <p><p>As 2016 comes to a close, let's take a moment to look back on the Flink community's great work during the past year.</p></p> - - <p><a href="/news/2016/12/19/2016-year-in-review.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -407,6 +407,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page16/index.html b/content/blog/page16/index.html index d53a344f4..e08685424 100644 --- a/content/blog/page16/index.html +++ b/content/blog/page16/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2016/12/19/2016-year-in-review.html">Apache Flink in 2016: Year in Review</a></h2> + + <p>19 Dec 2016 by Mike Winters + </p> + + <p><p>As 2016 comes to a close, let's take a moment to look back on the Flink community's great work during the past year.</p></p> + + <p><a href="/news/2016/12/19/2016-year-in-review.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2016/10/12/release-1.1.3.html">Apache Flink 1.1.3 Released</a></h2> @@ -366,19 +379,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2016/04/06/cep-monitoring.html">Introducing Complex Event Processing (CEP) with Apache Flink</a></h2> - - <p>06 Apr 2016 by Till Rohrmann (<a href="https://twitter.com/">@stsffap</a>) - </p> - - <p>In this blog post, we introduce Flink's new <a href="{{site.DOCS_BASE_URL}}flink-docs-master/apis/streaming/libs/cep.html">CEP library</a> that allows you to do pattern matching on event streams. Through the example of monitoring a data center and generating alerts, we showcase the library's ease of use and its intuitive Pattern API.</p> - - <p><a href="/news/2016/04/06/cep-monitoring.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -411,6 +411,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page17/index.html b/content/blog/page17/index.html index 35cdf74b5..20aa3ef2a 100644 --- a/content/blog/page17/index.html +++ b/content/blog/page17/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2016/04/06/cep-monitoring.html">Introducing Complex Event Processing (CEP) with Apache Flink</a></h2> + + <p>06 Apr 2016 by Till Rohrmann (<a href="https://twitter.com/">@stsffap</a>) + </p> + + <p>In this blog post, we introduce Flink's new <a href="{{site.DOCS_BASE_URL}}flink-docs-master/apis/streaming/libs/cep.html">CEP library</a> that allows you to do pattern matching on event streams. Through the example of monitoring a data center and generating alerts, we showcase the library's ease of use and its intuitive Pattern API.</p> + + <p><a href="/news/2016/04/06/cep-monitoring.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2016/04/06/release-1.0.1.html">Flink 1.0.1 Released</a></h2> @@ -361,25 +374,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2015/09/03/flink-forward.html">Announcing Flink Forward 2015</a></h2> - - <p>03 Sep 2015 - </p> - - <p><p><a href="http://2015.flink-forward.org/">Flink Forward 2015</a> is the first -conference with Flink at its center that aims to bring together the -Apache Flink community in a single place. The organizers are starting -this conference in October 12 and 13 from Berlin, the place where -Apache Flink started.</p> - -</p> - - <p><a href="/news/2015/09/03/flink-forward.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -412,6 +406,16 @@ Apache Flink started.</p> <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page18/index.html b/content/blog/page18/index.html index dbfd3898c..ad3c9aeca 100644 --- a/content/blog/page18/index.html +++ b/content/blog/page18/index.html @@ -232,6 +232,25 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2015/09/03/flink-forward.html">Announcing Flink Forward 2015</a></h2> + + <p>03 Sep 2015 + </p> + + <p><p><a href="http://2015.flink-forward.org/">Flink Forward 2015</a> is the first +conference with Flink at its center that aims to bring together the +Apache Flink community in a single place. The organizers are starting +this conference in October 12 and 13 from Berlin, the place where +Apache Flink started.</p> + +</p> + + <p><a href="/news/2015/09/03/flink-forward.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2015/09/01/release-0.9.1.html">Apache Flink 0.9.1 available</a></h2> @@ -373,25 +392,6 @@ community last month.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2015/02/09/streaming-example.html">Introducing Flink Streaming</a></h2> - - <p>09 Feb 2015 - </p> - - <p><p>This post is the first of a series of blog posts on Flink Streaming, -the recent addition to Apache Flink that makes it possible to analyze -continuous data sources in addition to static files. Flink Streaming -uses the pipelined Flink engine to process data streams in real time -and offers a new API including definition of flexible windows.</p> - -</p> - - <p><a href="/news/2015/02/09/streaming-example.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -424,6 +424,16 @@ and offers a new API including definition of flexible windows.</p> <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page19/index.html b/content/blog/page19/index.html index d6bd6c362..cb8bfb208 100644 --- a/content/blog/page19/index.html +++ b/content/blog/page19/index.html @@ -232,6 +232,25 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2015/02/09/streaming-example.html">Introducing Flink Streaming</a></h2> + + <p>09 Feb 2015 + </p> + + <p><p>This post is the first of a series of blog posts on Flink Streaming, +the recent addition to Apache Flink that makes it possible to analyze +continuous data sources in addition to static files. Flink Streaming +uses the pipelined Flink engine to process data streams in real time +and offers a new API including definition of flexible windows.</p> + +</p> + + <p><a href="/news/2015/02/09/streaming-example.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2015/02/04/january-in-flink.html">January 2015 in the Flink community</a></h2> @@ -387,6 +406,16 @@ academic and open source project that Flink originates from.</p> <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page2/index.html b/content/blog/page2/index.html index 62438c462..ff04b43f6 100644 --- a/content/blog/page2/index.html +++ b/content/blog/page2/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2022/04/03/release-kubernetes-operator-0.1.0.html">Apache Flink Kubernetes Operator 0.1.0 Release Announcement</a></h2> + + <p>03 Apr 2022 + Gyula Fora (<a href="https://twitter.com/GyulaFora">@GyulaFora</a>)</p> + + <p><p>The Apache Flink Community is pleased to announce the preview release of the Apache Flink Kubernetes Operator (0.1.0)</p> + +</p> + + <p><a href="/news/2022/04/03/release-kubernetes-operator-0.1.0.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2022/03/11/release-1.14.4.html">Apache Flink 1.14.4 Release Announcement</a></h2> @@ -352,21 +367,6 @@ This new release brings various improvements to the StateFun runtime, a leaner w <hr> - <article> - <h2 class="blog-title"><a href="/news/2021/12/22/log4j-statefun-release.html">Apache Flink StateFun Log4j emergency release</a></h2> - - <p>22 Dec 2021 - Igal Shilman & Seth Wiesman </p> - - <p><p>The Apache Flink community has released an emergency bugfix version of Apache Flink Stateful Function 3.1.1.</p> - -</p> - - <p><a href="/news/2021/12/22/log4j-statefun-release.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -399,6 +399,16 @@ This new release brings various improvements to the StateFun runtime, a leaner w <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page3/index.html b/content/blog/page3/index.html index 8bd055ff8..300101ed3 100644 --- a/content/blog/page3/index.html +++ b/content/blog/page3/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2021/12/22/log4j-statefun-release.html">Apache Flink StateFun Log4j emergency release</a></h2> + + <p>22 Dec 2021 + Igal Shilman & Seth Wiesman </p> + + <p><p>The Apache Flink community has released an emergency bugfix version of Apache Flink Stateful Function 3.1.1.</p> + +</p> + + <p><a href="/news/2021/12/22/log4j-statefun-release.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2021/12/16/log4j-patch-releases.html">Apache Flink Log4j emergency releases</a></h2> @@ -357,22 +372,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2021/08/31/release-statefun-3.1.0.html">Stateful Functions 3.1.0 Release Announcement</a></h2> - - <p>31 Aug 2021 - Seth Wiesman (<a href="https://twitter.com/sjwiesman">@sjwiesman</a>), Igal Shilman (<a href="https://twitter.com/IgalShilman">@IgalShilman</a>), & Tzu-Li (Gordon) Tai (<a href="https://twitter.com/tzulitai">@tzulitai</a>)</p> - - <p><p>Stateful Functions is a cross-platform stack for building Stateful Serverless applications, making it radically simpler to develop scalable, consistent, and elastic distributed applications. -This new release brings various improvements to the StateFun runtime, a leaner way to specify StateFun module components, and a brand new GoLang SDK!</p> - -</p> - - <p><a href="/news/2021/08/31/release-statefun-3.1.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -405,6 +404,16 @@ This new release brings various improvements to the StateFun runtime, a leaner w <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page4/index.html b/content/blog/page4/index.html index 057baf3e9..d64a1a594 100644 --- a/content/blog/page4/index.html +++ b/content/blog/page4/index.html @@ -232,6 +232,22 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2021/08/31/release-statefun-3.1.0.html">Stateful Functions 3.1.0 Release Announcement</a></h2> + + <p>31 Aug 2021 + Seth Wiesman (<a href="https://twitter.com/sjwiesman">@sjwiesman</a>), Igal Shilman (<a href="https://twitter.com/IgalShilman">@IgalShilman</a>), & Tzu-Li (Gordon) Tai (<a href="https://twitter.com/tzulitai">@tzulitai</a>)</p> + + <p><p>Stateful Functions is a cross-platform stack for building Stateful Serverless applications, making it radically simpler to develop scalable, consistent, and elastic distributed applications. +This new release brings various improvements to the StateFun runtime, a leaner way to specify StateFun module components, and a brand new GoLang SDK!</p> + +</p> + + <p><a href="/news/2021/08/31/release-statefun-3.1.0.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2021/08/31/release-1.14.0-rc0.html">Help us stabilize Apache Flink 1.14.0 RC0</a></h2> @@ -368,21 +384,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2021/04/29/release-1.12.3.html">Apache Flink 1.12.3 Released</a></h2> - - <p>29 Apr 2021 - Arvid Heise </p> - - <p><p>The Apache Flink community released the next bugfix version of the Apache Flink 1.12 series.</p> - -</p> - - <p><a href="/news/2021/04/29/release-1.12.3.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -415,6 +416,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page5/index.html b/content/blog/page5/index.html index b14e7a191..9ec706eb2 100644 --- a/content/blog/page5/index.html +++ b/content/blog/page5/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2021/04/29/release-1.12.3.html">Apache Flink 1.12.3 Released</a></h2> + + <p>29 Apr 2021 + Arvid Heise </p> + + <p><p>The Apache Flink community released the next bugfix version of the Apache Flink 1.12 series.</p> + +</p> + + <p><a href="/news/2021/04/29/release-1.12.3.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2021/04/15/release-statefun-3.0.0.html">Stateful Functions 3.0.0: Remote Functions Front and Center</a></h2> @@ -359,21 +374,6 @@ to develop scalable, consistent, and elastic distributed applications.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2021/01/02/release-statefun-2.2.2.html">Stateful Functions 2.2.2 Release Announcement</a></h2> - - <p>02 Jan 2021 - Tzu-Li (Gordon) Tai (<a href="https://twitter.com/tzulitai">@tzulitai</a>)</p> - - <p><p>The Apache Flink community released the second bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.2.</p> - -</p> - - <p><a href="/news/2021/01/02/release-statefun-2.2.2.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -406,6 +406,16 @@ to develop scalable, consistent, and elastic distributed applications.</p> <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page6/index.html b/content/blog/page6/index.html index af128d7d6..688d49714 100644 --- a/content/blog/page6/index.html +++ b/content/blog/page6/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2021/01/02/release-statefun-2.2.2.html">Stateful Functions 2.2.2 Release Announcement</a></h2> + + <p>02 Jan 2021 + Tzu-Li (Gordon) Tai (<a href="https://twitter.com/tzulitai">@tzulitai</a>)</p> + + <p><p>The Apache Flink community released the second bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.2.</p> + +</p> + + <p><a href="/news/2021/01/02/release-statefun-2.2.2.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2020/12/18/release-1.11.3.html">Apache Flink 1.11.3 Released</a></h2> @@ -361,19 +376,6 @@ as well as increased observability for operational purposes.</p> <hr> - <article> - <h2 class="blog-title"><a href="/2020/09/01/flink-1.11-memory-management-improvements.html">Memory Management improvements for Flink’s JobManager in Apache Flink 1.11</a></h2> - - <p>01 Sep 2020 - Andrey Zagrebin </p> - - <p>In a previous blog post, we focused on the memory model of the TaskManagers and its improvements with the Apache Flink 1.10 release. This blog post addresses the same topic but for Flink's JobManager instead.</p> - - <p><a href="/2020/09/01/flink-1.11-memory-management-improvements.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -406,6 +408,16 @@ as well as increased observability for operational purposes.</p> <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page7/index.html b/content/blog/page7/index.html index 68ae00088..bb6d927f0 100644 --- a/content/blog/page7/index.html +++ b/content/blog/page7/index.html @@ -232,6 +232,19 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/2020/09/01/flink-1.11-memory-management-improvements.html">Memory Management improvements for Flink’s JobManager in Apache Flink 1.11</a></h2> + + <p>01 Sep 2020 + Andrey Zagrebin </p> + + <p>In a previous blog post, we focused on the memory model of the TaskManagers and its improvements with the Apache Flink 1.10 release. This blog post addresses the same topic but for Flink's JobManager instead.</p> + + <p><a href="/2020/09/01/flink-1.11-memory-management-improvements.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2020/08/25/release-1.10.2.html">Apache Flink 1.10.2 Released</a></h2> @@ -355,21 +368,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2020/07/21/release-1.11.1.html">Apache Flink 1.11.1 Released</a></h2> - - <p>21 Jul 2020 - Dian Fu (<a href="https://twitter.com/DianFu11">@DianFu11</a>)</p> - - <p><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.11 series.</p> - -</p> - - <p><a href="/news/2020/07/21/release-1.11.1.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -402,6 +400,16 @@ <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page8/index.html b/content/blog/page8/index.html index f21c38e23..2d96737aa 100644 --- a/content/blog/page8/index.html +++ b/content/blog/page8/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2020/07/21/release-1.11.1.html">Apache Flink 1.11.1 Released</a></h2> + + <p>21 Jul 2020 + Dian Fu (<a href="https://twitter.com/DianFu11">@DianFu11</a>)</p> + + <p><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.11 series.</p> + +</p> + + <p><a href="/news/2020/07/21/release-1.11.1.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2020/07/14/application-mode.html">Application Deployment in Flink: Current State and the new Application Mode</a></h2> @@ -367,21 +382,6 @@ and provide a tutorial for running Streaming ETL with Flink on Zeppelin.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2020/04/24/release-1.9.3.html">Apache Flink 1.9.3 Released</a></h2> - - <p>24 Apr 2020 - Dian Fu (<a href="https://twitter.com/DianFu11">@DianFu11</a>)</p> - - <p><p>The Apache Flink community released the third bugfix version of the Apache Flink 1.9 series.</p> - -</p> - - <p><a href="/news/2020/04/24/release-1.9.3.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -414,6 +414,16 @@ and provide a tutorial for running Streaming ETL with Flink on Zeppelin.</p> <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/blog/page9/index.html b/content/blog/page9/index.html index 14acc0d22..ecbbe2e3e 100644 --- a/content/blog/page9/index.html +++ b/content/blog/page9/index.html @@ -232,6 +232,21 @@ <div class="col-sm-8"> <!-- Blog posts --> + <article> + <h2 class="blog-title"><a href="/news/2020/04/24/release-1.9.3.html">Apache Flink 1.9.3 Released</a></h2> + + <p>24 Apr 2020 + Dian Fu (<a href="https://twitter.com/DianFu11">@DianFu11</a>)</p> + + <p><p>The Apache Flink community released the third bugfix version of the Apache Flink 1.9 series.</p> + +</p> + + <p><a href="/news/2020/04/24/release-1.9.3.html">Continue reading »</a></p> + </article> + + <hr> + <article> <h2 class="blog-title"><a href="/news/2020/04/21/memory-management-improvements-flink-1.10.html">Memory Management Improvements with Apache Flink 1.10</a></h2> @@ -354,21 +369,6 @@ This release marks a big milestone: Stateful Functions 2.0 is not only an API up <hr> - <article> - <h2 class="blog-title"><a href="/news/2020/02/11/release-1.10.0.html">Apache Flink 1.10.0 Release Announcement</a></h2> - - <p>11 Feb 2020 - Marta Paes (<a href="https://twitter.com/morsapaes">@morsapaes</a>)</p> - - <p><p>The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0! As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink).</p> - -</p> - - <p><a href="/news/2020/02/11/release-1.10.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -401,6 +401,16 @@ This release marks a big milestone: Stateful Functions 2.0 is not only an API up <ul id="markdown-toc"> + <li><a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></li> + + + + + + + + + <li><a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></li> diff --git a/content/img/blog/2022-06-17-adaptive-batch-scheduler/1-overall-structure.png b/content/img/blog/2022-06-17-adaptive-batch-scheduler/1-overall-structure.png new file mode 100644 index 000000000..f638525fd Binary files /dev/null and b/content/img/blog/2022-06-17-adaptive-batch-scheduler/1-overall-structure.png differ diff --git a/content/img/blog/2022-06-17-adaptive-batch-scheduler/2-dynamic-graph.png b/content/img/blog/2022-06-17-adaptive-batch-scheduler/2-dynamic-graph.png new file mode 100644 index 000000000..38f5c1fb2 Binary files /dev/null and b/content/img/blog/2022-06-17-adaptive-batch-scheduler/2-dynamic-graph.png differ diff --git a/content/img/blog/2022-06-17-adaptive-batch-scheduler/3-static-graph-subpartition-mapping.png b/content/img/blog/2022-06-17-adaptive-batch-scheduler/3-static-graph-subpartition-mapping.png new file mode 100644 index 000000000..53367fb86 Binary files /dev/null and b/content/img/blog/2022-06-17-adaptive-batch-scheduler/3-static-graph-subpartition-mapping.png differ diff --git a/content/img/blog/2022-06-17-adaptive-batch-scheduler/4-dynamic-graph-subpartition-mapping.png b/content/img/blog/2022-06-17-adaptive-batch-scheduler/4-dynamic-graph-subpartition-mapping.png new file mode 100644 index 000000000..264b9677b Binary files /dev/null and b/content/img/blog/2022-06-17-adaptive-batch-scheduler/4-dynamic-graph-subpartition-mapping.png differ diff --git a/content/img/blog/2022-06-17-adaptive-batch-scheduler/5-auto-rebalance.png b/content/img/blog/2022-06-17-adaptive-batch-scheduler/5-auto-rebalance.png new file mode 100644 index 000000000..b8d6cdebd Binary files /dev/null and b/content/img/blog/2022-06-17-adaptive-batch-scheduler/5-auto-rebalance.png differ diff --git a/content/img/blog/2022-06-17-adaptive-batch-scheduler/parallelism-formula.png b/content/img/blog/2022-06-17-adaptive-batch-scheduler/parallelism-formula.png new file mode 100644 index 000000000..122893224 Binary files /dev/null and b/content/img/blog/2022-06-17-adaptive-batch-scheduler/parallelism-formula.png differ diff --git a/content/img/blog/2022-06-17-adaptive-batch-scheduler/range-formula.png b/content/img/blog/2022-06-17-adaptive-batch-scheduler/range-formula.png new file mode 100644 index 000000000..004bcd6eb Binary files /dev/null and b/content/img/blog/2022-06-17-adaptive-batch-scheduler/range-formula.png differ diff --git a/content/index.html b/content/index.html index 4d14abd38..128c43b96 100644 --- a/content/index.html +++ b/content/index.html @@ -397,6 +397,9 @@ <dl> + <dt> <a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></dt> + <dd>To automatically decide parallelism of Flink batch jobs, we introduced adaptive batch scheduler in Flink 1.15. In this post, we'll take a close look at the design & implementation details.</dd> + <dt> <a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></dt> <dd><p>In the last two months since our <a href="https://flink.apache.org/news/2022/04/03/release-kubernetes-operator-0.1.0.html">initial preview release</a> the community has been hard at work to stabilize and improve the core Flink Kubernetes Operator logic. We are now proud to announce the first production ready release of the operator project.</p> @@ -411,12 +414,6 @@ We are now proud to announce the first production ready release of the operator <dt> <a href="/2022/05/18/latency-part1.html">Getting into Low-Latency Gears with Apache Flink - Part One</a></dt> <dd>This multi-part series of blog post presents a collection of low-latency techniques in Flink. Part one starts with types of latency in Flink and the way we measure the end-to-end latency, followed by a few techniques that optimize latency directly.</dd> - - <dt> <a href="/news/2022/05/11/release-table-store-0.1.0.html">Apache Flink Table Store 0.1.0 Release Announcement</a></dt> - <dd><p>The Apache Flink community is pleased to announce the preview release of the -<a href="https://github.com/apache/flink-table-store">Apache Flink Table Store</a> (0.1.0).</p> - -</dd> </dl> diff --git a/content/zh/index.html b/content/zh/index.html index e9f171325..1c692067b 100644 --- a/content/zh/index.html +++ b/content/zh/index.html @@ -394,6 +394,9 @@ <dl> + <dt> <a href="/2022/06/17/adaptive-batch-scheduler.html">Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs</a></dt> + <dd>To automatically decide parallelism of Flink batch jobs, we introduced adaptive batch scheduler in Flink 1.15. In this post, we'll take a close look at the design & implementation details.</dd> + <dt> <a href="/news/2022/06/05/release-kubernetes-operator-1.0.0.html">Apache Flink Kubernetes Operator 1.0.0 Release Announcement</a></dt> <dd><p>In the last two months since our <a href="https://flink.apache.org/news/2022/04/03/release-kubernetes-operator-0.1.0.html">initial preview release</a> the community has been hard at work to stabilize and improve the core Flink Kubernetes Operator logic. We are now proud to announce the first production ready release of the operator project.</p> @@ -408,12 +411,6 @@ We are now proud to announce the first production ready release of the operator <dt> <a href="/2022/05/18/latency-part1.html">Getting into Low-Latency Gears with Apache Flink - Part One</a></dt> <dd>This multi-part series of blog post presents a collection of low-latency techniques in Flink. Part one starts with types of latency in Flink and the way we measure the end-to-end latency, followed by a few techniques that optimize latency directly.</dd> - - <dt> <a href="/news/2022/05/11/release-table-store-0.1.0.html">Apache Flink Table Store 0.1.0 Release Announcement</a></dt> - <dd><p>The Apache Flink community is pleased to announce the preview release of the -<a href="https://github.com/apache/flink-table-store">Apache Flink Table Store</a> (0.1.0).</p> - -</dd> </dl>
