This is an automated email from the ASF dual-hosted git repository. rmetzger pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit 3aa2ae34c8ef82272711245484e885ec21728fcc Author: Robert Metzger <[email protected]> AuthorDate: Tue Dec 15 14:39:00 2020 +0100 Regenerate website This closes #398 --- content/2020/12/15/pipelined-region-sheduling.html | 557 +++++++++++++++++++++ content/blog/feed.xml | 391 ++++++++++++--- content/blog/index.html | 36 +- content/blog/page10/index.html | 40 +- content/blog/page11/index.html | 40 +- content/blog/page12/index.html | 43 +- content/blog/page13/index.html | 43 +- content/blog/page14/index.html | 25 + content/blog/page2/index.html | 36 +- content/blog/page3/index.html | 36 +- content/blog/page4/index.html | 38 +- content/blog/page5/index.html | 41 +- content/blog/page6/index.html | 39 +- content/blog/page7/index.html | 38 +- content/blog/page8/index.html | 40 +- content/blog/page9/index.html | 40 +- .../batch-job-example.png | Bin 0 -> 25335 bytes .../pipelined-regions.png | Bin 0 -> 25161 bytes .../sql-join-job-example.png | Bin 0 -> 10355 bytes .../streaming-job-example.png | Bin 0 -> 7974 bytes content/index.html | 12 +- content/news/2020/12/10/release-1.12.0.html | 2 +- content/zh/index.html | 12 +- 23 files changed, 1224 insertions(+), 285 deletions(-) diff --git a/content/2020/12/15/pipelined-region-sheduling.html b/content/2020/12/15/pipelined-region-sheduling.html new file mode 100644 index 0000000..706265c --- /dev/null +++ b/content/2020/12/15/pipelined-region-sheduling.html @@ -0,0 +1,557 @@ +<!DOCTYPE html> +<html lang="en"> + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <title>Apache Flink: Improvements in task scheduling for batch workloads in Apache Flink 1.12</title> + <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"> + <link rel="icon" href="/favicon.ico" type="image/x-icon"> + + <!-- 
Bootstrap --> + <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css"> + <link rel="stylesheet" href="/css/flink.css"> + <link rel="stylesheet" href="/css/syntax.css"> + + <!-- Blog RSS feed --> + <link href="/blog/feed.xml" rel="alternate" type="application/rss+xml" title="Apache Flink Blog: RSS feed" /> + + <!-- jQuery (necessary for Bootstrap's JavaScript plugins) --> + <!-- We need to load Jquery in the header for custom google analytics event tracking--> + <script src="/js/jquery.min.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!-- WARNING: Respond.js doesn't work if you view the page via file:// --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + <body> + + + <!-- Main content. --> + <div class="container"> + <div class="row"> + + + <div id="sidebar" class="col-sm-3"> + + +<!-- Top navbar. --> + <nav class="navbar navbar-default"> + <!-- The logo. --> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <div class="navbar-logo"> + <a href="/"> + <img alt="Apache Flink" src="/img/flink-header-logo.svg" width="147px" height="73px"> + </a> + </div> + </div><!-- /.navbar-header --> + + <!-- The navigation links. --> + <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1"> + <ul class="nav navbar-nav navbar-main"> + + <!-- First menu section explains visitors what Flink is --> + + <!-- What is Stream Processing? --> + <!-- + <li><a href="/streamprocessing1.html">What is Stream Processing?</a></li> + --> + + <!-- What is Flink? 
--> + <li><a href="/flink-architecture.html">What is Apache Flink?</a></li> + + + + <!-- What is Stateful Functions? --> + + <li><a href="/stateful-functions.html">What is Stateful Functions?</a></li> + + <!-- Use cases --> + <li><a href="/usecases.html">Use Cases</a></li> + + <!-- Powered by --> + <li><a href="/poweredby.html">Powered By</a></li> + + + + <!-- Second menu section aims to support Flink users --> + + <!-- Downloads --> + <li><a href="/downloads.html">Downloads</a></li> + + <!-- Getting Started --> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Getting Started<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.12/getting-started/index.html" target="_blank">With Flink <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/getting-started/project-setup.html" target="_blank">With Flink Stateful Functions <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="/training.html">Training Course</a></li> + </ul> + </li> + + <!-- Documentation --> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.12" target="_blank">Flink 1.12 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://ci.apache.org/projects/flink/flink-docs-master" target="_blank">Flink Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2" target="_blank">Flink Stateful Functions 2.2 (Latest stable release) <small><span class="glyphicon 
glyphicon-new-window"></span></small></a></li> + <li><a href="https://ci.apache.org/projects/flink/flink-statefun-docs-master" target="_blank">Flink Stateful Functions Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + </ul> + </li> + + <!-- getting help --> + <li><a href="/gettinghelp.html">Getting Help</a></li> + + <!-- Blog --> + <li><a href="/blog/"><b>Flink Blog</b></a></li> + + + <!-- Flink-packages --> + <li> + <a href="https://flink-packages.org" target="_blank">flink-packages.org <small><span class="glyphicon glyphicon-new-window"></span></small></a> + </li> + + + <!-- Third menu section aim to support community and contributors --> + + <!-- Community --> + <li><a href="/community.html">Community & Project Info</a></li> + + <!-- Roadmap --> + <li><a href="/roadmap.html">Roadmap</a></li> + + <!-- Contribute --> + <li><a href="/contributing/how-to-contribute.html">How to Contribute</a></li> + + + <!-- GitHub --> + <li> + <a href="https://github.com/apache/flink" target="_blank">Flink on GitHub <small><span class="glyphicon glyphicon-new-window"></span></small></a> + </li> + + + + <!-- Language Switcher --> + <li> + + + <a href="/zh/2020/12/15/pipelined-region-sheduling.html">中文版</a> + + + </li> + + </ul> + + <ul class="nav navbar-nav navbar-bottom"> + <hr /> + + <!-- Twitter --> + <li><a href="https://twitter.com/apacheflink" target="_blank">@ApacheFlink <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <!-- Visualizer --> + <li class=" hidden-md hidden-sm"><a href="/visualizer/" target="_blank">Plan Visualizer <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <hr /> + + <li><a href="https://apache.org" target="_blank">Apache Software Foundation <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <li> + <style> + .smalllinks:link { + display: inline-block !important; background: none; padding-top: 0px; 
padding-bottom: 0px; padding-right: 0px; min-width: 75px; + } + </style> + + <a class="smalllinks" href="https://www.apache.org/licenses/" target="_blank">License</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/security/" target="_blank">Security</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Donate</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + </li> + + </ul> + </div><!-- /.navbar-collapse --> + </nav> + + </div> + <div class="col-sm-9"> + <div class="row-fluid"> + <div class="col-sm-12"> + <div class="row"> + <h1>Improvements in task scheduling for batch workloads in Apache Flink 1.12</h1> + <p><i></i></p> + + <article> + <p>15 Dec 2020 Andrey Zagrebin </p> + +<p>The Flink community has been working for some time on making Flink a +<a href="https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html">truly unified batch and stream processing system</a>. +Achieving this involves touching a lot of different components of the Flink stack, from the user-facing APIs all the way +to low-level operator processes such as task scheduling. 
In this blogpost, we’ll take a closer look at how far +the community has come in improving scheduling for batch workloads, why this matters and what you can expect in the +Flink 1.12 release with the new <em>pipelined region scheduler</em>.</p> + +<div class="page-toc"> +<ul id="markdown-toc"> + <li><a href="#towards-unified-scheduling" id="markdown-toc-towards-unified-scheduling">Towards unified scheduling</a> <ul> + <li><a href="#scheduling-strategies-in-flink-before-112" id="markdown-toc-scheduling-strategies-in-flink-before-112">Scheduling Strategies in Flink before 1.12</a></li> + <li><a href="#a-practical-example" id="markdown-toc-a-practical-example">A practical example</a></li> + </ul> + </li> + <li><a href="#the-new-pipelined-region-scheduling" id="markdown-toc-the-new-pipelined-region-scheduling">The new pipelined region scheduling</a> <ul> + <li><a href="#pipelined-regions" id="markdown-toc-pipelined-regions">Pipelined regions</a></li> + <li><a href="#pipelined-region-scheduling-strategy" id="markdown-toc-pipelined-region-scheduling-strategy">Pipelined region scheduling strategy</a></li> + <li><a href="#failover-strategy" id="markdown-toc-failover-strategy">Failover strategy</a></li> + <li><a href="#benefits" id="markdown-toc-benefits">Benefits</a></li> + </ul> + </li> + <li><a href="#conclusion" id="markdown-toc-conclusion">Conclusion</a></li> + <li><a href="#appendix" id="markdown-toc-appendix">Appendix</a> <ul> + <li><a href="#what-is-scheduling" id="markdown-toc-what-is-scheduling">What is scheduling?</a> <ul> + <li><a href="#executiongraph" id="markdown-toc-executiongraph">ExecutionGraph</a></li> + <li><a href="#intermediate-results" id="markdown-toc-intermediate-results">Intermediate results</a></li> + <li><a href="#slots-and-resources" id="markdown-toc-slots-and-resources">Slots and resources</a></li> + <li><a href="#scheduling-strategy" id="markdown-toc-scheduling-strategy">Scheduling strategy</a></li> + </ul> + </li> + </ul> + </li> +</ul> + 
+</div> + +<h1 id="towards-unified-scheduling">Towards unified scheduling</h1> + +<p>Flink has an internal <a href="#what-is-scheduling">scheduler</a> to distribute work to all available cluster nodes, taking resource utilization, state locality and recovery into account. +How do you write a scheduler for a unified batch and streaming system? To answer this question, +let’s first have a look into the high-level differences between batch and streaming scheduling requirements.</p> + +<h4 id="streaming">Streaming</h4> + +<p><em>Streaming</em> jobs usually require that all <em><a href="#executiongraph">operator subtasks</a></em> are running in parallel at the same time, for an indefinite time. +Therefore, all the required resources to run these jobs have to be provided upfront, and all <em>operator subtasks</em> must be deployed at once.</p> + +<center> +<img src="/img/blog/2020-12-02-pipelined-region-sheduling/streaming-job-example.png" width="400px" alt="Streaming job example:high" /> +<br /> +<i><small>Flink: Streaming job example</small></i> +</center> +<p><br /></p> + +<p>Because there are no finite intermediate results, a <em>streaming job</em> always has to be restarted fully from a checkpoint or a savepoint in case of failure.</p> + +<div class="alert alert-info"> + <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span> +A <em>streaming job</em> may generally consist of multiple disjoint pipelines which can be restarted independently. +Hence, the full job restart is not required in this case but you can think of each disjoint pipeline as if it were a separate job.</p> +</div> + +<h4 id="batch">Batch</h4> + +<p>In contrast to <em>streaming</em> jobs, <em>batch</em> jobs usually consist of one or more stages that can have dependencies between them. +Each stage will only run for a finite amount of time and produce some finite output (i.e. 
at some point, the batch job will be <em>finished</em>). +Independent stages can run in parallel to improve execution time, but for cases where there are dependencies between stages, +a stage may have to wait for upstream results to be produced before it can run. +These are called <em><a href="#intermediate-results">blocking results</a></em>, and in that case the dependent stages cannot run in parallel.</p> + +<center> +<img src="/img/blog/2020-12-02-pipelined-region-sheduling/batch-job-example.png" width="600px" alt="Batch job example:high" /> +<br /> +<i><small>Flink: Batch job example</small></i> +</center> +<p><br /></p> + +<p>As an example, in the figure above <strong>Stage 0</strong> and <strong>Stage 1</strong> can run simultaneously, as there is no dependency between them. +<strong>Stage 3</strong>, on the other hand, can only be scheduled once both its inputs are available. There are a few implications from this:</p> + +<ul> + <li> + <p><strong>(a)</strong> You can use available resources more efficiently by only scheduling stages that have data to perform work;</p> + </li> + <li> + <p><strong>(b)</strong> You can also use this mechanism for failover: if a stage fails, it can be restarted individually, without recomputing the results of other stages.</p> + </li> +</ul> + +<h3 id="scheduling-strategies-in-flink-before-112">Scheduling Strategies in Flink before 1.12</h3> + +<p>Given these differences, a unified scheduler would have to be good at resource management for each individual stage, +be it finite (<em>batch</em>) or infinite (<em>streaming</em>), and also across multiple stages. +The existing <a href="#scheduling-strategy">scheduling strategies</a> in older Flink versions up to 1.11 have been largely designed to address these concerns separately.</p> + +<p><strong>“All at once (Eager)”</strong></p> + +<p>This strategy is the simplest: Flink just tries to allocate resources and deploy all <em>subtasks</em> at once. 
+Up to Flink 1.11, this is the scheduling strategy used for all <em>streaming</em> jobs. +For <em>batch</em> jobs, using “all at once” scheduling would lead to suboptimal resource utilization, +since it’s unlikely that such jobs would require all resources upfront, and any resources allocated to subtasks +that could not run at a given moment would be idle and therefore wasted.</p> + +<p><strong>“Lazy from sources”</strong></p> + +<p>To account for <em>blocking results</em> and make sure that no consumer is deployed before their respective producers are finished, +Flink provides a different scheduling strategy for <em>batch</em> workloads. +“Lazy from sources” scheduling deploys subtasks only once all their inputs are ready. +This strategy operates on each <em>subtask</em> individually; it does not identify all <em>subtasks</em> which can (or have to) run at the same time.</p> + +<h3 id="a-practical-example">A practical example</h3> + +<p>Let’s take a closer look at the specific case of <em>batch</em> jobs, using as motivation a simple SQL query:</p> + +<div class="highlight"><pre><code class="language-SQL"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">customers</span> <span class="p">(</span> + <span class="n">customerId</span> <span class="nb">int</span><span class="p">,</span> + <span class="n">name</span> <span class="nb">varchar</span><span class="p">(</span><span class="mi">255</span><span class="p">)</span> +<span class="p">);</span> + +<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="p">(</span> + <span class="n">orderId</span> <span class="nb">int</span><span class="p">,</span> + <span class="n">orderCustomerId</span> <span class="nb">int</span> +<span class="p">);</span> + +<span class="c1">--fill tables with data</span> + +<span class="k">SELECT</span> <span class="n">customerId</span><span class="p">,</span> <span class="n">name</span> +<span class="k">FROM</span> <span 
class="n">customers</span><span class="p">,</span> <span class="n">orders</span> +<span class="k">WHERE</span> <span class="n">customerId</span> <span class="o">=</span> <span class="n">orderCustomerId</span></code></pre></div> + +<p>Assume that two tables were created in some database: the <code>customers</code> table is relatively small and fits into local memory (or onto local disk). The <code>orders</code> table is bigger, as it contains all orders created by customers, and doesn’t fit in memory. To enrich the orders with the customer name, you have to join these two tables. There are basically two stages in this <em>batch</em> job:</p> + +<ol> + <li>Because it is the smaller table, load the complete <code>customers</code> table into a local map: <code>(customerId, name)</code>;</li> + <li>Process the <code>orders</code> table record by record, enriching it with the <code>name</code> value from the map.</li> +</ol> + +<h4 id="executing-the-job">Executing the job</h4> + +<p>The batch job described above will have three operators. For simplicity, each operator is represented with a parallelism of 1, +so the resulting <em><a href="#executiongraph">ExecutionGraph</a></em> will consist of three <em>subtasks</em>: A, B and C.</p> + +<ul> + <li><strong>A</strong>: load full <code>customers</code> table</li> + <li><strong>B</strong>: load <code>orders</code> table record by record in a <em>streaming</em> (pipelined) fashion</li> + <li><strong>C</strong>: join order table records with the loaded customer table</li> +</ul> + +<p>This translates into <strong>A</strong> and <strong>C</strong> being connected with a <em>blocking</em> data exchange, +because the <code>customers</code> table needs to be loaded locally (<strong>A</strong>) before we start processing the order table (<strong>B</strong>). 
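</p>

<p>The two stages listed above can be sketched in plain Python. This is a hypothetical illustration of the data flow only, not Flink code; the tuple layouts mirror the two table schemas:</p>

```python
# Stage 1 (subtask A): build a local map from the small `customers` table.
def build_customer_map(customers):
    return {customer_id: name for customer_id, name in customers}

# Stage 2 (subtasks B -> C): stream the `orders` table record by record,
# enriching each order with the customer name from the map.
def enrich_orders(orders, customer_map):
    for order_id, order_customer_id in orders:
        yield order_id, customer_map.get(order_customer_id)
```

<p>Note that <code>enrich_orders</code> cannot produce anything before <code>build_customer_map</code> has returned, which mirrors the blocking dependency between the two stages.</p>

<p>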
+<strong>B</strong> and <strong>C</strong> are connected with a <em><a href="#intermediate-results">pipelined</a></em> data exchange, +because the consumer (<strong>C</strong>) can run as soon as the first result records from <strong>B</strong> have been produced. +You can think of <strong>B->C</strong> as a <em>finite streaming</em> job. It’s then possible to identify two separate stages within the <em>ExecutionGraph</em>: <strong>A</strong> and <strong>B->C</strong>.</p> + +<center> +<img src="/img/blog/2020-12-02-pipelined-region-sheduling/sql-join-job-example.png" width="450px" alt="SQL Join job example:high" /> +<br /> +<i><small>Flink: SQL Join job example</small></i> +</center> +<p><br /></p> + +<h4 id="scheduling-limitations">Scheduling Limitations</h4> + +<p>Imagine that the cluster this job will run in has only one <em><a href="#slots-and-resources">slot</a></em> and can therefore only execute one <em>subtask</em>. +If Flink deploys <strong>B</strong> <em><a href="#slots-and-resources">chained</a></em> with <strong>C</strong> first into this one <em>slot</em> (as <strong>B</strong> and <strong>C</strong> are connected with a <em><a href="#intermediate-results">pipelined</a></em> edge), +<strong>C</strong> cannot run because A has not produced its <em>blocking result</em> yet. Flink will try to deploy <strong>A</strong> and the job will fail, because there are no more <em>slots</em>. +If there were two <em>slots</em> available, Flink would be able to deploy <strong>A</strong> and the job would eventually succeed. +Nonetheless, the resources of the first <em>slot</em> occupied by <strong>B</strong> and <strong>C</strong> would be wasted while <strong>A</strong> was running.</p> + +<p>Both scheduling strategies available as of Flink 1.11 (<em>“all at once”</em> and <em>“lazy from source”</em>) would be affected by these limitations. +What would be the optimal approach? 
In this case, if <strong>A</strong> was deployed first, then <strong>B</strong> and <strong>C</strong> could also complete afterwards using the same <em>slot</em>. +The job would succeed even if only a single <em>slot</em> was available.</p> + +<div class="alert alert-info"> + <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span> +If we could load the <code>orders</code> table into local memory (making B -> C blocking), then the previous strategy would also succeed with one slot. +Nonetheless, we would have to allocate a lot of resources to accommodate the table locally, which may not be required.</p> +</div> + +<p>Last but not least, let’s consider what happens in the case of <em>failover</em>: if the processing of the <code>orders</code> table fails (<strong>B->C</strong>), +then we do not have to reload the customer table (<strong>A</strong>); we only need to restart <strong>B->C</strong>. This did not work prior to Flink 1.9.</p> + +<p>To satisfy the scheduling requirements for <em>batch</em> and <em>streaming</em> and overcome these limitations, +the Flink community has worked on a new unified scheduling and failover strategy that is suitable for both types of workloads: <em>pipelined region scheduling</em>.</p> + +<h1 id="the-new-pipelined-region-scheduling">The new pipelined region scheduling</h1> + +<p>As you read in the previous introductory sections, an optimal <a href="#what-is-scheduling">scheduler</a> should efficiently allocate resources +for the sub-stages of the pipeline, finite or infinite, running in a <em>streaming</em> fashion. Those stages are called <em>pipelined regions</em> in Flink. 
+In this section, we will take a deeper dive into <em>pipelined region scheduling and failover</em>.</p> + +<h2 id="pipelined-regions">Pipelined regions</h2> + +<p>The new scheduling strategy analyses the <em><a href="#executiongraph">ExecutionGraph</a></em> before starting the <em>subtask</em> deployment in order to identify its <em>pipelined regions</em>. +A <em>pipelined region</em> is a subset of <em>subtasks</em> in the <em>ExecutionGraph</em> connected by <em><a href="#intermediate-results">pipelined</a></em> data exchanges. +<em>Subtasks</em> from different <em>pipelined regions</em> are connected only by <em><a href="#intermediate-results">blocking</a></em> data exchanges. +The depicted example of an <em>ExecutionGraph</em> has four <em>pipelined regions</em>, formed by the <em>subtasks</em> A to H:</p> + +<center> +<img src="/img/blog/2020-12-02-pipelined-region-sheduling/pipelined-regions.png" width="250px" alt="Pipelined regions:high" /> +<br /> +<i><small>Flink: Pipelined regions</small></i> +</center> +<p><br /></p> + +<p>Why do we need the <em>pipelined region</em>? Within a <em>pipelined region</em>, all consumers have to constantly consume the produced results +so that the producers are not blocked and backpressure is avoided. Hence, all <em>subtasks</em> of a <em>pipelined region</em> have to be scheduled, restarted in case of failure and run at the same time.</p> + +<div class="alert alert-info"> + <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note (out of scope)</span> +In certain cases the <em>subtasks</em> can be connected by <em><a href="#intermediate-results">blocking</a></em> data exchanges within one region. 
+Check <a href="https://issues.apache.org/jira/browse/FLINK-17330">FLINK-17330</a> for details.</p> +</div> + +<h2 id="pipelined-region-scheduling-strategy">Pipelined region scheduling strategy</h2> + +<p>Once the <em>pipelined regions</em> are identified, each region is scheduled only when all the regions it depends on (i.e. its inputs), +have produced their <em><a href="#intermediate-results">blocking</a></em> results (for the depicted graph: R2 and R3 after R1; R4 after R2 and R3). +If the <em>JobManager</em> has enough resources available, it will try to run as many schedulable <em>pipelined regions</em> in parallel as possible. +The <em>subtasks</em> of a <em>pipelined region</em> are either successfully deployed all at once or none at all. +The job fails if there are not enough resources to run any of its <em>pipelined regions</em>. +You can read more about this effort in the original <a href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling#FLIP119PipelinedRegionScheduling-BulkSlotAllocation">FLIP-119 proposal</a>.</p> + +<h2 id="failover-strategy">Failover strategy</h2> + +<p>As mentioned before, only certain regions are running at the same time. Others have already produced their <em><a href="#intermediate-results">blocking</a></em> results. +The results are stored locally in <em>TaskManagers</em> where the corresponding <em>subtasks</em> run. +If a currently running region fails, it gets restarted to consume its inputs again. +If some input results got lost (e.g. the hosting <em>TaskManager</em> failed as well), Flink will rerun their producing regions. 
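</p>

<p>As a rough illustration, this failover logic can be sketched as follows. This is a simplified model, not Flink’s actual implementation; region names follow the figure above:</p>

```python
# Hypothetical sketch of pipelined-region failover: restart the failed
# region, and transitively rerun any producer region whose cached
# blocking result is no longer available.
def regions_to_restart(failed_region, producers_of, result_available):
    to_restart = set()
    stack = [failed_region]
    while stack:
        region = stack.pop()
        if region in to_restart:
            continue
        to_restart.add(region)
        for producer in producers_of.get(region, []):
            # A lost input result (e.g. because its TaskManager died)
            # forces the producing region to run again as well.
            if not result_available.get(producer, False):
                stack.append(producer)
    return to_restart
```

<p>For the depicted graph, if R4 fails while R3’s result was lost but R1’s and R2’s survived, only R4 and R3 need to be restarted.</p>

<p>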
+You can read more about this effort in the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/task_failure_recovery.html#failover-strategies">user documentation</a> +and the original <a href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures">FLIP-1 proposal</a>.</p> + +<h2 id="benefits">Benefits</h2> + +<p><strong>Run any batch job, possibly with limited resources</strong></p> + +<p>The <em>subtasks</em> of a <em>pipelined region</em> are deployed only when all necessary conditions for their success are fulfilled: +inputs are ready and all needed resources are allocated. Hence, the <em>batch</em> job never gets stuck without notifying the user. +The job either eventually finishes or fails after a timeout.</p> + +<p>Depending on how the <em>subtasks</em> are allowed to <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/#task-chaining-and-resource-groups">share slots</a>, +it is often the case that the whole <em>pipelined region</em> can run within one <em>slot</em>, +making it generally possible to run the whole <em>batch</em> job with only a single <em>slot</em>. +At the same time, if the cluster provides more resources, Flink will run as many regions as possible in parallel to improve the overall job performance.</p> + +<p><strong>No resource waste</strong></p> + +<p>As mentioned in the definition of <em>pipelined region</em>, all its <em>subtasks</em> have to run simultaneously. +The <em>subtasks</em> of other regions either cannot or do not have to run at the same time. +This means that a <em>pipelined region</em> is the minimum subgraph of a <em>batch</em> job’s <em>ExecutionGraph</em> that has to be scheduled at once. 
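</p>

<p>A back-of-the-envelope sketch of this lower bound, under the assumed default slot-sharing model in which one slot can host one parallel subtask of each operator in a region (so the widest operator determines a region’s demand):</p>

```python
# Each region is modelled as a dict of operator -> parallelism.
def region_slot_demand(operator_parallelism):
    # With default slot sharing, one slot hosts one subtask of each
    # operator, so the widest operator determines the slot demand.
    return max(operator_parallelism.values())

def min_slots_for_job(regions):
    # In the worst case regions run one after another, so the job needs
    # at least as many slots as its most demanding region.
    return max(region_slot_demand(region) for region in regions)
```

<p>For the SQL join example with parallelism 1 (regions <code>{A}</code> and <code>{B, C}</code>), this lower bound is a single slot.</p>

<p>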
+There is no way to run the job with fewer resources than needed to run the largest region, and so there can be no resource waste.</p> + +<div class="alert alert-info"> + <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note (out of scope)</span> +The amount of resources required to run a region can be further optimized separately. +It depends on <em>co-location constraints</em> and <em>slot sharing groups</em> of the region’s <em>subtasks</em>. +Check <a href="https://issues.apache.org/jira/browse/FLINK-18689">FLINK-18689</a> for details.</p> +</div> + +<h1 id="conclusion">Conclusion</h1> + +<p>Scheduling is a fundamental component of the Flink stack. In this blogpost, we recapped how scheduling affects resource utilization and failover as a part of the user experience. +We described the limitations of Flink’s old scheduler and introduced a new approach to tackle them: the <em>pipelined region scheduler</em>, which ships with Flink 1.12. +The blogpost also explained how <em>pipelined region failover</em> (introduced in Flink 1.11) works.</p> + +<p>Stay tuned for more improvements to scheduling in upcoming releases. If you have any suggestions or questions for the community, +we encourage you to sign up to the Apache Flink <a href="https://flink.apache.org/community.html#mailing-lists">mailing lists</a> and become part of the discussion.</p> + +<h1 id="appendix">Appendix</h1> + +<h2 id="what-is-scheduling">What is scheduling?</h2> + +<h3 id="executiongraph">ExecutionGraph</h3> + +<p>A Flink <em>job</em> is a pipeline of connected <em>operators</em> to process data. +Together, the operators form a <em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/job_scheduling.html#jobmanager-data-structures">JobGraph</a></em>. +Each <em>operator</em> has a certain number of <em>subtasks</em> executed in parallel. 
The <em>subtask</em> is the actual execution unit in Flink. +Each subtask can consume user records from other subtasks (inputs), process them and produce records for further consumption by other <em>subtasks</em> (outputs) downstream. +There are <em>source subtasks</em> without inputs and <em>sink subtasks</em> without outputs. Hence, the <em>subtasks</em> form the nodes of the +<em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/job_scheduling.html#jobmanager-data-structures">ExecutionGraph</a></em>.</p> + +<h3 id="intermediate-results">Intermediate results</h3> + +<p>There are also two major data-exchange types by which <em>operators</em> and their <em>subtasks</em> produce and consume results: <em>pipelined</em> and <em>blocking</em>. +They are basically types of edges in the <em>ExecutionGraph</em>.</p> + +<p>A <em>pipelined</em> result can be consumed record by record. This means that the consumer can already run once the first result records have been produced. +A <em>pipelined</em> result can be a never-ending output of records, e.g. in case of a <em>streaming job</em>.</p> + +<p>A <em>blocking</em> result can be consumed only when its <em>production</em> is done. Hence, the <em>blocking</em> result is always finite +and the consumer of the <em>blocking</em> result can run only when the producer has finished its execution.</p> + +<h3 id="slots-and-resources">Slots and resources</h3> + +<p>A <em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html#anatomy-of-a-flink-cluster">TaskManager</a></em> +instance has a certain number of virtual <em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html#task-slots-and-resources">slots</a></em>. 
+Each <em>slot</em> represents a certain part of the <em>TaskManager’s physical resources</em> to run the operator <em>subtasks</em>, and each <em>subtask</em> is deployed into a <em>slot</em> of the <em>TaskManager</em>. +A <em>slot</em> can run multiple <em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/job_scheduling.html#scheduling">subtasks</a></em> from different <em>operators</em> at the same time, usually <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html#tasks-and-operator-chains">chained</a> together.</p> + +<h3 id="scheduling-strategy">Scheduling strategy</h3> + +<p><em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/job_scheduling.html#scheduling">Scheduling</a></em> +in Flink is a process of searching for and allocating appropriate resources (<em>slots</em>) from the <em>TaskManagers</em> to run the <em>subtasks</em> and produce results. +The <em>scheduling strategy</em> reacts to scheduling events (like a job starting, or a <em>subtask</em> failing or finishing) to decide which <em>subtask</em> to deploy next.</p> + +<p>For instance, to avoid wasting resources, it does not make sense to schedule <em>subtasks</em> whose inputs are not yet ready to be consumed. 
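</p>

<p>That idea can be expressed as a small event handler. This is an illustrative sketch of a scheduling strategy’s reaction to one kind of event, not Flink’s internal API:</p>

```python
# When a producer finishes its blocking result, deploy exactly those
# consumers whose inputs have now all become available.
def on_producer_finished(producer, consumers_of, inputs_of, finished, deploy):
    finished.add(producer)
    for consumer in consumers_of.get(producer, []):
        if all(inp in finished for inp in inputs_of[consumer]):
            deploy(consumer)
```

<p>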
+Another example is to schedule <em>subtasks</em> which are connected with <em>pipelined</em> edges together, to avoid deadlocks caused by backpressure.</p> + + </article> + </div> + + <div class="row"> + <div id="disqus_thread"></div> + <script type="text/javascript"> + /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ + var disqus_shortname = 'stratosphere-eu'; // required: replace example with your forum shortname + + /* * * DON'T EDIT BELOW THIS LINE * * */ + (function() { + var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; + dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; + (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); + })(); + </script> + </div> + </div> +</div> + </div> + </div> + + <hr /> + + <div class="row"> + <div class="footer text-center col-sm-12"> + <p>Copyright © 2014-2019 <a href="http://apache.org">The Apache Software Foundation</a>. 
All Rights Reserved.</p> + <p>Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.</p> + <p><a href="/privacy-policy.html">Privacy Policy</a> · <a href="/blog/feed.xml">RSS feed</a></p> + </div> + </div> + </div><!-- /.container --> + + <!-- Include all compiled plugins (below), or include individual files as needed --> + <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script> + <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.matchHeight/0.7.0/jquery.matchHeight-min.js"></script> + <script src="/js/codetabs.js"></script> + <script src="/js/stickysidebar.js"></script> + + <!-- Google Analytics --> + <script> + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + + ga('create', 'UA-52545728-1', 'auto'); + ga('send', 'pageview'); + </script> + </body> +</html> diff --git a/content/blog/feed.xml b/content/blog/feed.xml index a3fd369..60b20cf 100644 --- a/content/blog/feed.xml +++ b/content/blog/feed.xml @@ -7,6 +7,321 @@ <atom:link href="https://flink.apache.org/blog/feed.xml" rel="self" type="application/rss+xml" /> <item> +<title>Improvements in task scheduling for batch workloads in Apache Flink 1.12</title> +<description><p>The Flink community has been working for some time on making Flink a +<a href="https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html">truly unified batch and stream processing system</a>. +Achieving this involves touching a lot of different components of the Flink stack, from the user-facing APIs all the way +to low-level operator processes such as task scheduling. 
In this blogpost, we’ll take a closer look at how far +the community has come in improving scheduling for batch workloads, why this matters and what you can expect in the +Flink 1.12 release with the new <em>pipelined region scheduler</em>.</p> + +<div class="page-toc"> +<ul id="markdown-toc"> + <li><a href="#towards-unified-scheduling" id="markdown-toc-towards-unified-scheduling">Towards unified scheduling</a> <ul> + <li><a href="#scheduling-strategies-in-flink-before-112" id="markdown-toc-scheduling-strategies-in-flink-before-112">Scheduling Strategies in Flink before 1.12</a></li> + <li><a href="#a-practical-example" id="markdown-toc-a-practical-example">A practical example</a></li> + </ul> + </li> + <li><a href="#the-new-pipelined-region-scheduling" id="markdown-toc-the-new-pipelined-region-scheduling">The new pipelined region scheduling</a> <ul> + <li><a href="#pipelined-regions" id="markdown-toc-pipelined-regions">Pipelined regions</a></li> + <li><a href="#pipelined-region-scheduling-strategy" id="markdown-toc-pipelined-region-scheduling-strategy">Pipelined region scheduling strategy</a></li> + <li><a href="#failover-strategy" id="markdown-toc-failover-strategy">Failover strategy</a></li> + <li><a href="#benefits" id="markdown-toc-benefits">Benefits</a></li> + </ul> + </li> + <li><a href="#conclusion" id="markdown-toc-conclusion">Conclusion</a></li> + <li><a href="#appendix" id="markdown-toc-appendix">Appendix</a> <ul> + <li><a href="#what-is-scheduling" id="markdown-toc-what-is-scheduling">What is scheduling?</a> <ul> + <li><a href="#executiongraph" id="markdown-toc-executiongraph">ExecutionGraph</a></li> + <li><a href="#intermediate-results" id="markdown-toc-intermediate-results">Intermediate results</a></li> + <li><a href="#slots-and-resources" id="markdown-toc-slots-and-resources">Slots and resources</a></li> + <li><a href="#scheduling-strategy" id="markdown-toc-scheduling-strategy">Scheduling strategy</a></li> + </ul> + </li> + </ul> + </li> +</ul> + 
+</div> + +<h1 id="towards-unified-scheduling">Towards unified scheduling</h1> + +<p>Flink has an internal <a href="#what-is-scheduling">scheduler</a> to distribute work to all available cluster nodes, taking resource utilization, state locality and recovery into account. +How do you write a scheduler for a unified batch and streaming system? To answer this question, +let’s first have a look into the high-level differences between batch and streaming scheduling requirements.</p> + +<h4 id="streaming">Streaming</h4> + +<p><em>Streaming</em> jobs usually require that all <em><a href="#executiongraph">operator subtasks</a></em> are running in parallel at the same time, for an indefinite time. +Therefore, all the required resources to run these jobs have to be provided upfront, and all <em>operator subtasks</em> must be deployed at once.</p> + +<center> +<img src="/img/blog/2020-12-02-pipelined-region-sheduling/streaming-job-example.png" width="400px" alt="Streaming job example:high" /> +<br /> +<i><small>Flink: Streaming job example</small></i> +</center> +<p><br /></p> + +<p>Because there are no finite intermediate results, a <em>streaming job</em> always has to be restarted fully from a checkpoint or a savepoint in case of failure.</p> + +<div class="alert alert-info"> + <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span> +A <em>streaming job</em> may generally consist of multiple disjoint pipelines which can be restarted independently. +Hence, the full job restart is not required in this case but you can think of each disjoint pipeline as if it were a separate job.</p> +</div> + +<h4 id="batch">Batch</h4> + +<p>In contrast to <em>streaming</em> jobs, <em>batch</em> jobs usually consist of one or more stages that can have dependencies between them. +Each stage will only run for a finite amount of time and produce some finite output (i.e. 
at some point, the batch job will be <em>finished</em>). +Independent stages can run in parallel to improve execution time, but for cases where there are dependencies between stages, +a stage may have to wait for upstream results to be produced before it can run. +These are called <em><a href="#intermediate-results">blocking results</a></em>, and in this case stages cannot run in parallel.</p> + +<center> +<img src="/img/blog/2020-12-02-pipelined-region-sheduling/batch-job-example.png" width="600px" alt="Batch job example:high" /> +<br /> +<i><small>Flink: Batch job example</small></i> +</center> +<p><br /></p> + +<p>As an example, in the figure above <strong>Stage 0</strong> and <strong>Stage 1</strong> can run simultaneously, as there is no dependency between them. +<strong>Stage 3</strong>, on the other hand, can only be scheduled once both its inputs are available. There are a few implications from this:</p> + +<ul> + <li> + <p><strong>(a)</strong> You can use available resources more efficiently by only scheduling stages that have data to perform work;</p> + </li> + <li> + <p><strong>(b)</strong> You can use this mechanism also for failover: if a stage fails, it can be restarted individually, without recomputing the results of other stages.</p> + </li> +</ul> + +<h3 id="scheduling-strategies-in-flink-before-112">Scheduling Strategies in Flink before 1.12</h3> + +<p>Given these differences, a unified scheduler would have to be good at resource management for each individual stage, +be it finite (<em>batch</em>) or infinite (<em>streaming</em>), and also across multiple stages. +The existing <a href="#scheduling-strategy">scheduling strategies</a> in older Flink versions up to 1.11 have been largely designed to address these concerns separately.</p> + +<p><strong>“All at once (Eager)”</strong></p> + +<p>This strategy is the simplest: Flink just tries to allocate resources and deploy all <em>subtasks</em> at once. 
+Up to Flink 1.11, this is the scheduling strategy used for all <em>streaming</em> jobs. +For <em>batch</em> jobs, using “all at once” scheduling would lead to suboptimal resource utilization, +since it’s unlikely that such jobs would require all resources upfront, and any resources allocated to subtasks +that could not run at a given moment would be idle and therefore wasted.</p> + +<p><strong>“Lazy from sources”</strong></p> + +<p>To account for <em>blocking results</em> and make sure that no consumer is deployed before their respective producers are finished, +Flink provides a different scheduling strategy for <em>batch</em> workloads. +“Lazy from sources” scheduling deploys subtasks only once all their inputs are ready. +This strategy operates on each <em>subtask</em> individually; it does not identify all <em>subtasks</em> which can (or have to) run at the same time.</p> + +<h3 id="a-practical-example">A practical example</h3> + +<p>Let’s take a closer look at the specific case of <em>batch</em> jobs, using as motivation a simple SQL query:</p> + +<div class="highlight"><pre><code class="language-SQL"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">customers</span> <span class="p">(</span> + <span class="n">customerId</span> <span class="nb">int</span><span class="p">,</span> + <span class="n">name</span> <span class="nb">varchar</span><span class="p">(</span><span class="mi">255</span><span class="p">)</span> +<span class="p">);</span> + +<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">orders</span> <span class="p">(</span> + <span class="n">orderId</span> <span class="nb">int</span><span class="p">,</span> + <span class="n">orderCustomerId</span> <span class="nb">int</span> +<span class="p">);</span> + +<span class="c1">--fill tables with data</span> + +<span class="k">SELECT</span> <span class="n">customerId</span><span class="p">,</span> <span class="n">name</span> +<span class="k">FROM</span> <span 
class="n">customers</span><span class="p">,</span> <span class="n">orders</span>
+<span class="k">WHERE</span> <span class="n">customerId</span> <span class="o">=</span> <span class="n">orderCustomerId</span></code></pre></div>
+
+<p>Assume that two tables were created in some database: the <code>customers</code> table is relatively small and fits into local memory (or on local disk). The <code>orders</code> table is bigger, as it contains all orders created by customers, and doesn’t fit in memory. To enrich the orders with the customer name, you have to join these two tables. There are basically two stages in this <em>batch</em> job:</p>
+
+<ol>
+  <li>Load the complete <code>customers</code> table into a local map, <code>(customerId, name)</code>; this is feasible because the table is the smaller one.</li>
+  <li>Process the <code>orders</code> table record by record, enriching it with the <code>name</code> value from the map.</li>
+</ol>
+
+<h4 id="executing-the-job">Executing the job</h4>
+
+<p>The batch job described above will have three operators. For simplicity, each operator is represented with a parallelism of 1,
+so the resulting <em><a href="#executiongraph">ExecutionGraph</a></em> will consist of three <em>subtasks</em>: A, B and C.</p>
+
+<ul>
+  <li><strong>A</strong>: load the full <code>customers</code> table</li>
+  <li><strong>B</strong>: load the <code>orders</code> table record by record in a <em>streaming</em> (pipelined) fashion</li>
+  <li><strong>C</strong>: join the order table records with the loaded customer table</li>
+</ul>
+
+<p>This translates into <strong>A</strong> and <strong>C</strong> being connected with a <em>blocking</em> data exchange,
+because the <code>customers</code> table needs to be loaded locally (<strong>A</strong>) before we start processing the order table (<strong>B</strong>).
+<strong>B</strong> and <strong>C</strong> are connected with a <em><a href="#intermediate-results">pipelined</a></em> data exchange, +because the consumer (<strong>C</strong>) can run as soon as the first result records from <strong>B</strong> have been produced. +You can think of <strong>B-&gt;C</strong> as a <em>finite streaming</em> job. It’s then possible to identify two separate stages within the <em>ExecutionGraph</em>: <strong>A</strong> and <strong>B-&gt;C</strong>.</p> + +<center> +<img src="/img/blog/2020-12-02-pipelined-region-sheduling/sql-join-job-example.png" width="450px" alt="SQL Join job example:high" /> +<br /> +<i><small>Flink: SQL Join job example</small></i> +</center> +<p><br /></p> + +<h4 id="scheduling-limitations">Scheduling Limitations</h4> + +<p>Imagine that the cluster this job will run in has only one <em><a href="#slots-and-resources">slot</a></em> and can therefore only execute one <em>subtask</em>. +If Flink deploys <strong>B</strong> <em><a href="#slots-and-resources">chained</a></em> with <strong>C</strong> first into this one <em>slot</em> (as <strong>B</strong> and <strong>C</strong> are connected with a <em><a href="#intermediate-results">pipelined</a></em> edge), +<strong>C</strong> cannot run because A has not produced its <em>blocking result</em> yet. Flink will try to deploy <strong>A</strong> and the job will fail, because there are no more <em>slots</em>. +If there were two <em>slots</em> available, Flink would be able to deploy <strong>A</strong> and the job would eventually succeed. +Nonetheless, the resources of the first <em>slot</em> occupied by <strong>B</strong> and <strong>C</strong> would be wasted while <strong>A</strong> was running.</p> + +<p>Both scheduling strategies available as of Flink 1.11 (<em>“all at once”</em> and <em>“lazy from source”</em>) would be affected by these limitations. +What would be the optimal approach? 
In this case, if <strong>A</strong> was deployed first, then <strong>B</strong> and <strong>C</strong> could also complete afterwards using the same <em>slot</em>. +The job would succeed even if only a single <em>slot</em> was available.</p> + +<div class="alert alert-info"> + <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span> +If we could load the <code>orders</code> table into local memory (making B -&gt; C blocking), then the previous strategy would also succeed with one slot. +Nonetheless, we would have to allocate a lot of resources to accommodate the table locally, which may not be required.</p> +</div> + +<p>Last but not least, let’s consider what happens in the case of <em>failover</em>: if the processing of the <code>orders</code> table fails (<strong>B-&gt;C</strong>), +then we do not have to reload the customer table (<strong>A</strong>); we only need to restart <strong>B-&gt;C</strong>. This did not work prior to Flink 1.9.</p> + +<p>To satisfy the scheduling requirements for <em>batch</em> and <em>streaming</em> and overcome these limitations, +the Flink community has worked on a new unified scheduling and failover strategy that is suitable for both types of workloads: <em>pipelined region scheduling</em>.</p> + +<h1 id="the-new-pipelined-region-scheduling">The new pipelined region scheduling</h1> + +<p>As you read in the previous introductory sections, an optimal <a href="#what-is-scheduling">scheduler</a> should efficiently allocate resources +for the sub-stages of the pipeline, finite or infinite, running in a <em>streaming</em> fashion. Those stages are called <em>pipelined regions</em> in Flink. 
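Conceptually, such stages can be found by grouping subtasks that are transitively connected through pipelined edges. The following is a toy Python sketch with invented names (a simplified union-find over the ExecutionGraph edges, not Flink's actual implementation):

```python
# Hypothetical sketch: grouping an ExecutionGraph's subtasks into
# pipelined regions with union-find. Subtasks joined by a PIPELINED edge
# end up in the same region; BLOCKING edges separate regions.

def find(parent, x):
    # Find the representative of x, with path compression.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def pipelined_regions(subtasks, edges):
    """edges: (producer, consumer, 'pipelined' | 'blocking') triples."""
    parent = {s: s for s in subtasks}
    for src, dst, kind in edges:
        if kind == "pipelined":
            parent[find(parent, src)] = find(parent, dst)  # union
    regions = {}
    for s in subtasks:
        regions.setdefault(find(parent, s), set()).add(s)
    return sorted(sorted(r) for r in regions.values())

# The SQL join example from the post: A -> C blocking, B -> C pipelined.
print(pipelined_regions(
    ["A", "B", "C"],
    [("A", "C", "blocking"), ("B", "C", "pipelined")],
))  # [['A'], ['B', 'C']]
```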
+In this section, we will take a deeper dive into <em>pipelined region scheduling and failover</em>.</p>
+
+<h2 id="pipelined-regions">Pipelined regions</h2>
+
+<p>The new scheduling strategy analyses the <em><a href="#executiongraph">ExecutionGraph</a></em> before starting the <em>subtask</em> deployment in order to identify its <em>pipelined regions</em>.
+A <em>pipelined region</em> is a subset of <em>subtasks</em> in the <em>ExecutionGraph</em> connected by <em><a href="#intermediate-results">pipelined</a></em> data exchanges.
+<em>Subtasks</em> from different <em>pipelined regions</em> are connected only by <em><a href="#intermediate-results">blocking</a></em> data exchanges.
+The depicted example of an <em>ExecutionGraph</em> has four <em>pipelined regions</em>, made up of the <em>subtasks</em> A to H:</p>
+
+<center>
+<img src="/img/blog/2020-12-02-pipelined-region-sheduling/pipelined-regions.png" width="250px" alt="Pipelined regions:high" />
+<br />
+<i><small>Flink: Pipelined regions</small></i>
+</center>
+<p><br /></p>
+
+<p>Why do we need the <em>pipelined region</em>? Within a <em>pipelined region</em>, all consumers have to continuously consume the produced results
+so as not to block the producers and cause backpressure. Hence, all <em>subtasks</em> of a <em>pipelined region</em> have to be scheduled, restarted in case of failure and run at the same time.</p>
+
+<div class="alert alert-info">
+  <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note (out of scope)</span>
+In certain cases the <em>subtasks</em> can be connected by <em><a href="#intermediate-results">blocking</a></em> data exchanges within one region.
+Check <a href="https://issues.apache.org/jira/browse/FLINK-17330">FLINK-17330</a> for details.</p>
+</div>
+
+<h2 id="pipelined-region-scheduling-strategy">Pipelined region scheduling strategy</h2>
+
+<p>Once the <em>pipelined regions</em> are identified, each region is scheduled only when all the regions it depends on (i.e. its inputs)
+have produced their <em><a href="#intermediate-results">blocking</a></em> results (for the depicted graph: R2 and R3 after R1; R4 after R2 and R3).
+If the <em>JobManager</em> has enough resources available, it will try to run as many schedulable <em>pipelined regions</em> in parallel as possible.
+The <em>subtasks</em> of a <em>pipelined region</em> are either successfully deployed all at once or not at all.
+The job fails if there are not enough resources to run any of its <em>pipelined regions</em>.
+You can read more about this effort in the original <a href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling#FLIP119PipelinedRegionScheduling-BulkSlotAllocation">FLIP-119 proposal</a>.</p>
+
+<h2 id="failover-strategy">Failover strategy</h2>
+
+<p>As mentioned before, only certain regions are running at the same time. Others have already produced their <em><a href="#intermediate-results">blocking</a></em> results.
+The results are stored locally in the <em>TaskManagers</em> where the corresponding <em>subtasks</em> run.
+If a currently running region fails, it gets restarted to consume its inputs again.
+If some input results were lost (e.g. because the hosting <em>TaskManager</em> failed as well), Flink will rerun the regions that produced them.
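The region-by-region ordering described for the scheduling strategy can be sketched as a topological traversal of the region graph. This is an illustrative Python toy with invented names, not Flink's code:

```python
# Hypothetical sketch: a pipelined region becomes schedulable once every
# region it consumes through a blocking edge has produced its result.
from collections import deque

def schedule_order(regions, blocking_deps):
    """blocking_deps maps a region to the set of regions it consumes."""
    pending = {r: set(blocking_deps.get(r, ())) for r in regions}
    ready = deque(r for r, deps in pending.items() if not deps)
    order = []
    while ready:
        region = ready.popleft()
        order.append(region)  # deploy all subtasks of this region at once
        for r, deps in pending.items():
            if region in deps:
                deps.remove(region)  # the blocking result is now produced
                if not deps and r not in order and r not in ready:
                    ready.append(r)
    return order

# Region graph from the figure: R2 and R3 consume R1; R4 consumes R2, R3.
print(schedule_order(
    ["R1", "R2", "R3", "R4"],
    {"R2": {"R1"}, "R3": {"R1"}, "R4": {"R2", "R3"}},
))  # ['R1', 'R2', 'R3', 'R4']
```

In a real deployment, independent ready regions (such as R2 and R3 here) could also run in parallel when enough slots are available; the sketch only shows the dependency ordering.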
+You can read more about this effort in the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/task_failure_recovery.html#failover-strategies">user documentation</a> +and the original <a href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures">FLIP-1 proposal</a>.</p> + +<h2 id="benefits">Benefits</h2> + +<p><strong>Run any batch job, possibly with limited resources</strong></p> + +<p>The <em>subtasks</em> of a <em>pipelined region</em> are deployed only when all necessary conditions for their success are fulfilled: +inputs are ready and all needed resources are allocated. Hence, the <em>batch</em> job never gets stuck without notifying the user. +The job either eventually finishes or fails after a timeout.</p> + +<p>Depending on how the <em>subtasks</em> are allowed to <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/#task-chaining-and-resource-groups">share slots</a>, +it is often the case that the whole <em>pipelined region</em> can run within one <em>slot</em>, +making it generally possible to run the whole <em>batch</em> job with only a single <em>slot</em>. +At the same time, if the cluster provides more resources, Flink will run as many regions as possible in parallel to improve the overall job performance.</p> + +<p><strong>No resource waste</strong></p> + +<p>As mentioned in the definition of <em>pipelined region</em>, all its <em>subtasks</em> have to run simultaneously. +The <em>subtasks</em> of other regions either cannot or do not have to run at the same time. +This means that a <em>pipelined region</em> is the minimum subgraph of a <em>batch</em> job’s <em>ExecutionGraph</em> that has to be scheduled at once. 
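A back-of-the-envelope sketch of this consequence (hypothetical, assuming for simplicity one slot per subtask and no slot sharing): the number of slots needed to run a job region by region is bounded by the size of its largest region.

```python
# Illustrative sketch: minimum slots to run a batch job region by region,
# under the simplifying assumption of one slot per subtask and no sharing.

def min_slots_needed(regions):
    """regions: iterable of subtask lists, one list per pipelined region."""
    return max(len(region) for region in regions)

# Regions of the SQL join example: [A] and [B, C].
print(min_slots_needed([["A"], ["B", "C"]]))  # 2
```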
+There is no way to run the job with fewer resources than needed to run the largest region, and so there can be no resource waste.</p> + +<div class="alert alert-info"> + <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note (out of scope)</span> +The amount of resources required to run a region can be further optimized separately. +It depends on <em>co-location constraints</em> and <em>slot sharing groups</em> of the region’s <em>subtasks</em>. +Check <a href="https://issues.apache.org/jira/browse/FLINK-18689">FLINK-18689</a> for details.</p> +</div> + +<h1 id="conclusion">Conclusion</h1> + +<p>Scheduling is a fundamental component of the Flink stack. In this blogpost, we recapped how scheduling affects resource utilization and failover as a part of the user experience. +We described the limitations of Flink’s old scheduler and introduced a new approach to tackle them: the <em>pipelined region scheduler</em>, which ships with Flink 1.12. +The blogpost also explained how <em>pipelined region failover</em> (introduced in Flink 1.11) works.</p> + +<p>Stay tuned for more improvements to scheduling in upcoming releases. If you have any suggestions or questions for the community, +we encourage you to sign up to the Apache Flink <a href="https://flink.apache.org/community.html#mailing-lists">mailing lists</a> and become part of the discussion.</p> + +<h1 id="appendix">Appendix</h1> + +<h2 id="what-is-scheduling">What is scheduling?</h2> + +<h3 id="executiongraph">ExecutionGraph</h3> + +<p>A Flink <em>job</em> is a pipeline of connected <em>operators</em> to process data. +Together, the operators form a <em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/job_scheduling.html#jobmanager-data-structures">JobGraph</a></em>. +Each <em>operator</em> has a certain number of <em>subtasks</em> executed in parallel. 
The <em>subtask</em> is the actual execution unit in Flink.
+Each subtask can consume user records from other subtasks (inputs), process them and produce records for further consumption by other <em>subtasks</em> (outputs) downstream.
+There are <em>source subtasks</em> without inputs and <em>sink subtasks</em> without outputs. Hence, the <em>subtasks</em> form the nodes of the
+<em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/job_scheduling.html#jobmanager-data-structures">ExecutionGraph</a></em>.</p>
+
+<h3 id="intermediate-results">Intermediate results</h3>
+
+<p>There are also two major data-exchange types used by <em>operators</em> and their <em>subtasks</em> to produce and consume results: <em>pipelined</em> and <em>blocking</em>.
+They are basically types of edges in the <em>ExecutionGraph</em>.</p>
+
+<p>A <em>pipelined</em> result can be consumed record by record. This means that the consumer can already run once the first result records have been produced.
+A <em>pipelined</em> result can be a never-ending output of records, e.g. in case of a <em>streaming job</em>.</p>
+
+<p>A <em>blocking</em> result can be consumed only when its <em>production</em> is done. Hence, a <em>blocking</em> result is always finite,
+and its consumer can run only when the producer has finished its execution.</p>
+
+<h3 id="slots-and-resources">Slots and resources</h3>
+
+<p>A <em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html#anatomy-of-a-flink-cluster">TaskManager</a></em>
+instance has a certain number of virtual <em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html#task-slots-and-resources">slots</a></em>.
+Each <em>slot</em> represents a certain part of the <em>TaskManager’s physical resources</em> to run the operator <em>subtasks</em>, and each <em>subtask</em> is deployed into a <em>slot</em> of the <em>TaskManager</em>.
+A <em>slot</em> can run multiple <em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/job_scheduling.html#scheduling">subtasks</a></em> from different <em>operators</em> at the same time, usually <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html#tasks-and-operator-chains">chained</a> together.</p>
+
+<h3 id="scheduling-strategy">Scheduling strategy</h3>
+
+<p><em><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/job_scheduling.html#scheduling">Scheduling</a></em>
+in Flink is the process of searching for and allocating appropriate resources (<em>slots</em>) from the <em>TaskManagers</em> to run the <em>subtasks</em> and produce results.
+The <em>scheduling strategy</em> reacts to scheduling events (such as a job starting, or a <em>subtask</em> failing or finishing) to decide which <em>subtask</em> to deploy next.</p>
+
+<p>For instance, to avoid wasting resources, it does not make sense to schedule <em>subtasks</em> whose inputs are not yet ready to be consumed.
+Another example is to schedule <em>subtasks</em> which are connected with <em>pipelined</em> edges together, to avoid deadlocks caused by backpressure.</p>
+</description>
+<pubDate>Tue, 15 Dec 2020 09:00:00 +0100</pubDate>
+<link>https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html</link>
+<guid isPermaLink="true">/2020/12/15/pipelined-region-sheduling.html</guid>
+</item>
+
+<item>
<title>Apache Flink 1.12.0 Release Announcement</title>
<description><p>The Apache Flink community is excited to announce the release of Flink 1.12.0!
Close to 300 contributors worked on over 1k threads to bring significant improvements to usability as well as new features that simplify (and unify) Flink handling across the API stack.</p> @@ -357,7 +672,7 @@ With the new release, Flink SQL supports <strong>metadata columns</stro <h2 id="release-notes">Release Notes</h2> -<p>Please review the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.12/release-notes/flink-1.12.html">release notes</a> carefully for a detailed list of changes and new features if you plan to upgrade your setup to Flink 1.11. This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.</p> +<p>Please review the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.12/release-notes/flink-1.12.html">release notes</a> carefully for a detailed list of changes and new features if you plan to upgrade your setup to Flink 1.12. This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.</p> <h2 id="list-of-contributors">List of Contributors</h2> @@ -18296,79 +18611,5 @@ enable the joining of a main, high-throughput stream with one more more inputs w <guid isPermaLink="true">/news/2016/10/12/release-1.1.3.html</guid> </item> -<item> -<title>Apache Flink 1.1.2 Released</title> -<description><p>The Apache Flink community released another bugfix version of the Apache Flink 1.1. 
series.</p> - -<p>We recommend all users to upgrade to Flink 1.1.2.</p> - -<div class="highlight"><pre><code class="language-xml"><span class="nt">&lt;dependency&gt;</span> - <span class="nt">&lt;groupId&gt;</span>org.apache.flink<span class="nt">&lt;/groupId&gt;</span> - <span class="nt">&lt;artifactId&gt;</span>flink-java<span class="nt">&lt;/artifactId&gt;</span> - <span class="nt">&lt;version&gt;</span>1.1.2<span class="nt">&lt;/version&gt;</span> -<span class="nt">&lt;/dependency&gt;</span> -<span class="nt">&lt;dependency&gt;</span> - <span class="nt">&lt;groupId&gt;</span>org.apache.flink<span class="nt">&lt;/groupId&gt;</span> - <span class="nt">&lt;artifactId&gt;</span>flink-streaming-java_2.10<span class="nt">&lt;/artifactId&gt;</span> - <span class="nt">&lt;version&gt;</span>1.1.2<span class="nt">&lt;/version&gt;</span> -<span class="nt">&lt;/dependency&gt;</span> -<span class="nt">&lt;dependency&gt;</span> - <span class="nt">&lt;groupId&gt;</span>org.apache.flink<span class="nt">&lt;/groupId&gt;</span> - <span class="nt">&lt;artifactId&gt;</span>flink-clients_2.10<span class="nt">&lt;/artifactId&gt;</span> - <span class="nt">&lt;version&gt;</span>1.1.2<span class="nt">&lt;/version&gt;</span> -<span class="nt">&lt;/dependency&gt;</span></code></pre></div> - -<p>You can find the binaries on the updated <a href="http://flink.apache.org/downloads.html">Downloads page</a>.</p> - -<h2>Release Notes - Flink - Version 1.1.2</h2> - -<ul> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4236">FLINK-4236</a>] - Flink Dashboard stops showing list of uploaded jars if main method cannot be looked up -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4309">FLINK-4309</a>] - Potential null pointer dereference in DelegatingConfiguration#keySet() -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4334">FLINK-4334</a>] - Shaded Hadoop1 jar not fully excluded in Quickstart -</li> -<li>[<a 
href="https://issues.apache.org/jira/browse/FLINK-4341">FLINK-4341</a>] - Kinesis connector does not emit maximum watermark properly -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4402">FLINK-4402</a>] - Wrong metrics parameter names in documentation -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4409">FLINK-4409</a>] - class conflict between jsr305-1.3.9.jar and flink-shaded-hadoop2-1.1.1.jar -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4411">FLINK-4411</a>] - [py] Chained dual input children are not properly propagated -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4412">FLINK-4412</a>] - [py] Chaining does not properly handle broadcast variables -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4425">FLINK-4425</a>] - &quot;Out Of Memory&quot; during savepoint deserialization -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4454">FLINK-4454</a>] - Lookups for JobManager address in config -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4480">FLINK-4480</a>] - Incorrect link to elastic.co in documentation -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4486">FLINK-4486</a>] - JobManager not fully running when yarn-session.sh finishes -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4488">FLINK-4488</a>] - Prevent cluster shutdown after job execution for non-detached jobs -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4514">FLINK-4514</a>] - ExpiredIteratorException in Kinesis Consumer on long catch-ups to head of stream -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4526">FLINK-4526</a>] - ApplicationClient: remove redundant proxy messages -</li> - -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-3866">FLINK-3866</a>] - StringArraySerializer claims type is immutable; shouldn&#39;t -</li> -<li>[<a 
href="https://issues.apache.org/jira/browse/FLINK-3899">FLINK-3899</a>] - Document window processing with Reduce/FoldFunction + WindowFunction -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4302">FLINK-4302</a>] - Add JavaDocs to MetricConfig -</li> -<li>[<a href="https://issues.apache.org/jira/browse/FLINK-4495">FLINK-4495</a>] - Running multiple jobs on yarn (without yarn-session) -</li> -</ul> - -</description> -<pubDate>Mon, 05 Sep 2016 11:00:00 +0200</pubDate> -<link>https://flink.apache.org/news/2016/09/05/release-1.1.2.html</link> -<guid isPermaLink="true">/news/2016/09/05/release-1.1.2.html</guid> -</item> - </channel> </rss> diff --git a/content/blog/index.html b/content/blog/index.html index 72656d4..1522487 100644 --- a/content/blog/index.html +++ b/content/blog/index.html @@ -196,6 +196,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></h2> + + <p>15 Dec 2020 + Andrey Zagrebin </p> + + <p>In this blogpost, we’ll take a closer look at how far the community has come in improving task scheduling for batch workloads, why this matters and what you can expect in Flink 1.12 with the new pipelined region scheduler.</p> + + <p><a href="/2020/12/15/pipelined-region-sheduling.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></h2> <p>10 Dec 2020 @@ -324,19 +337,6 @@ as well as increased observability for operational purposes.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2020/08/20/flink-docker.html">The State of Flink on Docker</a></h2> - - <p>20 Aug 2020 - Robert Metzger (<a href="https://twitter.com/rmetzger_">@rmetzger_</a>)</p> - - <p>This blog post gives an update on the recent developments of Flink's support for Docker.</p> - - <p><a 
href="/news/2020/08/20/flink-docker.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -369,6 +369,16 @@ as well as increased observability for operational purposes.</p> <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page10/index.html b/content/blog/page10/index.html index fd56621..5436d2b 100644 --- a/content/blog/page10/index.html +++ b/content/blog/page10/index.html @@ -196,6 +196,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2017/06/01/release-1.3.0.html">Apache Flink 1.3.0 Release Announcement</a></h2> + + <p>01 Jun 2017 by Robert Metzger (<a href="https://twitter.com/">@rmetzger_</a>) + </p> + + <p><p>The Apache Flink community is pleased to announce the 1.3.0 release. Over the past 4 months, the Flink community has been working hard to resolve more than 680 issues. See the <a href="/blog/release_1.3.0-changelog.html">complete changelog</a> for more detail.</p> + +</p> + + <p><a href="/news/2017/06/01/release-1.3.0.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2017/05/16/official-docker-image.html">Introducing Docker Images for Apache Flink</a></h2> <p>16 May 2017 by Patrick Lucas (Data Artisans) and Ismaël Mejía (Talend) (<a href="https://twitter.com/">@iemejia</a>) @@ -323,21 +338,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2016/09/05/release-1.1.2.html">Apache Flink 1.1.2 Released</a></h2> - - <p>05 Sep 2016 - </p> - - <p><p>The Apache Flink community released another bugfix version of the Apache Flink 1.1. 
series.</p> - -</p> - - <p><a href="/news/2016/09/05/release-1.1.2.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -370,6 +370,16 @@ <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page11/index.html b/content/blog/page11/index.html index 9276073..b0bb834 100644 --- a/content/blog/page11/index.html +++ b/content/blog/page11/index.html @@ -196,6 +196,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2016/09/05/release-1.1.2.html">Apache Flink 1.1.2 Released</a></h2> + + <p>05 Sep 2016 + </p> + + <p><p>The Apache Flink community released another bugfix version of the Apache Flink 1.1. series.</p> + +</p> + + <p><a href="/news/2016/09/05/release-1.1.2.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2016/08/24/ff16-keynotes-panels.html">Flink Forward 2016: Announcing Schedule, Keynotes, and Panel Discussion</a></h2> <p>24 Aug 2016 @@ -327,21 +342,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2016/03/08/release-1.0.0.html">Announcing Apache Flink 1.0.0</a></h2> - - <p>08 Mar 2016 - </p> - - <p><p>The Apache Flink community is pleased to announce the availability of the 1.0.0 release. 
The community put significant effort into improving and extending Apache Flink since the last release, focusing on improving the experience of writing and executing data stream processing pipelines in production.</p> - -</p> - - <p><a href="/news/2016/03/08/release-1.0.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -374,6 +374,16 @@ <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page12/index.html b/content/blog/page12/index.html index 25b674a..03c8cbd 100644 --- a/content/blog/page12/index.html +++ b/content/blog/page12/index.html @@ -196,6 +196,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2016/03/08/release-1.0.0.html">Announcing Apache Flink 1.0.0</a></h2> + + <p>08 Mar 2016 + </p> + + <p><p>The Apache Flink community is pleased to announce the availability of the 1.0.0 release. The community put significant effort into improving and extending Apache Flink since the last release, focusing on improving the experience of writing and executing data stream processing pipelines in production.</p> + +</p> + + <p><a href="/news/2016/03/08/release-1.0.0.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2016/02/11/release-0.10.2.html">Flink 0.10.2 Released</a></h2> <p>11 Feb 2016 @@ -328,24 +343,6 @@ Apache Flink started.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2015/08/24/introducing-flink-gelly.html">Introducing Gelly: Graph Processing with Apache Flink</a></h2> - - <p>24 Aug 2015 - </p> - - <p><p>This blog post introduces <strong>Gelly</strong>, Apache Flink’s <em>graph-processing API and library</em>. 
Flink’s native support -for iterations makes it a suitable platform for large-scale graph analytics. -By leveraging delta iterations, Gelly is able to map various graph processing models such as -vertex-centric or gather-sum-apply to Flink dataflows.</p> - -</p> - - <p><a href="/news/2015/08/24/introducing-flink-gelly.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -378,6 +375,16 @@ vertex-centric or gather-sum-apply to Flink dataflows.</p> <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page13/index.html b/content/blog/page13/index.html index 27e902f..ca8fb67 100644 --- a/content/blog/page13/index.html +++ b/content/blog/page13/index.html @@ -196,6 +196,24 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2015/08/24/introducing-flink-gelly.html">Introducing Gelly: Graph Processing with Apache Flink</a></h2> + + <p>24 Aug 2015 + </p> + + <p><p>This blog post introduces <strong>Gelly</strong>, Apache Flink’s <em>graph-processing API and library</em>. Flink’s native support +for iterations makes it a suitable platform for large-scale graph analytics. 
+By leveraging delta iterations, Gelly is able to map various graph processing models such as +vertex-centric or gather-sum-apply to Flink dataflows.</p> + +</p> + + <p><a href="/news/2015/08/24/introducing-flink-gelly.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2015/06/24/announcing-apache-flink-0.9.0-release.html">Announcing Apache Flink 0.9.0</a></h2> <p>24 Jun 2015 @@ -337,21 +355,6 @@ and offers a new API including definition of flexible windows.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2015/01/21/release-0.8.html">Apache Flink 0.8.0 available</a></h2> - - <p>21 Jan 2015 - </p> - - <p><p>We are pleased to announce the availability of Flink 0.8.0. This release includes new user-facing features as well as performance and bug fixes, extends the support for filesystems and introduces the Scala API and flexible windowing semantics for Flink Streaming. A total of 33 people have contributed to this release, a big thanks to all of them!</p> - -</p> - - <p><a href="/news/2015/01/21/release-0.8.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -384,6 +387,16 @@ and offers a new API including definition of flexible windows.</p> <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page14/index.html b/content/blog/page14/index.html index d1a4bfa..15dbfea 100644 --- a/content/blog/page14/index.html +++ b/content/blog/page14/index.html @@ -196,6 +196,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2015/01/21/release-0.8.html">Apache Flink 0.8.0 available</a></h2> + + <p>21 Jan 2015 + </p> + + <p><p>We are pleased to announce the availability of Flink 0.8.0. 
This release includes new user-facing features as well as performance and bug fixes, extends the support for filesystems and introduces the Scala API and flexible windowing semantics for Flink Streaming. A total of 33 people have contributed to this release, a big thanks to all of them!</p> + +</p> + + <p><a href="/news/2015/01/21/release-0.8.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2015/01/06/december-in-flink.html">December 2014 in the Flink community</a></h2> <p>06 Jan 2015 @@ -320,6 +335,16 @@ academic and open source project that Flink originates from.</p> <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page2/index.html b/content/blog/page2/index.html index 00d184b..b143efd 100644 --- a/content/blog/page2/index.html +++ b/content/blog/page2/index.html @@ -196,6 +196,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2020/08/20/flink-docker.html">The State of Flink on Docker</a></h2> + + <p>20 Aug 2020 + Robert Metzger (<a href="https://twitter.com/rmetzger_">@rmetzger_</a>)</p> + + <p>This blog post gives an update on the recent developments of Flink's support for Docker.</p> + + <p><a href="/news/2020/08/20/flink-docker.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/2020/08/19/statefun.html">Monitoring and Controlling Networks of IoT Devices with Flink Stateful Functions</a></h2> <p>19 Aug 2020 @@ -325,19 +338,6 @@ illustrate this trend.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2020/07/06/release-1.11.0.html">Apache Flink 1.11.0 Release Announcement</a></h2> - - <p>06 Jul 2020 - Marta Paes (<a 
href="https://twitter.com/morsapaes">@morsapaes</a>)</p> - - <p>The Apache Flink community is proud to announce the release of Flink 1.11.0! More than 200 contributors worked on over 1.3k issues to bring significant improvements to usability as well as new features to Flink users across the whole API stack. We're particularly excited about unaligned checkpoints to cope with high backpressure scenarios, a new source API that simplifies and unifies the implementation of (custom) sources, and support for Change Data Capture (CDC) and other comm [...] - - <p><a href="/news/2020/07/06/release-1.11.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -370,6 +370,16 @@ illustrate this trend.</p> <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page3/index.html b/content/blog/page3/index.html index 9f4e877..7854b61 100644 --- a/content/blog/page3/index.html +++ b/content/blog/page3/index.html @@ -196,6 +196,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2020/07/06/release-1.11.0.html">Apache Flink 1.11.0 Release Announcement</a></h2> + + <p>06 Jul 2020 + Marta Paes (<a href="https://twitter.com/morsapaes">@morsapaes</a>)</p> + + <p>The Apache Flink community is proud to announce the release of Flink 1.11.0! More than 200 contributors worked on over 1.3k issues to bring significant improvements to usability as well as new features to Flink users across the whole API stack. We're particularly excited about unaligned checkpoints to cope with high backpressure scenarios, a new source API that simplifies and unifies the implementation of (custom) sources, and support for Change Data Capture (CDC) and other comm [...] 
+ + <p><a href="/news/2020/07/06/release-1.11.0.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/ecosystem/2020/06/23/flink-on-zeppelin-part2.html">Flink on Zeppelin Notebooks for Interactive Data Analysis - Part 2</a></h2> <p>23 Jun 2020 @@ -325,19 +338,6 @@ and provide a tutorial for running Streaming ETL with Flink on Zeppelin.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2020/04/15/flink-serialization-tuning-vol-1.html">Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can</a></h2> - - <p>15 Apr 2020 - Nico Kruber </p> - - <p>Serialization is a crucial element of your Flink job. This article is the first in a series of posts that will highlight Flink’s serialization stack, and looks at the different ways Flink can serialize your data types.</p> - - <p><a href="/news/2020/04/15/flink-serialization-tuning-vol-1.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -370,6 +370,16 @@ and provide a tutorial for running Streaming ETL with Flink on Zeppelin.</p> <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page4/index.html b/content/blog/page4/index.html index baa2883..a182454 100644 --- a/content/blog/page4/index.html +++ b/content/blog/page4/index.html @@ -196,6 +196,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2020/04/15/flink-serialization-tuning-vol-1.html">Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can</a></h2> + + <p>15 Apr 2020 + Nico Kruber </p> + + <p>Serialization is a crucial element of your Flink job. 
This article is the first in a series of posts that will highlight Flink’s serialization stack, and looks at the different ways Flink can serialize your data types.</p> + + <p><a href="/news/2020/04/15/flink-serialization-tuning-vol-1.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/2020/04/09/pyflink-udf-support-flink.html">PyFlink: Introducing Python Support for UDFs in Flink's Table API</a></h2> <p>09 Apr 2020 @@ -319,21 +332,6 @@ This release marks a big milestone: Stateful Functions 2.0 is not only an API up <hr> - <article> - <h2 class="blog-title"><a href="/news/2020/01/30/release-1.9.2.html">Apache Flink 1.9.2 Released</a></h2> - - <p>30 Jan 2020 - Hequn Cheng (<a href="https://twitter.com/HequnC">@HequnC</a>)</p> - - <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.9 series.</p> - -</p> - - <p><a href="/news/2020/01/30/release-1.9.2.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -366,6 +364,16 @@ This release marks a big milestone: Stateful Functions 2.0 is not only an API up <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page5/index.html b/content/blog/page5/index.html index bc2713b..eedadd1 100644 --- a/content/blog/page5/index.html +++ b/content/blog/page5/index.html @@ -196,6 +196,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2020/01/30/release-1.9.2.html">Apache Flink 1.9.2 Released</a></h2> + + <p>30 Jan 2020 + Hequn Cheng (<a href="https://twitter.com/HequnC">@HequnC</a>)</p> + + <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.9 series.</p> + +</p> + + <p><a 
href="/news/2020/01/30/release-1.9.2.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2020/01/29/state-unlocked-interacting-with-state-in-apache-flink.html">State Unlocked: Interacting with State in Apache Flink</a></h2> <p>29 Jan 2020 @@ -318,22 +333,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2019/08/22/release-1.9.0.html">Apache Flink 1.9.0 Release Announcement</a></h2> - - <p>22 Aug 2019 - </p> - - <p><p>The Apache Flink community is proud to announce the release of Apache Flink -1.9.0.</p> - -</p> - - <p><a href="/news/2019/08/22/release-1.9.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -366,6 +365,16 @@ <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page6/index.html b/content/blog/page6/index.html index 2d1b6ee..479929b 100644 --- a/content/blog/page6/index.html +++ b/content/blog/page6/index.html @@ -196,6 +196,22 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2019/08/22/release-1.9.0.html">Apache Flink 1.9.0 Release Announcement</a></h2> + + <p>22 Aug 2019 + </p> + + <p><p>The Apache Flink community is proud to announce the release of Apache Flink +1.9.0.</p> + +</p> + + <p><a href="/news/2019/08/22/release-1.9.0.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 
2: Monitoring, Metrics, and that Backpressure Thing</a></h2> <p>23 Jul 2019 @@ -322,19 +338,6 @@ for more details.</p> <hr> - <article> - <h2 class="blog-title"><a href="/features/2019/03/11/prometheus-monitoring.html">Flink and Prometheus: Cloud-native monitoring of streaming applications</a></h2> - - <p>11 Mar 2019 - Maximilian Bode, TNG Technology Consulting (<a href="https://twitter.com/mxpbode">@mxpbode</a>)</p> - - <p>This blog post describes how developers can leverage Apache Flink's built-in metrics system together with Prometheus to observe and monitor streaming applications in an effective way.</p> - - <p><a href="/features/2019/03/11/prometheus-monitoring.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -367,6 +370,16 @@ for more details.</p> <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page7/index.html b/content/blog/page7/index.html index 42087dd..6c63b0a 100644 --- a/content/blog/page7/index.html +++ b/content/blog/page7/index.html @@ -196,6 +196,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/features/2019/03/11/prometheus-monitoring.html">Flink and Prometheus: Cloud-native monitoring of streaming applications</a></h2> + + <p>11 Mar 2019 + Maximilian Bode, TNG Technology Consulting (<a href="https://twitter.com/mxpbode">@mxpbode</a>)</p> + + <p>This blog post describes how developers can leverage Apache Flink's built-in metrics system together with Prometheus to observe and monitor streaming applications in an effective way.</p> + + <p><a href="/features/2019/03/11/prometheus-monitoring.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2019/03/06/ffsf-preview.html">What to 
expect from Flink Forward San Francisco 2019</a></h2> <p>06 Mar 2019 @@ -326,21 +339,6 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa <hr> - <article> - <h2 class="blog-title"><a href="/news/2018/10/29/release-1.6.2.html">Apache Flink 1.6.2 Released</a></h2> - - <p>29 Oct 2018 - </p> - - <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.6 series.</p> - -</p> - - <p><a href="/news/2018/10/29/release-1.6.2.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -373,6 +371,16 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page8/index.html b/content/blog/page8/index.html index e59289f..bf6c102 100644 --- a/content/blog/page8/index.html +++ b/content/blog/page8/index.html @@ -196,6 +196,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2018/10/29/release-1.6.2.html">Apache Flink 1.6.2 Released</a></h2> + + <p>29 Oct 2018 + </p> + + <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.6 series.</p> + +</p> + + <p><a href="/news/2018/10/29/release-1.6.2.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2018/10/29/release-1.5.5.html">Apache Flink 1.5.5 Released</a></h2> <p>29 Oct 2018 @@ -330,21 +345,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2018/03/08/release-1.4.2.html">Apache Flink 1.4.2 Released</a></h2> - - <p>08 Mar 2018 - </p> - - <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.4 series.</p> - -</p> - - <p><a 
href="/news/2018/03/08/release-1.4.2.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -377,6 +377,16 @@ <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/blog/page9/index.html b/content/blog/page9/index.html index 74e3f3a..63b7f81 100644 --- a/content/blog/page9/index.html +++ b/content/blog/page9/index.html @@ -196,6 +196,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2018/03/08/release-1.4.2.html">Apache Flink 1.4.2 Released</a></h2> + + <p>08 Mar 2018 + </p> + + <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.4 series.</p> + +</p> + + <p><a href="/news/2018/03/08/release-1.4.2.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/features/2018/03/01/end-to-end-exactly-once-apache-flink.html">An Overview of End-to-End Exactly-Once Processing in Apache Flink (with Apache Kafka, too!)</a></h2> <p>01 Mar 2018 @@ -327,21 +342,6 @@ what’s coming in Flink 1.4.0 as well as a preview of what the Flink community <hr> - <article> - <h2 class="blog-title"><a href="/news/2017/06/01/release-1.3.0.html">Apache Flink 1.3.0 Release Announcement</a></h2> - - <p>01 Jun 2017 by Robert Metzger (<a href="https://twitter.com/">@rmetzger_</a>) - </p> - - <p><p>The Apache Flink community is pleased to announce the 1.3.0 release. Over the past 4 months, the Flink community has been working hard to resolve more than 680 issues. 
See the <a href="/blog/release_1.3.0-changelog.html">complete changelog</a> for more detail.</p> - -</p> - - <p><a href="/news/2017/06/01/release-1.3.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -374,6 +374,16 @@ what’s coming in Flink 1.4.0 as well as a preview of what the Flink community <ul id="markdown-toc"> + <li><a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></li> + + + + + + + + + <li><a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></li> diff --git a/content/img/blog/2020-12-02-pipelined-region-sheduling/batch-job-example.png b/content/img/blog/2020-12-02-pipelined-region-sheduling/batch-job-example.png new file mode 100644 index 0000000..2c92ce9 Binary files /dev/null and b/content/img/blog/2020-12-02-pipelined-region-sheduling/batch-job-example.png differ diff --git a/content/img/blog/2020-12-02-pipelined-region-sheduling/pipelined-regions.png b/content/img/blog/2020-12-02-pipelined-region-sheduling/pipelined-regions.png new file mode 100644 index 0000000..0306b7d Binary files /dev/null and b/content/img/blog/2020-12-02-pipelined-region-sheduling/pipelined-regions.png differ diff --git a/content/img/blog/2020-12-02-pipelined-region-sheduling/sql-join-job-example.png b/content/img/blog/2020-12-02-pipelined-region-sheduling/sql-join-job-example.png new file mode 100644 index 0000000..d18f039 Binary files /dev/null and b/content/img/blog/2020-12-02-pipelined-region-sheduling/sql-join-job-example.png differ diff --git a/content/img/blog/2020-12-02-pipelined-region-sheduling/streaming-job-example.png b/content/img/blog/2020-12-02-pipelined-region-sheduling/streaming-job-example.png new file mode 100644 index 0000000..6d93dd3 Binary files /dev/null and b/content/img/blog/2020-12-02-pipelined-region-sheduling/streaming-job-example.png differ diff --git a/content/index.html b/content/index.html 
index abfcea9..67f9839 100644 --- a/content/index.html +++ b/content/index.html @@ -568,6 +568,9 @@ <dl> + <dt> <a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></dt> + <dd>In this blogpost, we’ll take a closer look at how far the community has come in improving task scheduling for batch workloads, why this matters and what you can expect in Flink 1.12 with the new pipelined region scheduler.</dd> + <dt> <a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></dt> <dd>The Apache Flink community is excited to announce the release of Flink 1.12.0! Close to 300 contributors worked on over 1k threads to bring significant improvements to usability as well as new features to Flink users across the whole API stack. We're particularly excited about adding efficient batch execution to the DataStream API, Kubernetes HA as an alternative to ZooKeeper, support for upsert mode in the Kafka SQL connector and the new Python DataStream API! Read on for al [...] @@ -581,15 +584,6 @@ <dt> <a href="/news/2020/10/13/stateful-serverless-internals.html">Stateful Functions Internals: Behind the scenes of Stateful Serverless</a></dt> <dd>This blog post dives deep into the internals of the StateFun runtime, taking a look at how it enables consistent and fault-tolerant stateful serverless applications.</dd> - - <dt> <a href="/news/2020/09/28/release-statefun-2.2.0.html">Stateful Functions 2.2.0 Release Announcement</a></dt> - <dd><p>The Apache Flink community is happy to announce the release of Stateful Functions (StateFun) 2.2.0! This release -introduces major features that extend the SDKs, such as support for asynchronous functions in the Python SDK, new -persisted state constructs, and a new SDK that allows embedding StateFun functions within a Flink DataStream job. 
-Moreover, we’ve also included important changes that improve out-of-the-box stability for common workloads, -as well as increased observability for operational purposes.</p> - -</dd> </dl> diff --git a/content/news/2020/12/10/release-1.12.0.html b/content/news/2020/12/10/release-1.12.0.html index 48e7bdb..abba298 100644 --- a/content/news/2020/12/10/release-1.12.0.html +++ b/content/news/2020/12/10/release-1.12.0.html @@ -546,7 +546,7 @@ With the new release, Flink SQL supports <strong>metadata columns</strong> to re <h2 id="release-notes">Release Notes</h2> -<p>Please review the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.12/release-notes/flink-1.12.html">release notes</a> carefully for a detailed list of changes and new features if you plan to upgrade your setup to Flink 1.11. This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.</p> +<p>Please review the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.12/release-notes/flink-1.12.html">release notes</a> carefully for a detailed list of changes and new features if you plan to upgrade your setup to Flink 1.12. 
This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.</p> <h2 id="list-of-contributors">List of Contributors</h2> diff --git a/content/zh/index.html b/content/zh/index.html index 74ed1b0..48b4bde 100644 --- a/content/zh/index.html +++ b/content/zh/index.html @@ -565,6 +565,9 @@ <dl> + <dt> <a href="/2020/12/15/pipelined-region-sheduling.html">Improvements in task scheduling for batch workloads in Apache Flink 1.12</a></dt> + <dd>In this blogpost, we’ll take a closer look at how far the community has come in improving task scheduling for batch workloads, why this matters and what you can expect in Flink 1.12 with the new pipelined region scheduler.</dd> + <dt> <a href="/news/2020/12/10/release-1.12.0.html">Apache Flink 1.12.0 Release Announcement</a></dt> <dd>The Apache Flink community is excited to announce the release of Flink 1.12.0! Close to 300 contributors worked on over 1k threads to bring significant improvements to usability as well as new features to Flink users across the whole API stack. We're particularly excited about adding efficient batch execution to the DataStream API, Kubernetes HA as an alternative to ZooKeeper, support for upsert mode in the Kafka SQL connector and the new Python DataStream API! Read on for al [...] @@ -578,15 +581,6 @@ <dt> <a href="/news/2020/10/13/stateful-serverless-internals.html">Stateful Functions Internals: Behind the scenes of Stateful Serverless</a></dt> <dd>This blog post dives deep into the internals of the StateFun runtime, taking a look at how it enables consistent and fault-tolerant stateful serverless applications.</dd> - - <dt> <a href="/news/2020/09/28/release-statefun-2.2.0.html">Stateful Functions 2.2.0 Release Announcement</a></dt> - <dd><p>The Apache Flink community is happy to announce the release of Stateful Functions (StateFun) 2.2.0! 
This release -introduces major features that extend the SDKs, such as support for asynchronous functions in the Python SDK, new -persisted state constructs, and a new SDK that allows embedding StateFun functions within a Flink DataStream job. -Moreover, we’ve also included important changes that improve out-of-the-box stability for common workloads, -as well as increased observability for operational purposes.</p> - -</dd> </dl>
