This is an automated email from the ASF dual-hosted git repository. sewen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit 5f57faf97c072653cb45bc8214e95ea7380ce41c Author: Stephan Ewen <[email protected]> AuthorDate: Mon Mar 15 18:50:00 2021 +0100 Update Roadmap --- img/flink_feature_radar.svg | 298 ++++++++++++++++++++++++++++++++++++++++++ img/flink_feature_radar_2.svg | 3 + roadmap.md | 280 ++++++++++++++++++++++++++------------- 3 files changed, 492 insertions(+), 89 deletions(-) diff --git a/img/flink_feature_radar.svg b/img/flink_feature_radar.svg new file mode 100644 index 0000000..52ac28b --- /dev/null +++ b/img/flink_feature_radar.svg @@ -0,0 +1,298 @@ +<?xml version="1.0" encoding="utf-8"?> +<!-- Generator: Adobe Illustrator 25.2.0, SVG Export Plug-In . SVG Version: 6.00 Build 0) --> +<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" + viewBox="0 0 1462 1611" style="enable-background:new 0 0 1462 1611;" xml:space="preserve"> +<style type="text/css"> + .st0{opacity:0.5;fill:#F2F2F2;enable-background:new ;} + .st1{fill:none;} + .st2{fill:#363636;} + .st3{font-family:'Trenda-Bold';} + .st4{font-size:31px;} + .st5{opacity:0.5;fill:none;stroke:#363636;stroke-miterlimit:10;stroke-dasharray:3,3;enable-background:new ;} + .st6{fill:#E8436A;} + .st7{font-size:26px;} + .st8{fill:#4B9654;} + .st9{fill:#2F8DC1;} + .st10{fill:#F9A11B;} + .st11{fill:#993940;} + .st12{fill:#7C56A4;} + .st13{fill:#002FA5;} + .st14{font-size:23px;} + .st15{enable-background:new ;} + .st16{font-size:22.4692px;} + .st17{font-size:24px;} + .st18{font-size:25px;} + .st19{font-size:25.0008px;} + .st20{fill:none;stroke:#363636;stroke-width:9;stroke-miterlimit:10;} + .st21{fill:#B5739D;stroke:#363636;stroke-width:9;stroke-miterlimit:10;} + .st22{font-size:41px;} + .st23{font-family:'ArialMT';} + .st24{font-size:10px;} +</style> +<g> + <rect x="0.5" y="1510.5" pointer-events="all" class="st0" width="1460" height="100"/> + <rect x="1" y="880.3" pointer-events="all" class="st0" width="1460" height="598.3"/> + <rect x="0.5" y="0.5" 
pointer-events="all" class="st0" width="1460" height="845.4"/> + <rect x="30.5" y="630.5" pointer-events="all" class="st1" width="40" height="20"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 15.3501 651.6816)" class="st2 st3 st4">MVP</text> + </g> + <path pointer-events="stroke" class="st5" d="M781.8,799.7l-555-609.2"/> + <rect x="70.5" y="240.5" pointer-events="all" class="st1" width="70" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 18.0127 136)" class="st2 st3 st4">Beta</text> + </g> + <rect x="288" y="135.5" pointer-events="all" class="st1" width="230" height="90"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 230.7544 136)" class="st2 st3 st4">Production Ready...</text> + </g> + <rect x="966.5" y="76.5" pointer-events="all" class="st1" width="230" height="90"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 862 127)" class="st2 st3 st4">Stable</text> + </g> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1231.0005 1033.2699)" class="st2 st3 st4">Deprecated</text> + </g> + <path pointer-events="stroke" class="st5" d="M817,809.5l-0.5-683"/> + <path pointer-events="stroke" class="st5" d="M666,1402.8l506-370"/> + <rect x="651" y="972.8" pointer-events="all" class="st1" width="400" height="50"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 176.5005 1018.3096)" class="st2 st3 st4">Approaching End-of-Life</text> + </g> + <rect x="70.5" y="1540.5" pointer-events="all" class="st1" width="305" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 72.4224 1575.1875)" class="st6 st3 st7">APIs</text> + </g> + <rect x="168" y="1540.5" pointer-events="all" class="st1" width="120" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1292.4648 1575.1871)" class="st8 st3 st7">Languages</text> + </g> + <rect x="325.5" y="1540.5" 
pointer-events="all" class="st1" width="90" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 522.017 1575.187)" class="st9 st3 st7">Clients</text> + </g> + <rect x="710.5" y="1540.5" pointer-events="all" class="st1" width="140" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 676.0762 1575.187)" class="st10 st3 st7">Connectors</text> + </g> + <rect x="870.5" y="1540.5" pointer-events="all" class="st1" width="180" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 871.0003 1575.187)" class="st11 st3 st7">State Backends</text> + </g> + <rect x="1085.5" y="1540.5" pointer-events="all" class="st1" width="100" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1121.5 1575.4209)" class="st12 st3 st7">Libraries</text> + </g> + <rect x="450.5" y="1540.5" pointer-events="all" class="st1" width="220" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 188.5 1575.1871)" class="st13 st3 st7">Resource Managers</text> + </g> + <rect x="856.5" y="160.5" pointer-events="all" class="st1" width="305" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 858.5 187.5)" class="st6 st3 st14">DataStream (streaming)</text> + </g> + <rect x="70.5" y="290.5" pointer-events="all" class="st1" width="210" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 15.0259 322)" class="st6 st3 st14">DataStream (batch)</text> + </g> + <rect x="643.5" y="1072.8" pointer-events="all" class="st1" width="100" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 551.5 1083.3936)" class="st6 st3 st14">DataSet</text> + </g> + <rect x="510.5" y="206.5" pointer-events="all" class="st1" width="270" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 580.2861 214.5781)" class="st6 st3 st14">SQL & 
Table API</text> + </g> + <rect x="851" y="1312.8" pointer-events="all" class="st1" width="270" height="70"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(0.8713 0.5013 -0.5066 0.8622 960.9474 1300.7898)" class="st15"><tspan x="0" y="0" class="st6 st3 st16">Legacy SQL</tspan><tspan x="0" y="27" class="st6 st3 st16">Query Engine</tspan></text> + </g> + <rect x="787" y="1087.8" pointer-events="all" class="st1" width="180" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(0.9824 0.1869 -0.1869 0.9824 791.3521 1093.2705)" class="st6 st3 st14">Queryable State</text> + </g> + <rect x="90.5" y="540.5" pointer-events="all" class="st1" width="210" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 106 544.8164)" class="st6 st3 st14">State Processor API</text> + </g> + <rect x="541" y="1182.8" pointer-events="all" class="st1" width="130" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 371 1192.6338)" class="st9 st3 st17">Scala Shell</text> + </g> + <rect x="1170.5" y="166.5" pointer-events="all" class="st1" width="110" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1171 237.5)" class="st8 st3 st17">Java 8</text> + </g> + <rect x="1170.5" y="206.5" pointer-events="all" class="st1" width="120" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1172 281)" class="st8 st3 st17">Java 11</text> + </g> + <rect x="1171.5" y="261.5" pointer-events="all" class="st1" width="140" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1171 333.1797)" class="st8 st3 st17">Scala 2.12</text> + </g> + <rect x="476" y="1047.8" pointer-events="all" class="st1" width="140" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 306 1083.3945)" class="st8 st3 st17">Scala 2.11</text> + </g> + <rect x="633" y="150.5" pointer-events="all" 
class="st1" width="135" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 341.6924 211)" class="st13 st3 st18">Kubernetes</text> + </g> + <rect x="856.5" y="210.5" pointer-events="all" class="st1" width="130" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 858.5 237.5)" class="st13 st3 st18">Standalone</text> + </g> + <rect x="856.5" y="256.5" pointer-events="all" class="st1" width="135" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 858.5 283.5)" class="st13 st3 st18">Yarn</text> + </g> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1171 377.5)" class="st13 st3 st18">Zookeeper HA</text> + </g> + <rect x="1062" y="1192.8" pointer-events="all" class="st1" width="135" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(0.92 0.392 -0.392 0.92 1093.6067 1194.7485)" class="st13 st3 st19">Mesos</text> + </g> + <rect x="380.5" y="640.5" pointer-events="all" class="st1" width="220" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 371 659.9396)" class="st15"><tspan x="0" y="0" class="st13 st3 st18">Kubernetes-based HA</tspan><tspan x="0" y="30" class="st13 st3 st18">(ZK-alternative)</tspan></text> + </g> + <rect x="856.5" y="301.5" pointer-events="all" class="st1" width="220" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 858.5 328.5)" class="st11 st3 st17">Heap/FS State Back.</text> + </g> + <rect x="850.5" y="350.5" pointer-events="all" class="st1" width="305" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 858.75 381.1953)" class="st11 st3 st17">RocksDB/FS State Back.</text> + </g> + <rect x="696" y="1222.8" pointer-events="all" class="st1" width="60" height="50"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(0.9409 0.3387 -0.3387 0.9409 712.0695 1219.0122)" class="st12 
st3 st17">Gelly</text> + </g> + <rect x="1044" y="240.5" pointer-events="all" class="st1" width="70" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1171 189.5)" class="st12 st3 st17">CEP</text> + </g> + <rect x="50.5" y="680.5" pointer-events="all" class="st1" width="190" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 17.7056 708.66)" class="st12 st3 st17">Machine Learning </text> + <text transform="matrix(1 0 0 1 17.7056 737.46)" class="st12 st3 st17">Library</text> + </g> + <rect x="480.5" y="341.5" pointer-events="all" class="st1" width="135" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 430.1448 351)" class="st10 st3 st17">JDBC Sink</text> + </g> + <rect x="190.5" y="470.5" pointer-events="all" class="st1" width="230" height="70"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 83.8286 500.9998)" class="st10 st3 st17">Unified Source API. [w/ Kafka, File]</text> + </g> + <rect x="871.5" y="460.5" pointer-events="all" class="st1" width="187.5" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 857 488.0273)" class="st10 st3 st17">File Source & Sink</text> + </g> + <rect x="1081.5" y="460.5" pointer-events="all" class="st1" width="305" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1156 488.5547)" class="st10 st3 st17">Kafka Source & Sink</text> + </g> + <rect x="585.5" y="275.5" pointer-events="all" class="st1" width="160" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 586 285.3643)" class="st10 st3 st17">Pulsar Source & Sink</text> + </g> + <rect x="580.5" y="500.5" pointer-events="all" class="st1" width="240" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 580.9999 531)" class="st10 st3 st17">Rabbit MQ Source</text> + </g> + <rect x="525.5" y="400.5" 
pointer-events="all" class="st1" width="240" height="30"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 502.75 416)" class="st10 st3 st17">Kinesis Source & Sink</text> + </g> + <rect x="871.5" y="510.5" pointer-events="all" class="st1" width="190" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 856.9995 541)" class="st10 st3 st17">PubSub Source</text> + </g> + <rect x="190.5" y="590.5" pointer-events="all" class="st1" width="190" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 201 584.6249)" class="st10 st3 st17">NiFi Source</text> + </g> + <rect x="1081.5" y="510.5" pointer-events="all" class="st1" width="220" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1158.04 537.5)" class="st10 st3 st17">Elastic Search Sink</text> + </g> + <rect x="1081.5" y="560.5" pointer-events="all" class="st1" width="220" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1160.0801 587.7158)" class="st10 st3 st17">Cassandra Sink</text> + </g> + <rect x="861.5" y="560.5" pointer-events="all" class="st1" width="220" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 857.5996 587.9316)" class="st10 st3 st17">HBase Sink</text> + </g> + <rect x="633" y="330.5" pointer-events="all" class="st1" width="135" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 616.5 351.2158)" class="st10 st3 st17">Hive Catalog</text> + </g> + <rect x="390.5" y="251.5" pointer-events="all" class="st1" width="160" height="60"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 381 281)" class="st15"><tspan x="0" y="0" class="st10 st3 st17">Hive SQL.</tspan><tspan x="0" y="28.8" class="st10 st3 st17">Source & Sink</tspan></text> + </g> + <rect x="200.5" y="390.5" pointer-events="all" class="st1" width="190" height="70"/> + <g 
transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 43.0991 449.1187)" class="st10 st3 st17">Unified Sink API [w/ FileSink]</text> + </g> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 16.9805 270.6396)" class="st8 st3 st17">Python Table API</text> + </g> + <rect x="248" y="730.5" pointer-events="all" class="st1" width="250" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 281.5 791)" class="st8 st3 st17">Python DataStream API</text> + </g> + <rect x="1201.5" y="610.5" pointer-events="all" class="st1" width="180" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 857 695)" class="st10 st3 st17">S3 FileSystem</text> + </g> + <rect x="1075.5" y="674.5" pointer-events="all" class="st1" width="180" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1161.1123 684.5889)" class="st10 st3 st17">GCS FileSystem</text> + </g> + <rect x="856.5" y="610.5" pointer-events="all" class="st1" width="180" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 858.5 637.5)" class="st10 st3 st17">Local/NFS FileSystem</text> + </g> + <rect x="1006.5" y="610.5" pointer-events="all" class="st1" width="180" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 1160.0801 629.8887)" class="st10 st3 st17">HDFS FileSystem</text> + </g> + <rect x="660.5" y="640.5" pointer-events="all" class="st1" width="130" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 676.0044 642.1797)" class="st15"><tspan x="0" y="0" class="st10 st3 st17">Azure Blob</tspan><tspan x="0" y="28.8" class="st10 st3 st17">FileSystem</tspan></text> + </g> + <rect x="633" y="555.5" pointer-events="all" class="st1" width="180" height="50"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 633.5 570.5801)" class="st15"><tspan x="0" y="0" class="st10 st3 
st17">AliCloud OSS </tspan><tspan x="0" y="28.8" class="st10 st3 st17">FileSystem</tspan></text> + </g> + <path pointer-events="stroke" class="st20" d="M188,790.5c135-260,523.6-390,1165.7-390"/> + <path pointer-events="all" class="st21" d="M1360.4,400.5l-9,4.5l2.2-4.5l-2.2-4.5L1360.4,400.5z"/> + <path pointer-events="stroke" class="st20" d="M176,1122.8c543.3-13.3,911.4,95.7,1104.2,327.1"/> + <path pointer-events="all" class="st21" d="M1284.6,1455l-9.2-4l4.9-1.2l2-4.6L1284.6,1455z"/> + <rect x="445.5" y="580.5" pointer-events="all" class="st1" width="90" height="40"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 446 611)" class="st9 st3 st17">SQL CLI</text> + </g> + <path pointer-events="stroke" class="st5" d="M711.6,816.7L20.5,580.5"/> + <rect x="10.5" y="0.5" pointer-events="all" class="st1" width="540" height="90"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 11.5 46)" class="st2 st3 st22">New- and Stable Features</text> + </g> + <rect x="11" y="932.8" pointer-events="all" class="st1" width="540" height="90"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 17.7056 926.8354)" class="st2 st3 st22">Features Phasing Out</text> + </g> + <rect x="26.5" y="340.5" pointer-events="all" class="st1" width="274" height="60"/> + <g transform="translate(-0.5 -0.5)"> + <text transform="matrix(1 0 0 1 16.9805 371)" class="st15"><tspan x="0" y="0" class="st10 st3 st17">Change-Data-Capture API and </tspan><tspan x="0" y="28.8" class="st10 st3 st17">connectors</tspan></text> + </g> +</g> +<a xlink:href="https://www.diagrams.net/doc/faq/svg-export-text-problems" transform="translate(0,-5)"> + <text transform="matrix(1 0 0 1 649.6006 1611.5)" class="st23 st24">Viewer does not support full SVG 1.1</text> +</a> +</svg> diff --git a/img/flink_feature_radar_2.svg b/img/flink_feature_radar_2.svg new file mode 100644 index 0000000..39c6d2b --- /dev/null +++ b/img/flink_feature_radar_2.svg @@ -0,0 +1,3 @@ 
+<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"> +<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="1462px" height="1611px" viewBox="-0.5 -0.5 1462 1611" content="<mxfile host="app.diagrams.net" modified="2021-03-02T12:54:02.361Z" agent="5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36" etag="HUCSSSwWrlf5aUy_RP5Y" version="14.4.3" type="device"><diagram id=& [...] \ No newline at end of file diff --git a/roadmap.md b/roadmap.md index 46a0933..0b8fb90 100644 --- a/roadmap.md +++ b/roadmap.md @@ -24,139 +24,241 @@ under the License. {% toc %} -**Preamble:** This is not an authoritative roadmap in the sense of a strict plan with a specific -timeline. Rather, we — the community — share our vision for the future and give an overview of the bigger -initiatives that are going on and are receiving attention. This roadmap shall give users and -contributors an understanding where the project is going and what they can expect to come. +**Preamble:** This roadmap means to provide user and contributors with a high-level summary of ongoing efforts, +grouped by the major threads to which the efforts belong. With so much that is happening in Flink, we +hope that this helps with understanding the direction of the project. +The roadmap contains both efforts in early stages as well as nearly completed +efforts, so that users may get a better impression of the overall status and direction of those developments. + +More details and various smaller changes can be found in the +[FLIPs](https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals) The roadmap is continuously updated. New features and efforts should be added to the roadmap once there is consensus that they will happen and what they will roughly look like for the user. 
-**Last Update:** 2019-09-04 +**Last Update:** 2021-03-01 + +<hr /> + +# Feature Radar + +The feature radar is meant to give users guidance regarding feature maturity, as well as which features +are approaching end-of-life. For questions, please contact the developer mailing list: +[[email protected]](mailto:[email protected]) + +<div class="row front-graphic"> + <img src="{{ site.baseurl }}/img/flink_feature_radar_2.svg" width="700px" /> +</div> + +## Feature Stages + + - **MVP:** Have a look, consider whether this can help you in the future. + - **Beta:** You can benefit from this, but you should carefully evaluate the feature. + - **Ready and Evolving:** Ready to use in production, but be aware you may need to make some adjustments to your application and setup in the future, when you upgrade Flink. + - **Stable:** Unrestricted use in production + - **Reaching End-of-Life:** Stable, still feel free to use, but think about alternatives. Not a good match for new long-lived projects. + - **Deprecated:** Start looking for alternatives now + +<hr /> + +# Unified Analytics: Where Batch and Streaming come Together; SQL and beyond. + +Flink is a streaming data system in its core, that executes "batch as a special case of streaming". +Efficient execution of batch jobs is powerful in its own right; but even more so, batch processing +capabilities (efficient processing of bounded streams) open the way for a seamless unification of +batch and streaming applications. + +Unified streaming/batch up-levels the streaming data paradigm: It gives users consistent semantics across +their real-time and lag-time applications. Furthermore, streaming applications often need to be complemented +by batch (bounded stream) processing, for example when reprocessing data after bugs or data quality issues, +or when bootstrapping new applications. A unified API and system make this much easier. 
+ +## A unified SQL Platform + +The community has been building Flink into a powerful basis for a unified (batch and streaming) SQL analytics +platform, and is continuing to do so. + +SQL has very strong cross-batch-streaming semantics, allowing users to use the same queries for ad-hoc analytics +and as continuous queries. Flink already contains an efficient unified query engine, and a wide set of +integrations. With user feedback, those are continuously improved. + +**More Connector and Change Data Capture Support** + + - Change-Data-Capture: Capturing a stream of data changes, directly from databases, by attaching to the + transaction log. The community is adding more CDC integrations. + - External CDC connectors: [https://flink-packages.org/packages/cdc-connectors](https://flink-packages.org/packages/cdc-connectors) + - Background: [FLIP-105](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427289) + (CDC support for SQL) and [Debezium](https://debezium.io/). + + - Data Lake Connectors: Unified streaming & batch is a powerful value proposition for Data Lakes: supporting + the same APIs, semantics, and engine for streaming real-time processing and batch processing of historic data. 
+ The community is adding deeper integrations with various Data Lake systems: + - [Apache Iceberg](https://iceberg.apache.org/): [https://iceberg.apache.org/flink/](https://iceberg.apache.org/flink/) + - [Apache Hudi](https://hudi.apache.org/): [https://hudi.apache.org/blog/apache-hudi-meets-apache-flink/](https://hudi.apache.org/blog/apache-hudi-meets-apache-flink/) + - [Apache Pinot](https://pinot.apache.org/): [FLIP-166](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177045634) + +**Platform Infrastructure** -# Analytics, Applications, and the roles of DataStream, DataSet, and Table API + - To simplify the building of production SQL platforms with Flink, we are improving the SQL client and are + working on SQL gateway components that interface between client and cluster: [FLIP-163](https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements) -Flink views stream processing as a [unifying paradigm for data processing]({{ site.baseurl }}/flink-architecture.html) -(batch and real-time) and event-driven applications. The APIs are evolving to reflect that view: +**Support for Common Languages, Formats, Catalogs** - - The **Table API / SQL** is becoming the primary API for analytical use cases, in a unified way - across batch and streaming. To support analytical use cases in a more streamlined fashion, - the API is being extended with more convenient multi-row/column operations ([FLIP-29](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739)). + - Hive Query Compatibility: [FLIP-152](https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility) - - Like SQL, the Table API is *declarative*, operates on a *logical schema*, and applies *automatic optimization*. - Because of these properties, that API does not give direct access to time and state. +Flink has a broad SQL coverage for batch (full TPC-DS support) and a state-of-the-art set of supported +operations in streaming. 
There is continuous effort to add more functions and cover more SQL operations. - - The Table API is also the foundation for the Machine Learning (ML) efforts inititated in ([FLIP-39](https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs)), that will allow users to easily build, persist and serve ([FLINK-13167](https://issues.apache.org/jira/browse/FLINK-13167)) ML pipelines/workflows through a set of abstract core interfaces. +## Deep Batch / Streaming Unification for the DataStream API - - The **DataStream API** is the primary API for data-driven applications and data pipelines. - It uses *physical data types* (Java/Scala classes) and there is no automatic rewriting. - The applications have explicit control over *time* and *state* (state, triggers, proc fun.). - In the long run, the DataStream API will fully subsume the DataSet API through *bounded streams*. - -# Batch and Streaming Unification +The *DataStream API* is Flink's *physical* API, for use cases where users need very explicit control over data +types, streams, state, and time. This API is evolving to support efficient batch execution on bounded data. -Flink's approach is to cover batch and streaming by the same APIs on a streaming runtime. -[This blog post]({{ site.baseurl }}/news/2019/02/13/unified-batch-streaming-blink.html) -gives an introduction to the unification effort. +DataStream API executes the same dataflow shape in batch as in streaming, keeping the same operators. +That way users keep the same level of control over the dataflow, and our goal is to mix and switch between +batch/streaming execution in the future to make it a seamless experience. 
-The biggest user-facing parts currently ongoing are: +**Unified Sources and Sinks** - - Table API restructuring ([FLIP-32](https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions)) - that decouples the Table API from batch/streaming specific environments and dependencies. Some key parts of the FLIP are completed, such as the modular decoupling of expression parsing and the removal of Scala dependencies, and the next step is to unify the function stack ([FLINK-12710](https://issues.apache.org/jira/browse/FLINK-12710)). + - The first APIs and implementations of sources were specific to either streaming programs in the DataStream API + ([SourceFunction](https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/SourceFunction.java)), + or to batch programs in the DataSet API ([InputFormat](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/io/InputFormat.java)). - - The new source interfaces generalize across batch and streaming, making every connector usable as a batch and streaming data source ([FLIP-27](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface)). + In this effort, we are creating sources that work across batch and streaming execution. The aim is to give + users a consistent experience across both modes, and to allow them to easily switch between streaming and batch + execution for their unbounded and bounded streaming applications. + The interface for this New Source API is done and available, and we are working on migrating more source connectors + to this new model, see [FLIP-27](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface). - - The introduction of *upsert-* or *changelog-* sources will support more powerful streaming inputs to the Table API ([FLINK-8545](https://issues.apache.org/jira/browse/FLINK-8545)). 
+ + - Similar to the sources, the original sink APIs are also specific to streaming + ([SinkFunction](https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/SinkFunction.java)) + and batch ([OutputFormat](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/io/OutputFormat.java)) + APIs and execution. -On the runtime level, the streaming operators were extended in Flink 1.9 to also support the data consumption patterns required for some batch operations — which is groundwork for upcoming features like efficient [side inputs](https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API). + We have introduced a new API for sinks that consistently handles result writing and committing (*Transactions*) + across batch and streaming. The first iteration of the API exists, and we are porting sinks and refining the + API in the process. See [FLIP-143](https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API). -Once these unification efforts are completed, we can move on to unifying the DataStream API. +**DataStream Batch Execution** -# Fast Batch (Bounded Streams) + - Flink is adding a *batch execution mode* for bounded DataStream programs. This gives users faster and simpler + execution and recovery of their bounded streaming applications; users do not need to worry about watermarks and + state sizes in this execution mode: [FLIP-140](https://cwiki.apache.org/confluence/display/FLINK/FLIP-140%3A+Introduce+batch-style+execution+for+bounded+keyed+streams) -The community's goal is to make Flink's performance on bounded streams (batch use cases) competitive with that -of dedicated batch processors. 
While Flink has been shown to handle some batch processing use cases faster than -widely-used batch processors, there are some ongoing efforts to make sure this the case for broader use cases: + The core batch execution mode is implemented with [great results](https://flink.apache.org/news/2020/12/10/release-1.12.0.html#batch-execution-mode-in-the-datastream-api); + there are ongoing improvements around aspects like broadcast state and processing-time-timers. + This mode requires the new unified sources and sinks that are mentioned above, so it is limited + to the connectors that have been ported to those new APIs. - - Faster and more complete SQL/Table API: The community is merging the Blink query processor which improves on - the current query processor by adding a much richer set of runtime operators, optimizer rules, and code generation. - The Blink-based query processor has full TPC-H support (with TPC-DS planned for the next release) and up to 10x performance improvement over the pre-1.9 Flink query processor ([FLINK-11439](https://issues.apache.org/jira/browse/FLINK-11439)). +**Mixing bounded/unbounded streams, and batch/streaming execution** - - An application on bounded data can schedule operations after another, depending on how the operators - consume data (e.g., first build hash table, then probe hash table). - We are separating the scheduling strategy from the ExecutionGraph to support different strategies - on bounded data ([FLINK-10429](https://issues.apache.org/jira/browse/FLINK-10429)). + - Support checkpointing when some tasks finished & Bounded stream programs shut down with a final + checkpoint: [FLIP-147](https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished) - - Caching of intermediate results on bounded data, to support use cases like interactive data exploration. 
- The caching generally helps with applications where the client submits a series of jobs that build on
- top of one another and reuse each others' results ([FLINK-11199](https://issues.apache.org/jira/browse/FLINK-11199)).
+ - There are initial discussions and designs about jobs with mixed batch/streaming execution, so stay tuned for more
+ news in that area.
-Various of these enhancements can be integrated from the contributed code in the [Blink fork](https://github.com/apache/flink/tree/blink). To exploit these optimizations for bounded streams also in the DataStream API, we first need to break parts of the API and explicitly model bounded streams.
+## Subsuming DataSet with DataStream and Table API
-# Stream Processing Use Cases
-
-The *new source interface* effort ([FLIP-27](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface))
-aims to give simpler out-of-the box support for event time and watermark generation for sources.
-Sources will have the option to align their consumption speed in event time, to reduce the
-size of in-flight state when re-processing large data volumes in streaming
-([FLINK-10887](https://issues.apache.org/jira/browse/FLINK-10886)).
+We want to eventually drop the legacy Batch-only DataSet API, have batch-and stream processing unified
+throughout the entire system.
-To overcome the current pitfalls of checkpoint performance under backpressure scenarios, the community is introducing the concept of [unaligned checkpoints](https://lists.apache.org/thread.html/fd5b6cceb4bffb635e26e7ec0787a8db454ddd64aadb40a0d08a90a8@%3Cdev.flink.apache.org%3E). This will allow checkpoint barriers to overtake the output/input buffer queue to speed up alignment and snapshot the inflight data as part of checkpoint state.
+Overall Discussion: [FLIP-131](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741)
-We also plan to add first class support for
-[Protocol Buffers](https://developers.google.com/protocol-buffers/) to make evolution of streaming state simpler, similar to the way
-Flink deeply supports Avro state evolution ([FLINK-11333](https://issues.apache.org/jira/browse/FLINK-11333)).
+The _DataStream API_ supports batch-execution to efficiently execute streaming programs on historic data
+(see above). Takes over that set of use cases.
-# Deployment, Scaling and Security
+The _Table API_ should become the default API for batch-only applications.
-To provide downstream projects with a consistent way to programatically control Flink deployment submissions, the Client API is being [refactored](https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E). The goal is to unify the implementation of cluster deployment and job submission in Flink and allow more flexible job and cluster management — independent of cluster setup or deployment mode. [FLIP-52](https://cwiki.apache [...]
+ - Add more operations to Table API, so support common data manipulation tasks more
+ easily: [FLIP-155](https://cwiki.apache.org/confluence/display/FLINK/FLIP-155%3A+Introduce+a+few+convenient+operations+in+Table+API)
+ - Make Source and Sink definitions easier in the Table API.
+Improve the _interplay between the Table API and the DataStream API_ to allow switching from Table API to
+DataStream API when more control over the data types and operations is necessary.
-The community is working on extending the interoperability with authentication and authorization services.
-Under discussion are general extensions to the [security module abstraction](http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html)
-as well as specific [enhancements to the Kerberos support](http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html).
+ - Interoperability between DataStream and Table APIs: [FLIP-136](https://cwiki.apache.org/confluence/display/FLINK/FLIP-136%3A++Improve+interoperability+between+DataStream+and+Table+API)
-# Resource Management and Configuration
+<hr />
+
+# Applications vs. Clusters; "Flink as a Library"
+
+The goal of these efforts is to make it feel natural to deploy (long running streaming) Flink applications.
+Instead of starting a cluster and submitting a job to that cluster, these efforts support deploying a streaming
+job as a self contained application.
+
+For example as a simple Kubernetes deployment; deployed and scaled like a regular application without extra workflows.
+
+Deploy Flink jobs as self-contained Applications works for all deployment targets since Flink 1.11.0
+([FLIP-85](https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode)).
+
+ - Reactive Scaling lets Flink applications change their parallelism in response to growing and shrinking
+ worker pools, and makes Flink compatibel with standard auto-scalers:
+ [FLIP-159](https://cwiki.apache.org/confluence/display/FLINK/FLIP-159%3A+Reactive+Mode)
+
+ - Kubernetes-based HA-services let Flink applications run on Kubernetes without requiring a ZooKeeper dependency:
+ [FLIP-144](https://cwiki.apache.org/confluence/display/FLINK/FLIP-144%3A+Native+Kubernetes+HA+for+Flink)
+
+<hr />
+
+# Performance
-There is a big effort to design a new way for Flink to interact with dynamic resource
-pools and automatically adjust to resource availability and load.
-Part of this is becoming a *reactive* way of adjusting to changing resources (like
-containers/pods being started or removed) ([FLINK-10407](https://issues.apache.org/jira/browse/FLINK-10407)),
-while other parts are resulting in *active* scaling policies where Flink decides to add
-or remove TaskManagers, based on internal metrics.
+Continuous work to keep improving performance and recovery speed.
- - The current TaskExecutor memory configuration in Flink has some shortcomings that make it hard to reason about or optimize resource utilization, such as: (1) different configuration models for memory footprint for Streaming and Batch; (2) complex and user-dependent configuration of off-heap state backends (typically RocksDB) in Streaming execution; (3) and sub-optimal memory utilization in Batch execution. [FLIP-49](https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified [...]
+## Faster Checkpoints and Recovery
- - In a similar way, we are introducing changes to Flink's resource management module with [FLIP-53](https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management) to enable fine-grained control over Operator resource utilization according to known (or unknown) resource profiles. Since the requirements of this FLIP conflict with the existing static slot allocation model, this model first needs to be refactored to provide dynamic slot allocation ( [...]
+The community is continuously working on improving checkpointing and recovery speed.
+Checkpoints and recovery are stable and have been a reliable workhorse for years. We are still
+trying to make it faster, more predictable, and to remove some confusions and inflexibility in some areas.
- - To support the active resource management also in Kubernetes, we are working on a Kubernetes Resource Manager
-([FLINK-9953](https://issues.apache.org/jira/browse/FLINK-9953)).
+ - Unaligned Checkpoints, to make checkpoints progress faster when applications cause backpressure:
+ [FLIP-76](https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints), available
+ since Flink 1.12.2.
+ - Log-based checkpoints, for very frequent incremental checkpointing:
+ [FLIP-158](https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints)
-Spillable Heap State Backend ([FLIP-50](https://cwiki.apache.org/confluence/display/FLINK/FLIP-50%3A+Spill-able+Heap+Keyed+State+Backend)), a new state backend configuration, is being implemented to support spilling cold state data to disk before heap memory is exhausted and so reduce the chance of OOM errors in job execution. This is not meant as a replacement for RocksDB, but more of an enhancement to the existing Heap State Backend.
+## Large Scale Batch Applications
-# Ecosystem
+The community is working on making large scale batch execution (parallelism in the order of 10,000s)
+simpler (less configuration tuning required) and more performant.
-The community is working on extending the support for catalogs, schema registries, and metadata stores, including support in the APIs and the SQL client ([FLINK-11275](https://issues.apache.org/jira/browse/FLINK-11275)).
-We have added DDL (Data Definition Language) support in Flink 1.9 to make it easy to add tables to catalogs ([FLINK-10232](https://issues.apache.org/jira/browse/FLINK-10232)), and will extend the support to streaming use cases in the next release.
+ - Introduce a more scalable batch shuffle. First parts of this have been merged, and ongoing efforts are
+ to make the memory footprint (JVM direct memory) more predictable, see
+ [FLIP-148](https://cwiki.apache.org/confluence/display/FLINK/FLIP-148%3A+Introduce+Sort-Merge+Based+Blocking+Shuffle+to+Flink)
-There is also an ongoing effort to fully integrate Flink with the Hive ecosystem.
The latest release made headway in bringing Hive data and metadata interoperability to Flink, along with initial support for Hive UDFs. Moving forward, the community will stabilize and expand on the existing implementation to support Hive DDL syntax and types, as well as other desirable features and capabilities described in [FLINK-10556](https://issues.apache.org/jira/browse/FLINK-10556).
+ - [FLINK-20740](https://issues.apache.org/jira/browse/FLINK-20740)
+ - [FLINK-19938](https://issues.apache.org/jira/browse/FLINK-19938)
-# Non-JVM Languages (Python)
+ - Make scheduler faster for higher parallelism: [FLINK-21110](https://issues.apache.org/jira/browse/FLINK-21110)
-The work initiated in Flink 1.9 to bring full Python support to the Table API ([FLIP-38](https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API)) will continue in the upcoming releases, also in close collaboration with the Apache Beam community. The next steps include:
+<hr />
+
+# Python APIs
+
+Stateful transformation functions for the Python DataStream API:
+[FLIP-153](https://cwiki.apache.org/confluence/display/FLINK/FLIP-153%3A+Support+state+access+in+Python+DataStream+API)
+
+<hr />
- - Adding support for Python UDFs (Scalar Functions (UDF), Tabular Functions (UDTF) and Aggregate Functions (UDAF)). The details of this implementation are defined in [FLIP-58](https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Function+for+Table+API) and leverage the [Apache Beam portability framework](https://docs.google.com/document/d/1B9NmaBSKCnMJQp-ibkxvZ_U233Su67c1eYgBhrqWP24/edit#heading=h.khjybycus70) as a basis for UDF execution.
+# Documentation
- - Integrating Pandas as the final effort — that is, making functions in Pandas directly usable in the Python Table API.
+There are various dedicated efforts to simplify the maintenance and structure (more intuitive navigation/reading)
+of the documentation.
-# Connectors and Formats
+ - Docs Tech Stack: [FLIP-157](https://cwiki.apache.org/confluence/display/FLINK/FLIP-157+Migrate+Flink+Documentation+from+Jekyll+to+Hugo)
+ - General Docs Structure: [FLIP-42](https://cwiki.apache.org/confluence/display/FLINK/FLIP-42%3A+Rework+Flink+Documentation)
+ - SQL Docs: [FLIP-60](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685)
-Support for additional connectors and formats is a continuous process.
+<hr />
+
+# Miscellaneous Operational Tools
+
+ - Allow switching state backends with savepoints: [FLINK-20976](https://issues.apache.org/jira/browse/FLINK-20976)
+ - Support for Savepoints with more properties, like incremental savepoints, etc.:
+ [FLIP-47](https://cwiki.apache.org/confluence/display/FLINK/FLIP-47%3A+Checkpoints+vs.+Savepoints)
+
+<hr />
-# Miscellaneous
+# Stateful Functions
- - The Flink code base has been updated to support Java 9 ([FLINK-8033](https://issues.apache.org/jira/browse/FLINK-8033)) and Java 11 support is underway ([FLINK-10725](https://issues.apache.org/jira/browse/FLINK-10725)).
-
- - To reduce compatibility issues with different Scala versions, we are working using Scala
- only in the Scala APIs, but not in the runtime. That removes any Scala dependency for all
- Java-only users, and makes it easier for Flink to support different Scala versions ([FLINK-11063](https://issues.apache.org/jira/browse/FLINK-11063)).
+The Stateful Functions subproject has its own roadmap published under [statefun.io](https://statefun.io/).
