This is an automated email from the ASF dual-hosted git repository. bli pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit fbc6bd5fdc9feb5478a4af304841081e6bb81a17 Author: bowen.li <[email protected]> AuthorDate: Fri Mar 27 12:41:40 2020 -0700 regenerate site --- content/blog/feed.xml | 225 +++++++------ content/blog/index.html | 38 ++- content/blog/page10/index.html | 45 ++- content/blog/page11/index.html | 25 ++ content/blog/page2/index.html | 36 +- content/blog/page3/index.html | 38 ++- content/blog/page4/index.html | 40 ++- content/blog/page5/index.html | 38 ++- content/blog/page6/index.html | 36 +- content/blog/page7/index.html | 37 ++- content/blog/page8/index.html | 38 ++- content/blog/page9/index.html | 44 +-- .../2020/03/26/flink-for-data-warehouse.html | 369 +++++++++++++++++++++ content/index.html | 8 +- content/zh/index.html | 8 +- 15 files changed, 775 insertions(+), 250 deletions(-) diff --git a/content/blog/feed.xml b/content/blog/feed.xml index 803c8fb..c3bd365 100644 --- a/content/blog/feed.xml +++ b/content/blog/feed.xml @@ -7,6 +7,132 @@ <atom:link href="https://flink.apache.org/blog/feed.xml" rel="self" type="application/rss+xml" /> <item> +<title>Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</title> +<description><p>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.</p> + +<div class="page-toc"> +<ul id="markdown-toc"> + <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li> + <li><a href="#flink-and-its-integration-with-hive-comes-into-the-scene" id="markdown-toc-flink-and-its-integration-with-hive-comes-into-the-scene">Flink and Its Integration With Hive Comes into the Scene</a> <ul> + <li><a href="#unified-metadata-management" id="markdown-toc-unified-metadata-management">Unified Metadata Management</a></li> + <li><a href="#stream-processing" id="markdown-toc-stream-processing">Stream Processing</a></li> + <li><a href="#compatible-with-more-hive-versions" 
id="markdown-toc-compatible-with-more-hive-versions">Compatible with More Hive Versions</a></li> + <li><a href="#reuse-hive-user-defined-functions-udfs" id="markdown-toc-reuse-hive-user-defined-functions-udfs">Reuse Hive User Defined Functions (UDFs)</a></li> + <li><a href="#enhanced-read-and-write-on-hive-data" id="markdown-toc-enhanced-read-and-write-on-hive-data">Enhanced Read and Write on Hive Data</a></li> + <li><a href="#formats" id="markdown-toc-formats">Formats</a></li> + <li><a href="#more-data-types" id="markdown-toc-more-data-types">More Data Types</a></li> + <li><a href="#roadmap" id="markdown-toc-roadmap">Roadmap</a></li> + </ul> + </li> + <li><a href="#summary" id="markdown-toc-summary">Summary</a></li> +</ul> + +</div> + +<h2 id="introduction">Introduction</h2> + +<p>What are some of the latest requirements for your data warehouse and data infrastructure in 2020?</p> + +<p>We’ve come up with some for you.</p> + +<p>Firstly, today’s business is shifting to a more real-time fashion, and thus demands the ability to process online streaming data with low latency for near-real-time or even real-time analytics. People are becoming less and less tolerant of delays between when data is generated and when it arrives at their hands, ready to use. Hours or even days of delay are no longer acceptable. Users expect minutes, or even seconds, of end-to-end latency for data in their warehouse, to get quic [...] + +<p>Secondly, the infrastructure should be able to handle both offline batch data for offline analytics and exploration, and online streaming data for more timely analytics. Both are indispensable, as each has very valid use cases. Apart from the real-time processing mentioned above, batch processing would still exist, as it is well suited for ad-hoc queries, exploration, and full-size calculations. Your modern infrastructure should not force users to choose between one or the other [...]
+ +<p>Thirdly, the people who work with data, including data engineers, data scientists, analysts, and operations teams, need a more unified infrastructure than ever before for easier ramp-up and higher working efficiency. The big data landscape has been fragmented for years: companies may have one set of infrastructure for real-time processing, one set for batch, one set for OLAP, etc. That, oftentimes, comes as a result of the legacy of lambda architecture, which was popular in the era when stream pr [...] + +<p>If any of these resonate with you, you have found the right post to read: we have never been this close to that vision, having strengthened Flink’s integration with Hive to production grade.</p> + +<h2 id="flink-and-its-integration-with-hive-comes-into-the-scene">Flink and Its Integration With Hive Comes into the Scene</h2> + +<p>Apache Flink is a proven, scalable system that handles extremely high volumes of streaming data at very low latency in many giant tech companies.</p> + +<p>Despite its huge success in the real-time processing domain, at its deep root, Flink has been faithfully following its inborn philosophy of being <a href="https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html">a unified data processing engine for both batch and streaming</a>, and taking a streaming-first approach in its architecture to do batch processing. By making batch a special case of streaming, Flink really leverages its cutting [...] + +<p>On the other hand, Apache Hive has established itself as a focal point of the data warehousing ecosystem. It serves not only as a SQL engine for big data analytics and ETL, but also as a data management platform, where data is discovered and defined. As business evolves, it puts new requirements on the data warehouse.</p> + +<p>Thus we started integrating Flink and Hive as a beta version in Flink 1.9.
Over the past few months, we have been listening to users’ requests and feedback, extensively enhancing our product, and running rigorous benchmarks (which will be published separately soon). I’m glad to announce that the integration between Flink and Hive is at production grade in <a href="https://flink.apache.org/news/2020/02/11/release-1.10.0.html">Flink 1.10</a> and we can’t wait [...] + +<h3 id="unified-metadata-management">Unified Metadata Management</h3> + +<p>Over the years, Hive Metastore has evolved into the de facto metadata hub in the Hadoop ecosystem, and even the cloud ecosystem. Many companies run a single Hive Metastore service instance in production to manage all of their schemas, both Hive and non-Hive metadata, as the single source of truth.</p> + +<p>In 1.9 we introduced Flink’s <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_catalog.html">HiveCatalog</a>, connecting Flink to users’ rich metadata pool. The meaning of <code>HiveCatalog</code> is two-fold here. First, it allows Apache Flink users to utilize Hive Metastore to store and manage Flink’s metadata, including tables, UDFs, and data statistics. Second, it enables Flink to access Hive’s exis [...] + +<p>In Flink 1.10, users can store Flink’s own tables, views, UDFs, and statistics in Hive Metastore on all of the compatible Hive versions mentioned above.
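<p>As a minimal sketch of what this looks like, assuming a <code>HiveCatalog</code> named <code>myhive</code> has already been registered (the table name, topic, and connector properties below are illustrative, not taken from this post):</p>

```sql
-- Hypothetical Flink 1.10 SQL sketch: this DDL is persisted in Hive Metastore
-- via HiveCatalog, so the table definition survives sessions and is visible
-- to other Flink jobs.
USE CATALOG myhive;

CREATE TABLE orders (
  order_id BIGINT,
  amount   DOUBLE,
  ts       TIMESTAMP(3)
) WITH (
  'connector.type' = 'kafka',
  'connector.version' = 'universal',
  'connector.topic' = 'orders',
  'connector.properties.bootstrap.servers' = 'localhost:9092',
  'format.type' = 'json'
);

-- Any later session can query the table without re-declaring it.
SELECT order_id, amount FROM orders;
```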
<a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_catalog.html#example">Here’s an end-to-end example</a> of how to store a Flink Kafka source table in Hive Metastore and later query the table in Flink SQL.</p> + +<h3 id="stream-processing">Stream Processing</h3> + +<p>The Hive integration feature in Flink 1.10 empowers users to re-imagine what they can accomplish with their Hive data and unlock stream processing use cases:</p> + +<ul> + <li>join real-time streaming data in Flink with offline Hive data for more complex data processing</li> + <li>backfill Hive data with Flink directly in a unified fashion</li> + <li>leverage Flink to move real-time data into Hive more quickly, greatly shortening the end-to-end latency between when data is generated and when it arrives at your data warehouse for analytics, from hours — or even days — to minutes</li> +</ul> + +<h3 id="compatible-with-more-hive-versions">Compatible with More Hive Versions</h3> + +<p>In Flink 1.10, we brought full coverage to most Hive versions, including 1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 2.3, and 3.1. Take a look <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/#supported-hive-versions">here</a>.</p> + +<h3 id="reuse-hive-user-defined-functions-udfs">Reuse Hive User Defined Functions (UDFs)</h3> + +<p>Users have been able to <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_functions.html#hive-user-defined-functions">reuse all kinds of Hive UDFs in Flink</a> since Flink 1.9.</p> + +<p>This is a great win for Flink users with a history in the Hive ecosystem, as they may have developed custom business logic in their Hive UDFs.
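<p>As a hedged illustration (the function and table names are hypothetical, not from this post), calling such an existing Hive UDF from Flink SQL looks no different from calling a native Flink function:</p>

```sql
-- Hypothetical sketch: 'normalize_phone' stands in for a custom Hive UDF
-- already registered in Hive Metastore; with HiveCatalog in use, Flink
-- resolves and runs it as-is, with no rewrite.
SELECT user_id, normalize_phone(raw_phone) AS phone
FROM user_profiles;
```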
Being able to run these functions without any rewrite saves users a lot of time and brings them a much smoother experience when they migrate to Flink.</p> + +<p>To take it a step further, Flink 1.10 introduces <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_functions.html#use-hive-built-in-functions-via-hivemodule">compatibility of Hive built-in functions via HiveModule</a>. Over the years, the Hive community has developed a few hundred built-in functions that are super handy for users. For those built-in functions that don’t exist in Flink yet, users are now able to le [...] + +<h3 id="enhanced-read-and-write-on-hive-data">Enhanced Read and Write on Hive Data</h3> + +<p>Flink 1.10 extends its read and write capabilities on Hive data to all the common use cases with better performance.</p> + +<p>On the reading side, Flink can now read Hive regular tables, partitioned tables, and views. Many optimization techniques have been developed around reading, including partition pruning and projection pushdown to transport less data from file storage, limit pushdown for faster experimentation and exploration, and a vectorized reader for ORC files.</p> + +<p>On the writing side, Flink 1.10 introduces “INSERT INTO” and “INSERT OVERWRITE” to its syntax, and can write not only to Hive’s regular tables, but also to partitioned tables with either static or dynamic partitions.</p> + +<h3 id="formats">Formats</h3> + +<p>Your engine should be able to handle all common types of file formats to give you the freedom of choosing one over another to fit your business needs. Flink is no exception. We have tested the following table storage formats: text, CSV, SequenceFile, ORC, and Parquet.</p> + +<h3 id="more-data-types">More Data Types</h3> + +<p>In Flink 1.10, we added support for a few more frequently-used Hive data types that were not covered by Flink 1.9.
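<p>The static and dynamic partition writes described above can be sketched as follows (the table, column, and partition names are illustrative assumptions, not from this post):</p>

```sql
-- Hypothetical sketch of Flink 1.10's Hive write syntax.
-- Static partition: the partition value is fixed in the statement.
INSERT OVERWRITE hive_sales PARTITION (dt='2020-03-26')
SELECT id, amount FROM staging;

-- Dynamic partition: the partition value comes from the query itself,
-- with partition columns appearing last in the select list.
INSERT INTO hive_sales
SELECT id, amount, dt FROM staging;
```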
Flink users should now have a full, smooth experience querying and manipulating Hive data from Flink.</p> + +<h3 id="roadmap">Roadmap</h3> + +<p>Integration between any two systems is a never-ending story.</p> + +<p>We are constantly improving Flink itself, and the Flink-Hive integration also keeps improving as we collect user feedback and work with folks in this vibrant community.</p> + +<p>After careful consideration and prioritization of the feedback we received, we have targeted many of the requests below for the next Flink release, 1.11.</p> + +<ul> + <li>Hive streaming sink so that Flink can stream data into Hive tables, bringing a real streaming experience to Hive</li> + <li>Native Parquet reader for better performance</li> + <li>Additional interoperability - support for creating Hive tables, views, and functions in Flink</li> + <li>Better out-of-the-box experience with built-in dependencies, including documentation</li> + <li>JDBC driver so that users can reuse their existing tooling to run SQL jobs on Flink</li> + <li>Hive syntax and semantics compatibility mode</li> +</ul> + +<p>If you have more feature requests or discover bugs, please reach out to the community through the mailing lists and JIRA.</p> + +<h2 id="summary">Summary</h2> + +<p>Data warehousing is shifting to a more real-time fashion, and Apache Flink can make a difference for your organization in this space.</p> + +<p>Flink 1.10 brings production-ready Hive integration and empowers users to achieve more in both metadata management and unified streaming/batch data processing.</p> + +<p>We encourage all our users to get their hands on Flink 1.10.
You are very welcome to join the community in development, discussions, and all other kinds of collaboration on this topic.</p> + +</description> +<pubDate>Thu, 26 Mar 2020 03:30:00 +0100</pubDate> +<link>https://flink.apache.org/features/2020/03/26/flink-for-data-warehouse.html</link> +<guid isPermaLink="true">/features/2020/03/26/flink-for-data-warehouse.html</guid> +</item> + +<item> <title>Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</title> <description><p>In the <a href="https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html">first article</a> of the series, we gave a high-level description of the objectives and required functionality of a Fraud Detection engine. We also described how to make data partitioning in Apache Flink customizable based on modifiable rules instead of using a hardcoded <code>KeysExtractor</code> implementation.</p> @@ -16604,104 +16730,5 @@ Flink serialization system improved a lot over time and by now surpasses the cap <guid isPermaLink="true">/news/2014/11/04/release-0.7.0.html</guid> </item> -<item> -<title>Upcoming Events</title> -<description><p>We are happy to announce several upcoming Flink events both in Europe and the US. Starting with a <strong>Flink hackathon in Stockholm</strong> (Oct 8-9) and a talk about Flink at the <strong>Stockholm Hadoop User Group</strong> (Oct 8). This is followed by the very first <strong>Flink Meetup in Berlin</strong> (Oct 15). In the US, there will be two Flink Meetup talks: the first one at the <strong>Pasadena Big Data User Grou [...] - -<p>We are looking forward to seeing you at any of these events. The following is an overview of each event and links to the respective Meetup pages.</p> - -<h3 id="flink-hackathon-stockholm-oct-8-9">Flink Hackathon, Stockholm (Oct 8-9)</h3> - -<p>The hackathon will take place at KTH/SICS from Oct 8th-9th.
You can sign up here: https://docs.google.com/spreadsheet/viewform?formkey=dDZnMlRtZHJ3Z0hVTlFZVjU2MWtoX0E6MA.</p> - -<p>Here is a rough agenda and a list of topics to work upon or look into. Suggestions and more topics are welcome.</p> - -<h4 id="wednesday-8th">Wednesday (8th)</h4> - -<p>9:00 - 10:00 Introduction to Apache Flink, System overview, and Dev -environment (by Stephan)</p> - -<p>10:15 - 11:00 Introduction to the topics (Streaming API and system by Gyula -&amp; Marton), (Graphs by Vasia / Martin / Stephan)</p> - -<p>11:00 - 12:30 Happy hacking (part 1)</p> - -<p>12:30 - Lunch (Food will be provided by KTH / SICS. A big thank you to them -and also to Paris, for organizing that)</p> - -<p>13:xx - Happy hacking (part 2)</p> - -<h4 id="thursday-9th">Thursday (9th)</h4> - -<p>Happy hacking (continued)</p> - -<h4 id="suggestions-for-topics">Suggestions for topics</h4> - -<h5 id="streaming">Streaming</h5> - -<ul> - <li> - <p>Sample streaming applications (e.g. continuous heavy hitters and topics -on the twitter stream)</p> - </li> - <li> - <p>Implement a simple SQL to Streaming program parser. 
Possibly using -Apache Calcite (http://optiq.incubator.apache.org/)</p> - </li> - <li> - <p>Implement different windowing methods (count-based, time-based, …)</p> - </li> - <li> - <p>Implement different windowed operations (windowed-stream-join, -windowed-stream-co-group)</p> - </li> - <li> - <p>Streaming state, and interaction with other programs (that access state -of a stream program)</p> - </li> -</ul> - -<h5 id="graph-analysis">Graph Analysis</h5> - -<ul> - <li> - <p>Prototype a Graph DSL (simple graph building, filters, graph -properties, some algorithms)</p> - </li> - <li> - <p>Prototype abstractions different Graph processing paradigms -(vertex-centric, partition-centric).</p> - </li> - <li> - <p>Generalize the delta iterations, allow flexible state access.</p> - </li> -</ul> - -<h3 id="meetup-hadoop-user-group-talk-stockholm-oct-8">Meetup: Hadoop User Group Talk, Stockholm (Oct 8)</h3> - -<p>Hosted by Spotify, opens at 6 PM.</p> - -<p>http://www.meetup.com/stockholm-hug/events/207323222/</p> - -<h3 id="st-flink-meetup-berlin-oct-15">1st Flink Meetup, Berlin (Oct 15)</h3> - -<p>We are happy to announce the first Flink meetup in Berlin. You are very welcome to to sign up and attend. 
The event will be held in Betahaus Cafe.</p> - -<p>http://www.meetup.com/Apache-Flink-Meetup/events/208227422/</p> - -<h3 id="meetup-pasadena-big-data-user-group-oct-29">Meetup: Pasadena Big Data User Group (Oct 29)</h3> - -<p>http://www.meetup.com/Pasadena-Big-Data-Users-Group/</p> - -<h3 id="meetup-silicon-valley-hands-on-programming-events-nov-4">Meetup: Silicon Valley Hands On Programming Events (Nov 4)</h3> - -<p>http://www.meetup.com/HandsOnProgrammingEvents/events/210504392/</p> - -</description> -<pubDate>Fri, 03 Oct 2014 12:00:00 +0200</pubDate> -<link>https://flink.apache.org/news/2014/10/03/upcoming_events.html</link> -<guid isPermaLink="true">/news/2014/10/03/upcoming_events.html</guid> -</item> - </channel> </rss> diff --git a/content/blog/index.html b/content/blog/index.html index 76feea3..7a4b8b0 100644 --- a/content/blog/index.html +++ b/content/blog/index.html @@ -185,6 +185,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></h2> + + <p>26 Mar 2020 + Bowen Li (<a href="https://twitter.com/Bowen__Li">@Bowen__Li</a>)</p> + + <p><p>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.</p> + +</p> + + <p><a href="/features/2020/03/26/flink-for-data-warehouse.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></h2> <p>24 Mar 2020 @@ -307,19 +322,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2019/12/09/flink-kubernetes-kudo.html">Running Apache Flink on Kubernetes with KUDO</a></h2> - - <p>09 Dec 2019 - Gerred Dillon </p> - - <p>A common use case for Apache Flink is streaming data analytics together with Apache 
Kafka, which provides a pub/sub model and durability for data streams. In this post, we demonstrate how to orchestrate Flink and Kafka with KUDO.</p> - - <p><a href="/news/2019/12/09/flink-kubernetes-kudo.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -352,6 +354,16 @@ <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/blog/page10/index.html b/content/blog/page10/index.html index 906757c..7bccfc7 100644 --- a/content/blog/page10/index.html +++ b/content/blog/page10/index.html @@ -185,6 +185,26 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2015/04/13/release-0.9.0-milestone1.html">Announcing Flink 0.9.0-milestone1 preview release</a></h2> + + <p>13 Apr 2015 + </p> + + <p><p>The Apache Flink community is pleased to announce the availability of +the 0.9.0-milestone-1 release. The release is a preview of the +upcoming 0.9.0 release. It contains many new features which will be +available in the upcoming 0.9 release. Interested users are encouraged +to try it out and give feedback. 
As the version number indicates, this +release is a preview release that contains known issues.</p> + +</p> + + <p><a href="/news/2015/04/13/release-0.9.0-milestone1.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2015/04/07/march-in-flink.html">March 2015 in the Flink community</a></h2> <p>07 Apr 2015 @@ -324,21 +344,6 @@ and offers a new API including definition of flexible windows.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2014/10/03/upcoming_events.html">Upcoming Events</a></h2> - - <p>03 Oct 2014 - </p> - - <p><p>We are happy to announce several upcoming Flink events both in Europe and the US. Starting with a <strong>Flink hackathon in Stockholm</strong> (Oct 8-9) and a talk about Flink at the <strong>Stockholm Hadoop User Group</strong> (Oct 8). This is followed by the very first <strong>Flink Meetup in Berlin</strong> (Oct 15). In the US, there will be two Flink Meetup talks: the first one at the <strong>Pasadena Big Data User Group</strong> (Oct 29) and the second one at <strong>Si [...] 
- -</p> - - <p><a href="/news/2014/10/03/upcoming_events.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -371,6 +376,16 @@ and offers a new API including definition of flexible windows.</p> <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/blog/page11/index.html b/content/blog/page11/index.html index c022062..f757c49 100644 --- a/content/blog/page11/index.html +++ b/content/blog/page11/index.html @@ -185,6 +185,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2014/10/03/upcoming_events.html">Upcoming Events</a></h2> + + <p>03 Oct 2014 + </p> + + <p><p>We are happy to announce several upcoming Flink events both in Europe and the US. Starting with a <strong>Flink hackathon in Stockholm</strong> (Oct 8-9) and a talk about Flink at the <strong>Stockholm Hadoop User Group</strong> (Oct 8). This is followed by the very first <strong>Flink Meetup in Berlin</strong> (Oct 15). In the US, there will be two Flink Meetup talks: the first one at the <strong>Pasadena Big Data User Group</strong> (Oct 29) and the second one at <strong>Si [...] 
+ +</p> + + <p><a href="/news/2014/10/03/upcoming_events.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2014/09/26/release-0.6.1.html">Apache Flink 0.6.1 available</a></h2> <p>26 Sep 2014 @@ -249,6 +264,16 @@ academic and open source project that Flink originates from.</p> <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/blog/page2/index.html b/content/blog/page2/index.html index 3dd4a60..b2b9023 100644 --- a/content/blog/page2/index.html +++ b/content/blog/page2/index.html @@ -185,6 +185,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2019/12/09/flink-kubernetes-kudo.html">Running Apache Flink on Kubernetes with KUDO</a></h2> + + <p>09 Dec 2019 + Gerred Dillon </p> + + <p>A common use case for Apache Flink is streaming data analytics together with Apache Kafka, which provides a pub/sub model and durability for data streams. In this post, we demonstrate how to orchestrate Flink and Kafka with KUDO.</p> + + <p><a href="/news/2019/12/09/flink-kubernetes-kudo.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2019/11/25/query-pulsar-streams-using-apache-flink.html">How to query Pulsar Streams using Apache Flink</a></h2> <p>25 Nov 2019 @@ -310,19 +323,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/2019/06/05/flink-network-stack.html">A Deep-Dive into Flink's Network Stack</a></h2> - - <p>05 Jun 2019 - Nico Kruber </p> - - <p>Flink’s network stack is one of the core components that make up Apache Flink's runtime module sitting at the core of every Flink job. 
In this post, which is the first in a series of posts about the network stack, we look at the abstractions exposed to the stream operators and detail their physical implementation and various optimisations in Apache Flink.</p> - - <p><a href="/2019/06/05/flink-network-stack.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -355,6 +355,16 @@ <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/blog/page3/index.html b/content/blog/page3/index.html index 6cbe424..fe76cc1 100644 --- a/content/blog/page3/index.html +++ b/content/blog/page3/index.html @@ -185,6 +185,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/2019/06/05/flink-network-stack.html">A Deep-Dive into Flink's Network Stack</a></h2> + + <p>05 Jun 2019 + Nico Kruber </p> + + <p>Flink’s network stack is one of the core components that make up Apache Flink's runtime module sitting at the core of every Flink job. 
In this post, which is the first in a series of posts about the network stack, we look at the abstractions exposed to the stream operators and detail their physical implementation and various optimisations in Apache Flink.</p> + + <p><a href="/2019/06/05/flink-network-stack.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/2019/05/19/state-ttl.html">State TTL in Flink 1.8.0: How to Automatically Cleanup Application State in Apache Flink</a></h2> <p>19 May 2019 @@ -311,21 +324,6 @@ for more details.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2019/02/15/release-1.7.2.html">Apache Flink 1.7.2 Released</a></h2> - - <p>15 Feb 2019 - </p> - - <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.7 series.</p> - -</p> - - <p><a href="/news/2019/02/15/release-1.7.2.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -358,6 +356,16 @@ for more details.</p> <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/blog/page4/index.html b/content/blog/page4/index.html index 22faf1d..a20e2a9 100644 --- a/content/blog/page4/index.html +++ b/content/blog/page4/index.html @@ -185,6 +185,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2019/02/15/release-1.7.2.html">Apache Flink 1.7.2 Released</a></h2> + + <p>15 Feb 2019 + </p> + + <p><p>The Apache Flink community released the second bugfix version of the Apache Flink 1.7 series.</p> + +</p> + + <p><a href="/news/2019/02/15/release-1.7.2.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a 
href="/news/2019/02/13/unified-batch-streaming-blink.html">Batch as a Special Case of Streaming and Alibaba's contribution of Blink</a></h2> <p>13 Feb 2019 @@ -319,21 +334,6 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa <hr> - <article> - <h2 class="blog-title"><a href="/news/2018/08/21/release-1.5.3.html">Apache Flink 1.5.3 Released</a></h2> - - <p>21 Aug 2018 - </p> - - <p><p>The Apache Flink community released the third bugfix version of the Apache Flink 1.5 series.</p> - -</p> - - <p><a href="/news/2018/08/21/release-1.5.3.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -366,6 +366,16 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/blog/page5/index.html b/content/blog/page5/index.html index 6e969e6..7b6a542 100644 --- a/content/blog/page5/index.html +++ b/content/blog/page5/index.html @@ -185,6 +185,21 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2018/08/21/release-1.5.3.html">Apache Flink 1.5.3 Released</a></h2> + + <p>21 Aug 2018 + </p> + + <p><p>The Apache Flink community released the third bugfix version of the Apache Flink 1.5 series.</p> + +</p> + + <p><a href="/news/2018/08/21/release-1.5.3.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2018/08/09/release-1.6.0.html">Apache Flink 1.6.0 Release Announcement</a></h2> <p>09 Aug 2018 @@ -315,19 +330,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2017/12/21/2017-year-in-review.html">Apache Flink in 2017: Year in Review</a></h2> - - 
<p>21 Dec 2017 - Chris Ward (<a href="https://twitter.com/chrischinch">@chrischinch</a>) & Mike Winters (<a href="https://twitter.com/wints">@wints</a>)</p> - - <p>As 2017 comes to a close, let's take a moment to look back on the Flink community's great work during the past year.</p> - - <p><a href="/news/2017/12/21/2017-year-in-review.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -360,6 +362,16 @@ <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/blog/page6/index.html b/content/blog/page6/index.html index d2fad84..fd2d3d1 100644 --- a/content/blog/page6/index.html +++ b/content/blog/page6/index.html @@ -185,6 +185,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2017/12/21/2017-year-in-review.html">Apache Flink in 2017: Year in Review</a></h2> + + <p>21 Dec 2017 + Chris Ward (<a href="https://twitter.com/chrischinch">@chrischinch</a>) & Mike Winters (<a href="https://twitter.com/wints">@wints</a>)</p> + + <p>As 2017 comes to a close, let's take a moment to look back on the Flink community's great work during the past year.</p> + + <p><a href="/news/2017/12/21/2017-year-in-review.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2017/12/12/release-1.4.0.html">Apache Flink 1.4.0 Release Announcement</a></h2> <p>12 Dec 2017 @@ -321,19 +334,6 @@ what’s coming in Flink 1.4.0 as well as a preview of what the Flink community <hr> - <article> - <h2 class="blog-title"><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></h2> - - <p>29 Mar 2017 by Timo 
Walther (<a href="https://twitter.com/">@twalthr</a>) - </p> - - <p><p>Broadening the user base and unifying batch & streaming with relational APIs</p></p> - - <p><a href="/news/2017/03/29/table-sql-api-update.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -366,6 +366,16 @@ what’s coming in Flink 1.4.0 as well as a preview of what the Flink community <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/blog/page7/index.html b/content/blog/page7/index.html index 2c50ecc..4141db2 100644 --- a/content/blog/page7/index.html +++ b/content/blog/page7/index.html @@ -185,6 +185,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></h2> + + <p>29 Mar 2017 by Timo Walther (<a href="https://twitter.com/">@twalthr</a>) + </p> + + <p><p>Broadening the user base and unifying batch & streaming with relational APIs</p></p> + + <p><a href="/news/2017/03/29/table-sql-api-update.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2017/03/23/release-1.1.5.html">Apache Flink 1.1.5 Released</a></h2> <p>23 Mar 2017 @@ -315,20 +328,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2016/05/24/stream-sql.html">Stream Processing for Everyone with SQL and Apache Flink</a></h2> - - <p>24 May 2016 by Fabian Hueske (<a href="https://twitter.com/">@fhueske</a>) - </p> - - <p><p>About six months ago, the Apache Flink community started an effort to add a SQL interface for stream data analysis. 
SQL is <i>the</i> standard language to access and process data. Everybody who occasionally analyzes data is familiar with SQL. Consequently, a SQL interface for stream data processing will make this technology accessible to a much wider audience. Moreover, SQL support for streaming data will also enable new use cases such as interactive and ad-hoc stream analysi [...] -<p>In this blog post, we report on the current status, architectural design, and future plans of the Apache Flink community to implement support for SQL as a language for analyzing data streams.</p></p> - - <p><a href="/news/2016/05/24/stream-sql.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -361,6 +360,16 @@ <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/blog/page8/index.html b/content/blog/page8/index.html index 87415c5..55b994d 100644 --- a/content/blog/page8/index.html +++ b/content/blog/page8/index.html @@ -185,6 +185,20 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2016/05/24/stream-sql.html">Stream Processing for Everyone with SQL and Apache Flink</a></h2> + + <p>24 May 2016 by Fabian Hueske (<a href="https://twitter.com/">@fhueske</a>) + </p> + + <p><p>About six months ago, the Apache Flink community started an effort to add a SQL interface for stream data analysis. SQL is <i>the</i> standard language to access and process data. Everybody who occasionally analyzes data is familiar with SQL. Consequently, a SQL interface for stream data processing will make this technology accessible to a much wider audience. 
Moreover, SQL support for streaming data will also enable new use cases such as interactive and ad-hoc stream analysi [...] +<p>In this blog post, we report on the current status, architectural design, and future plans of the Apache Flink community to implement support for SQL as a language for analyzing data streams.</p></p> + + <p><a href="/news/2016/05/24/stream-sql.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2016/05/11/release-1.0.3.html">Flink 1.0.3 Released</a></h2> <p>11 May 2016 @@ -313,20 +327,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2015/12/04/Introducing-windows.html">Introducing Stream Windows in Apache Flink</a></h2> - - <p>04 Dec 2015 by Fabian Hueske (<a href="https://twitter.com/">@fhueske</a>) - </p> - - <p><p>The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in the mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/”exactly-once” processing). This shift and the new terminology can be quite confusing for people being new to the space of stream processing. Apache F [...] 
-<p>In this blog post, we discuss the concept of windows for stream processing, present Flink's built-in windows, and explain its support for custom windowing semantics.</p></p> - - <p><a href="/news/2015/12/04/Introducing-windows.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -359,6 +359,16 @@ <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/blog/page9/index.html b/content/blog/page9/index.html index ba1abcd..211a63a 100644 --- a/content/blog/page9/index.html +++ b/content/blog/page9/index.html @@ -185,6 +185,20 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2015/12/04/Introducing-windows.html">Introducing Stream Windows in Apache Flink</a></h2> + + <p>04 Dec 2015 by Fabian Hueske (<a href="https://twitter.com/">@fhueske</a>) + </p> + + <p><p>The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in the mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/”exactly-once” processing). This shift and the new terminology can be quite confusing for people being new to the space of stream processing. Apache F [...] 
+<p>In this blog post, we discuss the concept of windows for stream processing, present Flink's built-in windows, and explain its support for custom windowing semantics.</p></p> + + <p><a href="/news/2015/12/04/Introducing-windows.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2015/11/27/release-0.10.1.html">Flink 0.10.1 released</a></h2> <p>27 Nov 2015 @@ -322,26 +336,6 @@ vertex-centric or gather-sum-apply to Flink dataflows.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2015/04/13/release-0.9.0-milestone1.html">Announcing Flink 0.9.0-milestone1 preview release</a></h2> - - <p>13 Apr 2015 - </p> - - <p><p>The Apache Flink community is pleased to announce the availability of -the 0.9.0-milestone-1 release. The release is a preview of the -upcoming 0.9.0 release. It contains many new features which will be -available in the upcoming 0.9 release. Interested users are encouraged -to try it out and give feedback. As the version number indicates, this -release is a preview release that contains known issues.</p> - -</p> - - <p><a href="/news/2015/04/13/release-0.9.0-milestone1.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -374,6 +368,16 @@ release is a preview release that contains known issues.</p> <ul id="markdown-toc"> + <li><a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></li> + + + + + + + + + <li><a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></li> diff --git a/content/features/2020/03/26/flink-for-data-warehouse.html b/content/features/2020/03/26/flink-for-data-warehouse.html new file mode 100644 index 0000000..e9a6b5e --- /dev/null +++ b/content/features/2020/03/26/flink-for-data-warehouse.html @@ -0,0 +1,369 @@ +<!DOCTYPE html> +<html lang="en"> + <head> + 
<meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <title>Apache Flink: Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</title> + <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"> + <link rel="icon" href="/favicon.ico" type="image/x-icon"> + + <!-- Bootstrap --> + <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css"> + <link rel="stylesheet" href="/css/flink.css"> + <link rel="stylesheet" href="/css/syntax.css"> + + <!-- Blog RSS feed --> + <link href="/blog/feed.xml" rel="alternate" type="application/rss+xml" title="Apache Flink Blog: RSS feed" /> + + <!-- jQuery (necessary for Bootstrap's JavaScript plugins) --> + <!-- We need to load Jquery in the header for custom google analytics event tracking--> + <script src="/js/jquery.min.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!-- WARNING: Respond.js doesn't work if you view the page via file:// --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + <body> + + + <!-- Main content. --> + <div class="container"> + <div class="row"> + + + <div id="sidebar" class="col-sm-3"> + + +<!-- Top navbar. --> + <nav class="navbar navbar-default"> + <!-- The logo. 
--> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <div class="navbar-logo"> + <a href="/"> + <img alt="Apache Flink" src="/img/flink-header-logo.svg" width="147px" height="73px"> + </a> + </div> + </div><!-- /.navbar-header --> + + <!-- The navigation links. --> + <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1"> + <ul class="nav navbar-nav navbar-main"> + + <!-- First menu section explains visitors what Flink is --> + + <!-- What is Stream Processing? --> + <!-- + <li><a href="/streamprocessing1.html">What is Stream Processing?</a></li> + --> + + <!-- What is Flink? --> + <li><a href="/flink-architecture.html">What is Apache Flink?</a></li> + + + <ul class="nav navbar-nav navbar-subnav"> + <li > + <a href="/flink-architecture.html">Architecture</a> + </li> + <li > + <a href="/flink-applications.html">Applications</a> + </li> + <li > + <a href="/flink-operations.html">Operations</a> + </li> + </ul> + + + <!-- Use cases --> + <li><a href="/usecases.html">Use Cases</a></li> + + <!-- Powered by --> + <li><a href="/poweredby.html">Powered By</a></li> + + + + <!-- Second menu section aims to support Flink users --> + + <!-- Downloads --> + <li><a href="/downloads.html">Downloads</a></li> + + <!-- Getting Started --> + <li> + <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/getting-started/index.html" target="_blank">Getting Started <small><span class="glyphicon glyphicon-new-window"></span></small></a> + </li> + + <!-- Documentation --> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10" target="_blank">1.10 (Latest stable 
release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + <li><a href="https://ci.apache.org/projects/flink/flink-docs-master" target="_blank">Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + </ul> + </li> + + <!-- getting help --> + <li><a href="/gettinghelp.html">Getting Help</a></li> + + <!-- Blog --> + <li class="active"><a href="/blog/"><b>Flink Blog</b></a></li> + + + <!-- Flink-packages --> + <li> + <a href="https://flink-packages.org" target="_blank">flink-packages.org <small><span class="glyphicon glyphicon-new-window"></span></small></a> + </li> + + + <!-- Third menu section aim to support community and contributors --> + + <!-- Community --> + <li><a href="/community.html">Community & Project Info</a></li> + + <!-- Roadmap --> + <li><a href="/roadmap.html">Roadmap</a></li> + + <!-- Contribute --> + <li><a href="/contributing/how-to-contribute.html">How to Contribute</a></li> + + + <!-- GitHub --> + <li> + <a href="https://github.com/apache/flink" target="_blank">Flink on GitHub <small><span class="glyphicon glyphicon-new-window"></span></small></a> + </li> + + + + <!-- Language Switcher --> + <li> + + + <!-- link to the Chinese home page when current is blog page --> + <a href="/zh">中文版</a> + + + </li> + + </ul> + + <ul class="nav navbar-nav navbar-bottom"> + <hr /> + + <!-- Twitter --> + <li><a href="https://twitter.com/apacheflink" target="_blank">@ApacheFlink <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <!-- Visualizer --> + <li class=" hidden-md hidden-sm"><a href="/visualizer/" target="_blank">Plan Visualizer <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <hr /> + + <li><a href="https://apache.org" target="_blank">Apache Software Foundation <small><span class="glyphicon glyphicon-new-window"></span></small></a></li> + + <li> + <style> + .smalllinks:link { + display: inline-block !important; 
background: none; padding-top: 0px; padding-bottom: 0px; padding-right: 0px; min-width: 75px; + } + </style> + + <a class="smalllinks" href="https://www.apache.org/licenses/" target="_blank">License</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/security/" target="_blank">Security</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Donate</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + + <a class="smalllinks" href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a> <small><span class="glyphicon glyphicon-new-window"></span></small> + </li> + + </ul> + </div><!-- /.navbar-collapse --> + </nav> + + </div> + <div class="col-sm-9"> + <div class="row-fluid"> + <div class="col-sm-12"> + <div class="row"> + <h1>Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</h1> + + <article> + <p>26 Mar 2020 Bowen Li (<a href="https://twitter.com/Bowen__Li">@Bowen__Li</a>)</p> + +<p>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.</p> + +<div class="page-toc"> +<ul id="markdown-toc"> + <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li> + <li><a href="#flink-and-its-integration-with-hive-comes-into-the-scene" id="markdown-toc-flink-and-its-integration-with-hive-comes-into-the-scene">Flink and Its Integration With Hive Comes into the Scene</a> <ul> + <li><a href="#unified-metadata-management" id="markdown-toc-unified-metadata-management">Unified Metadata Management</a></li> + <li><a href="#stream-processing" id="markdown-toc-stream-processing">Stream Processing</a></li> + <li><a href="#compatible-with-more-hive-versions" id="markdown-toc-compatible-with-more-hive-versions">Compatible 
with More Hive Versions</a></li> +     <li><a href="#reuse-hive-user-defined-functions-udfs" id="markdown-toc-reuse-hive-user-defined-functions-udfs">Reuse Hive User Defined Functions (UDFs)</a></li> +     <li><a href="#enhanced-read-and-write-on-hive-data" id="markdown-toc-enhanced-read-and-write-on-hive-data">Enhanced Read and Write on Hive Data</a></li> +     <li><a href="#formats" id="markdown-toc-formats">Formats</a></li> +     <li><a href="#more-data-types" id="markdown-toc-more-data-types">More Data Types</a></li> +     <li><a href="#roadmap" id="markdown-toc-roadmap">Roadmap</a></li> +   </ul> +   </li> +   <li><a href="#summary" id="markdown-toc-summary">Summary</a></li> +</ul> + +</div> + +<h2 id="introduction">Introduction</h2> + +<p>What are some of the latest requirements for your data warehouse and data infrastructure in 2020?</p> + +<p>We’ve come up with some for you.</p> + +<p>Firstly, today’s business is shifting to a more real-time fashion, and thus demands the ability to process online streaming data with low latency for near-real-time or even real-time analytics. People are becoming less and less tolerant of delays between when data is generated and when it arrives at their hands, ready to use. Delays of hours or even days are not acceptable anymore. Users expect minutes, or even seconds, of end-to-end latency for data in their warehouse, to get quicker-th [...] + +<p>Secondly, the infrastructure should be able to handle both offline batch data for offline analytics and exploration, and online streaming data for more timely analytics. Both are indispensable, as each has very valid use cases. Apart from the real-time processing mentioned above, batch processing still has its place, as it’s good for ad-hoc queries, exploration, and full-size calculations. Your modern infrastructure should not force users to choose between one or the other, it s [...]
+ +<p>Thirdly, the data players, including data engineers, data scientists, analysts, and operations teams, are calling for a more unified infrastructure than ever before, for easier ramp-up and higher working efficiency. The big data landscape has been fragmented for years: companies may have one set of infrastructure for real-time processing, one set for batch, one set for OLAP, etc. That is oftentimes a legacy of the lambda architecture, which was popular in the era when stream processo [...] + +<p>If any of these resonate with you, you have just found the right post to read: by strengthening Flink’s integration with Hive to production grade, we have never been closer to that vision.</p> + +<h2 id="flink-and-its-integration-with-hive-comes-into-the-scene">Flink and Its Integration With Hive Comes into the Scene</h2> + +<p>Apache Flink has proven to be a scalable system for handling extremely high workloads of streaming data at very low latency in many giant tech companies.</p> + +<p>Despite its huge success in the real-time processing domain, at its core Flink has faithfully followed its founding philosophy of being <a href="https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html">a unified data processing engine for both batch and streaming</a>, taking a streaming-first approach in its architecture to do batch processing. By making batch a special case of streaming, Flink really leverages its cutting-edge streaming capabilities [...] + +<p>On the other hand, Apache Hive has established itself as a focal point of the data warehousing ecosystem. It serves not only as a SQL engine for big data analytics and ETL, but also as a data management platform, where data is discovered and defined. As businesses evolve, they put new requirements on the data warehouse.</p> + +<p>Thus we started integrating Flink and Hive as a beta version in Flink 1.9.
Over the past few months, we have been listening to users’ requests and feedback, extensively enhancing our product, and running rigorous benchmarks (which will be published separately soon). I’m glad to announce that the integration between Flink and Hive is at production grade in <a href="https://flink.apache.org/news/2020/02/11/release-1.10.0.html">Flink 1.10</a>, and we can’t wait to walk you through the det [...] + +<h3 id="unified-metadata-management">Unified Metadata Management</h3> + +<p>Hive Metastore has evolved into the de facto metadata hub over the years in the Hadoop ecosystem, and even the cloud. Many companies have a single Hive Metastore service instance in production to manage all of their schemas, both Hive and non-Hive metadata, as the single source of truth.</p> + +<p>In Flink 1.9 we introduced Flink’s <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_catalog.html">HiveCatalog</a>, connecting Flink to users’ rich metadata pool. The meaning of <code>HiveCatalog</code> is two-fold here. First, it allows Apache Flink users to utilize Hive Metastore to store and manage Flink’s metadata, including tables, UDFs, and statistics of data. Second, it enables Flink to access Hive’s existing metadata, so that Flink itself can [...] + +<p>In Flink 1.10, users can store Flink’s own tables, views, UDFs, and statistics in Hive Metastore on all of the compatible Hive versions mentioned below.
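As a minimal sketch of what this looks like in practice (the catalog, database, and table names here are hypothetical), once a HiveCatalog named <code>myhive</code> has been registered with Flink, for example through the SQL client's YAML configuration, tables defined in the Hive Metastore become directly addressable from Flink SQL:

```sql
-- switch to the Hive catalog, registered under the hypothetical name 'myhive'
USE CATALOG myhive;

-- tables already defined in the Hive Metastore are immediately visible
SHOW TABLES;

-- and can be queried from Flink SQL directly, qualified by database name
SELECT * FROM mydb.orders LIMIT 10;
```

No extra DDL is needed to "import" Hive tables; the catalog exposes existing Hive metadata as-is.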
<a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_catalog.html#example">Here’s an end-to-end example</a> of how to store a Flink’s Kafka source table in Hive Metastore and later query the table in Flink SQL.</p> + +<h3 id="stream-processing">Stream Processing</h3> + +<p>The Hive integration feature in Flink 1.10 empowers users to re-imagine what they can accomplish with their Hive data and unlock stream processing use cases:</p> + +<ul> + <li>join real-time streaming data in Flink with offline Hive data for more complex data processing</li> + <li>backfill Hive data with Flink directly in a unified fashion</li> + <li>leverage Flink to move real-time data into Hive more quickly, greatly shortening the end-to-end latency between when data is generated and when it arrives at your data warehouse for analytics, from hours — or even days — to minutes</li> +</ul> + +<h3 id="compatible-with-more-hive-versions">Compatible with More Hive Versions</h3> + +<p>In Flink 1.10, we brought full coverage to most Hive versions including 1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 2.3, and 3.1. Take a look <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/#supported-hive-versions">here</a>.</p> + +<h3 id="reuse-hive-user-defined-functions-udfs">Reuse Hive User Defined Functions (UDFs)</h3> + +<p>Users can <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_functions.html#hive-user-defined-functions">reuse all kinds of Hive UDFs in Flink</a> since Flink 1.9.</p> + +<p>This is a great win for Flink users with past history with the Hive ecosystem, as they may have developed custom business logic in their Hive UDFs. 
Being able to run these functions without any rewrite saves users a lot of time and brings them a much smoother experience when they migrate to Flink.</p> + +<p>To take it a step further, Flink 1.10 introduces <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_functions.html#use-hive-built-in-functions-via-hivemodule">compatibility of Hive built-in functions via HiveModule</a>. Over the years, the Hive community has developed a few hundred built-in functions that are super handy for users. For those built-in functions that don’t exist in Flink yet, users are now able to leverage the existing Hive bui [...] + +<h3 id="enhanced-read-and-write-on-hive-data">Enhanced Read and Write on Hive Data</h3> + +<p>Flink 1.10 extends its read and write capabilities on Hive data to all the common use cases, with better performance.</p> + +<p>On the reading side, Flink can now read Hive regular tables, partitioned tables, and views. Many optimization techniques have been developed around reading, including partition pruning and projection pushdown to transport less data from file storage, limit pushdown for faster experimentation and exploration, and a vectorized reader for ORC files.</p> + +<p>On the writing side, Flink 1.10 introduces “INSERT INTO” and “INSERT OVERWRITE” to its syntax, and can write not only to Hive’s regular tables, but also to partitioned tables with either static or dynamic partitions.</p> + +<h3 id="formats">Formats</h3> + +<p>Your engine should be able to handle all common file formats to give you the freedom of choosing one over another to fit your business needs. Flink is no exception. We have tested the following table storage formats: text, CSV, SequenceFile, ORC, and Parquet.</p> + +<h3 id="more-data-types">More Data Types</h3> + +<p>In Flink 1.10, we added support for a few more frequently-used Hive data types that were not covered by Flink 1.9.
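The write-side support described earlier can be sketched as follows (table, column, and partition names are hypothetical):

```sql
-- overwrite a single static partition of a partitioned Hive table
INSERT OVERWRITE myhive.mydb.orders PARTITION (dt='2020-03-26')
SELECT order_id, amount FROM staging_orders;

-- dynamic partitioning: the partition value is taken from the query result
INSERT INTO myhive.mydb.orders
SELECT order_id, amount, dt FROM staging_orders;
```

In the static case the partition value is fixed in the statement; in the dynamic case Flink routes each row to the partition named by its <code>dt</code> column.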
Flink users should now have a full, smooth experience querying and manipulating Hive data from Flink.</p> + +<h3 id="roadmap">Roadmap</h3> + +<p>Integration between any two systems is a never-ending story.</p> + +<p>We are constantly improving Flink itself, and the Flink-Hive integration also keeps improving as we collect user feedback and work with folks in this vibrant community.</p> + +<p>After careful consideration and prioritization of the feedback we received, we have scheduled many of the requests below for the next Flink release, 1.11.</p> + +<ul> +  <li>Hive streaming sink so that Flink can stream data into Hive tables, bringing a real streaming experience to Hive</li> +  <li>Native Parquet reader for better performance</li> +  <li>Additional interoperability: support for creating Hive tables, views, and functions in Flink</li> +  <li>Better out-of-the-box experience with built-in dependencies, including documentation</li> +  <li>JDBC driver so that users can reuse their existing tooling to run SQL jobs on Flink</li> +  <li>Hive syntax and semantics compatibility mode</li> +</ul> + +<p>If you have more feature requests or discover bugs, please reach out to the community through the mailing lists and JIRA.</p> + +<h2 id="summary">Summary</h2> + +<p>Data warehousing is shifting to a more real-time fashion, and Apache Flink can make a difference for your organization in this space.</p> + +<p>Flink 1.10 brings production-ready Hive integration and empowers users to achieve more in both metadata management and unified streaming/batch data processing.</p> + +<p>We encourage all our users to get their hands on Flink 1.10.
You are very welcome to join the community in development, discussions, and all other kinds of collaborations in this topic.</p> + + + </article> + </div> + + <div class="row"> + <div id="disqus_thread"></div> + <script type="text/javascript"> + /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ + var disqus_shortname = 'stratosphere-eu'; // required: replace example with your forum shortname + + /* * * DON'T EDIT BELOW THIS LINE * * */ + (function() { + var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; + dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; + (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); + })(); + </script> + </div> + </div> +</div> + </div> + </div> + + <hr /> + + <div class="row"> + <div class="footer text-center col-sm-12"> + <p>Copyright © 2014-2019 <a href="http://apache.org">The Apache Software Foundation</a>. All Rights Reserved.</p> + <p>Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.</p> + <p><a href="/privacy-policy.html">Privacy Policy</a> · <a href="/blog/feed.xml">RSS feed</a></p> + </div> + </div> + </div><!-- /.container --> + + <!-- Include all compiled plugins (below), or include individual files as needed --> + <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script> + <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.matchHeight/0.7.0/jquery.matchHeight-min.js"></script> + <script src="/js/codetabs.js"></script> + <script src="/js/stickysidebar.js"></script> + + <!-- Google Analytics --> + <script> + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + 
})(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + + ga('create', 'UA-52545728-1', 'auto'); + ga('send', 'pageview'); + </script> + </body> +</html> diff --git a/content/index.html b/content/index.html index d994ff4..4a3e52d 100644 --- a/content/index.html +++ b/content/index.html @@ -557,6 +557,11 @@ <dl> + <dt> <a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></dt> + <dd><p>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.</p> + +</dd> + <dt> <a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></dt> <dd>In this series of blog posts you will learn about powerful Flink patterns for building streaming applications.</dd> @@ -570,9 +575,6 @@ <dd><p>The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0! 
As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink).</p> </dd> - - <dt> <a href="/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html">A Guide for Unit Testing in Apache Flink</a></dt> - <dd>This post provides a detailed guide for unit testing of Apache Flink applications.</dd> </dl> diff --git a/content/zh/index.html b/content/zh/index.html index 0a209bf..cec1ba7 100644 --- a/content/zh/index.html +++ b/content/zh/index.html @@ -554,6 +554,11 @@ <dl> + <dt> <a href="/features/2020/03/26/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></dt> + <dd><p>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.</p> + +</dd> + <dt> <a href="/news/2020/03/24/demo-fraud-detection-2.html">Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</a></dt> <dd>In this series of blog posts you will learn about powerful Flink patterns for building streaming applications.</dd> @@ -567,9 +572,6 @@ <dd><p>The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0! 
As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink).</p> </dd> - - <dt> <a href="/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html">A Guide for Unit Testing in Apache Flink</a></dt> - <dd>This post provides a detailed guide for unit testing of Apache Flink applications.</dd> </dl>
