[GitHub] [flink-web] openinx commented on a diff in pull request #531: Add Table Store 0.1.0 release

GitBox Sun, 08 May 2022 19:42:30 -0700


openinx commented on code in PR #531:
URL: https://github.com/apache/flink-web/pull/531#discussion_r867594000



##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title:  "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on 
Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) 
(0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic 
tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for 
high speed
+and a large amount of data update & query capability.
+
+Please check out the full 
[documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for 
detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid 
development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the 
following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on 
distributed computing,
+which brings real-time big data computing. Users need to combine Flink with 
some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in 
streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system 
receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. 
But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate 
kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache 
Hudi, Apache Iceberg while writing to Queue,
+the lake store keeps historical data at a lower cost.
+
+There are two main issues with doing this:

Review Comment:
   I think there is another critical benefit that is not mentioned. The newly 
introduced Flink unified table storage unifies the realtime physical data set 
and the batch offline physical data set into a single physical data set. In the 
old architecture, we have to separate the realtime data from the batch offline 
data, because they usually don't share the same table format and there is no 
practical approach to flush the realtime serving data into the batch offline 
table format.
   
   In the new architecture, we are trying to share the same data set for both 
realtime data and batch offline data, and the realtime data will be flushed to 
the batch offline data automatically as time advances. The batch offline data 
can also be accelerated for OLAP queries. The key benefit of the new 
architecture is that we don't need to maintain two different data sets for 
realtime OLAP queries and batch offline queries, and people will save a lot of 
ETL processing that would otherwise transform data between those two kinds of 
data warehouse.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
