openinx commented on code in PR #531:
URL: https://github.com/apache/flink-web/pull/531#discussion_r867594000
##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title: "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) (0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for high speed
+and a large amount of data update & query capability.
+
+Please check out the full [documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on distributed computing,
+which brings real-time big data computing. Users need to combine Flink with some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache Hudi, Apache Iceberg while writing to Queue,
+the lake store keeps historical data at a lower cost.
+
+There are two main issues with doing this:

Review Comment:
I think there is another critical benefit that is not mentioned. The newly introduced Flink unified table storage merges the realtime physical data set and the batch offline physical data set into a single physical data set. In the old architecture, we had to keep the realtime data separate from the batch offline data, because the two usually don't share the same table format and there was no practical way to flush the realtime serving data into the batch offline table format. In the new architecture, the realtime data and the batch offline data share the same data set, and the realtime data is flushed into the batch offline data automatically as time advances. The batch offline data can also be accelerated for OLAP queries. The key benefit of the new architecture is that we don't need to maintain two different data sets for realtime OLAP queries and batch offline queries, and people will save a lot of the ETL processing otherwise needed to transform data between those two kinds of data warehouses.
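To make the single-data-set idea from the review comment concrete, here is a minimal Java sketch under loudly stated assumptions. It is not Flink Table Store's API or implementation: the class `UnifiedTableSketch` and its methods `upsert`, `flush`, `get`, `scan` and `runCount` are hypothetical. It models one LSM-style table in which recent streaming writes sit in an in-memory buffer (the "realtime" part) and are flushed into immutable sorted runs (the "batch offline" part). Point lookups and full batch scans both read the same data, so no ETL copy between two stores is needed. For simplicity the flush here is triggered by buffer size rather than by time, and there is no persistence or compaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch, not Flink Table Store's API: one LSM-style table whose
 * in-memory buffer ("realtime" data) is flushed into immutable sorted runs
 * ("batch offline" data). Reads of both kinds go to this single data set.
 */
public class UnifiedTableSketch {
    private final int flushThreshold;
    private NavigableMap<String, String> memTable = new TreeMap<>();
    // Immutable runs sorted by key; the newest run is at the end of the list.
    private final List<NavigableMap<String, String>> sortedRuns = new ArrayList<>();

    public UnifiedTableSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Streaming write: upsert into the buffer, flushing once it reaches the (assumed) size threshold. */
    public void upsert(String key, String value) {
        memTable.put(key, value);
        if (memTable.size() >= flushThreshold) {
            flush();
        }
    }

    /** Turns the buffered rows into a new immutable sorted run and starts an empty buffer. */
    public void flush() {
        if (memTable.isEmpty()) {
            return;
        }
        sortedRuns.add(Collections.unmodifiableNavigableMap(memTable));
        memTable = new TreeMap<>();
    }

    /** Point lookup: the buffer holds the newest data, then runs are checked newest to oldest. */
    public String get(String key) {
        if (memTable.containsKey(key)) {
            return memTable.get(key);
        }
        for (int i = sortedRuns.size() - 1; i >= 0; i--) {
            String value = sortedRuns.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Batch scan: apply runs oldest to newest, then the buffer, so later writes overwrite earlier ones. */
    public NavigableMap<String, String> scan() {
        NavigableMap<String, String> merged = new TreeMap<>();
        for (NavigableMap<String, String> run : sortedRuns) {
            merged.putAll(run);
        }
        merged.putAll(memTable);
        return merged;
    }

    public int runCount() {
        return sortedRuns.size();
    }

    public static void main(String[] args) {
        UnifiedTableSketch table = new UnifiedTableSketch(2);
        table.upsert("user-1", "clicks=1");
        table.upsert("user-2", "clicks=5"); // buffer is full, so both rows are flushed into run 0
        table.upsert("user-1", "clicks=2"); // the newer version stays in the buffer
        System.out.println(table.get("user-1")); // clicks=2
        System.out.println(table.scan());        // {user-1=clicks=2, user-2=clicks=5}
        System.out.println(table.runCount());    // 1
    }
}
```

The design choice being illustrated is that "realtime" and "batch" are just two ages of the same rows rather than two copies in two systems; a real store would additionally persist runs to files and compact them.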
