wuchong commented on code in PR #531:
URL: https://github.com/apache/flink-web/pull/531#discussion_r867366040

##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title: "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) (0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for high speed
+and a large amount of data update & query capability.
+
+Please check out the full [documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on distributed computing,
+which brings real-time big data computing. Users need to combine Flink with some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache Hudi, Apache Iceberg while writing to Queue,
+the lake store keeps historical data at a lower cost.
+
+There are two main issues with doing this:
+- High understanding bar for users: It’s also not easy for users to understand all the SQL connectors,
+  learn the capabilities and restrictions for each of those. Users may also want to play around with
+  streaming & batch unification, but don't really know how, given the connectors are most of the time different
+  in batch and streaming use cases.
+- Increasing architecture complexity: It’s hard to choose the most suited external systems when the requirements
+  include streaming pipelines, offline batch jobs, ad-hoc queries. Multiple systems will increase the operation
+  and maintenance complexity. Users at least need to coordinate between the queue system and file system of each
+  table, which is error-prone.
+
+The Flink Table Store aims to provide a unified storage abstraction:
+- Table Store provides storage of historical data while providing queue abstraction.
+- Table Store provides competitive historical storage with lake storage capability, using LSM file structure
+  to store data on DFS, providing real-time updates and queries at a lower cost.

Review Comment:
   It sounds like Table Store only supports storing data on DFS and doesn't support object storage.


##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title: "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) (0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for high speed
+and a large amount of data update & query capability.
+
+Please check out the full [documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on distributed computing,
+which brings real-time big data computing. Users need to combine Flink with some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache Hudi, Apache Iceberg while writing to Queue,
+the lake store keeps historical data at a lower cost.
+
+There are two main issues with doing this:
+- High understanding bar for users: It’s also not easy for users to understand all the SQL connectors,
+  learn the capabilities and restrictions for each of those. Users may also want to play around with
+  streaming & batch unification, but don't really know how, given the connectors are most of the time different
+  in batch and streaming use cases.
+- Increasing architecture complexity: It’s hard to choose the most suited external systems when the requirements
+  include streaming pipelines, offline batch jobs, ad-hoc queries. Multiple systems will increase the operation
+  and maintenance complexity. Users at least need to coordinate between the queue system and file system of each
+  table, which is error-prone.

Review Comment:
   I also have the same feeling: the announcement is missing a picture that explains the positioning and capabilities of the Table Store.
