[GitHub] [flink-web] wuchong commented on a diff in pull request #531: Add Table Store 0.1.0 release

GitBox Sat, 07 May 2022 08:58:10 -0700


wuchong commented on code in PR #531:
URL: https://github.com/apache/flink-web/pull/531#discussion_r867366040



##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title:  "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) (0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for high speed
+and a large amount of data update & query capability.
+
+Please check out the full [documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on distributed computing,
+which brings real-time big data computing. Users need to combine Flink with some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache Hudi, Apache Iceberg while writing to Queue,
+the lake store keeps historical data at a lower cost.
+
+There are two main issues with doing this:
+- High understanding bar for users: It’s also not easy for users to understand all the SQL connectors,
+  learn the capabilities and restrictions for each of those. Users may also want to play around with
+  streaming & batch unification, but don't really know how, given the connectors are most of the time different
+  in batch and streaming use cases.
+- Increasing architecture complexity: It’s hard to choose the most suited external systems when the requirements
+  include streaming pipelines, offline batch jobs, ad-hoc queries. Multiple systems will increase the operation
+  and maintenance complexity. Users at least need to coordinate between the queue system and file system of each
+  table, which is error-prone.
+
+The Flink Table Store aims to provide a unified storage abstraction:
+- Table Store provides storage of historical data while providing queue abstraction.
+- Table Store provides competitive historical storage with lake storage capability, using LSM file structure
+  to store data on DFS, providing real-time updates and queries at a lower cost.

Review Comment:
   It sounds like Table Store only supports storing data on DFS and doesn't support object storage.



##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+There are two main issues with doing this:
+- High understanding bar for users: It’s also not easy for users to understand all the SQL connectors,
+  learn the capabilities and restrictions for each of those. Users may also want to play around with
+  streaming & batch unification, but don't really know how, given the connectors are most of the time different
+  in batch and streaming use cases.
+- Increasing architecture complexity: It’s hard to choose the most suited external systems when the requirements
+  include streaming pipelines, offline batch jobs, ad-hoc queries. Multiple systems will increase the operation
+  and maintenance complexity. Users at least need to coordinate between the queue system and file system of each
+  table, which is error-prone.

Review Comment:
   I have the same feeling: the announcement is missing a picture that explains the positioning and capabilities of Table Store.



