[GitHub] [flink-web] carp84 commented on a diff in pull request #531: Add Table Store 0.1.0 release

GitBox Sat, 07 May 2022 00:56:07 -0700


carp84 commented on code in PR #531:
URL: https://github.com/apache/flink-web/pull/531#discussion_r867320198



##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title:  "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on 
Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) 
(0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic 
tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for 
high speed
+and a large amount of data update & query capability.
+
+Please check out the full 
[documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for 
detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid 
development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the 
following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on 
distributed computing,
+which brings real-time big data computing. Users need to combine Flink with 
some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in 
streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system 
receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. 
But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate 
kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache 
Hudi, Apache Iceberg while writing to Queue,
+the lake store keeps historical data at a lower cost.
+
+There are two main issues with doing this:
+- High understanding bar for users: It’s also not easy for users to understand 
all the SQL connectors,
+  learn the capabilities and restrictions for each of those. Users may also 
want to play around with
+  streaming & batch unification, but don't really know how, given the 
connectors are most of the time different
+  in batch and streaming use cases.
+- Increasing architecture complexity: It’s hard to choose the most suited 
external systems when the requirements
+  include streaming pipelines, offline batch jobs, ad-hoc queries. Multiple 
systems will increase the operation
+  and maintenance complexity. Users at least need to coordinate between the 
queue system and file system of each
+  table, which is error-prone.
+
+The Flink Table Store aims to provide a unified storage abstraction:
+- Table Store provides storage of historical data while providing queue 
abstraction.
+- Table Store provides competitive historical storage with lake storage 
capability, using LSM file structure
+  to store data on DFS, providing real-time updates and queries at a lower 
cost.
+- Table Store coordinates between the queue storage and historical storage, 
providing hybrid read and write capabilities.
+- Table Store is a storage created for Flink, which satisfies all the concepts 
of Flink SQL and is the most
+  suitable storage abstraction for Flink.
+
+## Core Features
+
+Flink Table Store supports the following usage:
+- **Streaming Insert**: Write changelog streams, including CDC from the 
database and streams.
+- **Batch Insert**: Write batch data as offline warehouse, including OVERWRITE 
support.
+- **Batch/OLAP Query**: Read the snapshot of the storage, efficient querying 
of real-time data.
+- **Streaming Query**: Read the storage changes, ensure exactly-once 
consistency.
+
+Flink Table Store uses the following technologies to support the above user 
usages:
+- Hybrid Storage: Integrating Apache Kafka to achieve real-time stream 
computation.
+- LSM Structure: For a large amount of data updates and high performance 
queries.
+- Columnar File Format: Use Apache ORC to support efficient querying.
+- Lake Storage: Metadata and data on DFS and Object Store.
+
+Many thanks for the inspiration of the following systems: [Apache 
Iceberg](https://iceberg.apache.org/) and [RocksDB](http://rocksdb.org/).
+
+## Getting started
+
+For a detailed [getting started 
guide]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/docs/try-table-store/quick-start/)
 please check the documentation site.

Review Comment:
   ```suggestion
   Please refer to the [getting started guide]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/docs/try-table-store/quick-start/) for more details.
   ```



##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title:  "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on 
Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) 
(0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic 
tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for 
high speed
+and a large amount of data update & query capability.

Review Comment:
   Describing it as a "unified streaming and batch store" sounds a little odd
to me, and mentioning the data structure (LSM-tree) is too detailed. How about
changing it to something like "Flink Table Store is for building dynamic
tables for both stream and batch processing in Flink, supporting high speed
data ingestion and timely data query"?



##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title:  "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on 
Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) 
(0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic 
tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for 
high speed
+and a large amount of data update & query capability.
+
+Please check out the full 
[documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for 
detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid 
development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the 
following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on 
distributed computing,
+which brings real-time big data computing. Users need to combine Flink with 
some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in 
streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system 
receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. 
But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate 
kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache 
Hudi, Apache Iceberg while writing to Queue,
+the lake store keeps historical data at a lower cost.
+
+There are two main issues with doing this:
+- High understanding bar for users: It’s also not easy for users to understand 
all the SQL connectors,
+  learn the capabilities and restrictions for each of those. Users may also 
want to play around with
+  streaming & batch unification, but don't really know how, given the 
connectors are most of the time different
+  in batch and streaming use cases.
+- Increasing architecture complexity: It’s hard to choose the most suited 
external systems when the requirements
+  include streaming pipelines, offline batch jobs, ad-hoc queries. Multiple 
systems will increase the operation
+  and maintenance complexity. Users at least need to coordinate between the 
queue system and file system of each
+  table, which is error-prone.
+
+The Flink Table Store aims to provide a unified storage abstraction:
+- Table Store provides storage of historical data while providing queue 
abstraction.
+- Table Store provides competitive historical storage with lake storage 
capability, using LSM file structure
+  to store data on DFS, providing real-time updates and queries at a lower 
cost.
+- Table Store coordinates between the queue storage and historical storage, 
providing hybrid read and write capabilities.
+- Table Store is a storage created for Flink, which satisfies all the concepts 
of Flink SQL and is the most
+  suitable storage abstraction for Flink.
+
+## Core Features
+
+Flink Table Store supports the following usage:
+- **Streaming Insert**: Write changelog streams, including CDC from the 
database and streams.
+- **Batch Insert**: Write batch data as offline warehouse, including OVERWRITE 
support.
+- **Batch/OLAP Query**: Read the snapshot of the storage, efficient querying 
of real-time data.
+- **Streaming Query**: Read the storage changes, ensure exactly-once 
consistency.
+
+Flink Table Store uses the following technologies to support the above user 
usages:
+- Hybrid Storage: Integrating Apache Kafka to achieve real-time stream 
computation.
+- LSM Structure: For a large amount of data updates and high performance 
queries.
+- Columnar File Format: Use Apache ORC to support efficient querying.
+- Lake Storage: Metadata and data on DFS and Object Store.

Review Comment:
   I wonder whether it's necessary to expose the implementation details in the
release blog post, especially when the table store is still in preview status
and the implementation may change in the future.
   
   OTOH, if we think it's still valuable to provide the details here, I would
suggest adding "In this preview version" at the beginning of the paragraph.



##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title:  "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on 
Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) 
(0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic 
tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for 
high speed
+and a large amount of data update & query capability.
+
+Please check out the full 
[documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for 
detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid 
development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the 
following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on 
distributed computing,
+which brings real-time big data computing. Users need to combine Flink with 
some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in 
streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system 
receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. 
But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate 
kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache 
Hudi, Apache Iceberg while writing to Queue,
+the lake store keeps historical data at a lower cost.
+
+There are two main issues with doing this:
+- High understanding bar for users: It’s also not easy for users to understand 
all the SQL connectors,
+  learn the capabilities and restrictions for each of those. Users may also 
want to play around with
+  streaming & batch unification, but don't really know how, given the 
connectors are most of the time different
+  in batch and streaming use cases.
+- Increasing architecture complexity: It’s hard to choose the most suited 
external systems when the requirements
+  include streaming pipelines, offline batch jobs, ad-hoc queries. Multiple 
systems will increase the operation
+  and maintenance complexity. Users at least need to coordinate between the 
queue system and file system of each
+  table, which is error-prone.

Review Comment:
   I share the same feeling. Maybe adding a picture that depicts the (better,
easier, cleaner, as indicated here) architecture with the Flink Table Store
solution could help readers understand.


