Myasuka commented on code in PR #531:
URL: https://github.com/apache/flink-web/pull/531#discussion_r867326080


##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title:  "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on 
Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) 
(0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic 
tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for 
high speed
+and a large amount of data update & query capability.
+
+Please check out the full 
[documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for 
detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid 
development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the 
following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on 
distributed computing,
+which brings real-time big data computing. Users need to combine Flink with 
some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in 
streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system 
receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. 
But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate 
kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache 
Hudi, Apache Iceberg while writing to Queue,

Review Comment:
   ```suggestion
   Therefore, users use multiple systems. Writing to a lake store like Apache Hudi or Apache Iceberg while writing to Queue,
   ```



##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title:  "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on 
Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) 
(0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic 
tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for 
high speed
+and a large amount of data update & query capability.
+
+Please check out the full 
[documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for 
detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid 
development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the 
following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on 
distributed computing,
+which brings real-time big data computing. Users need to combine Flink with 
some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in 
streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system 
receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. 
But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate 
kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache 
Hudi, Apache Iceberg while writing to Queue,
+the lake store keeps historical data at a lower cost.
+
+There are two main issues with doing this:
+- High understanding bar for users: It’s also not easy for users to understand 
all the SQL connectors,
+  learn the capabilities and restrictions for each of those. Users may also 
want to play around with
+  streaming & batch unification, but don't really know how, given the 
connectors are most of the time different
+  in batch and streaming use cases.
+- Increasing architecture complexity: It’s hard to choose the most suited 
external systems when the requirements
+  include streaming pipelines, offline batch jobs, ad-hoc queries. Multiple 
systems will increase the operation
+  and maintenance complexity. Users at least need to coordinate between the 
queue system and file system of each
+  table, which is error-prone.
+
+The Flink Table Store aims to provide a unified storage abstraction:
+- Table Store provides storage of historical data while providing queue 
abstraction.
+- Table Store provides competitive historical storage with lake storage 
capability, using LSM file structure
+  to store data on DFS, providing real-time updates and queries at a lower 
cost.
+- Table Store coordinates between the queue storage and historical storage, 
providing hybrid read and write capabilities.
+- Table Store is a storage created for Flink, which satisfies all the concepts 
of Flink SQL and is the most
+  suitable storage abstraction for Flink.
+
+## Core Features
+
+Flink Table Store supports the following usage:
+- **Streaming Insert**: Write changelog streams, including CDC from the 
database and streams.
+- **Batch Insert**: Write batch data as offline warehouse, including OVERWRITE 
support.
+- **Batch/OLAP Query**: Read the snapshot of the storage, efficient querying 
of real-time data.
+- **Streaming Query**: Read the storage changes, ensure exactly-once 
consistency.
+
+Flink Table Store uses the following technologies to support the above user 
usages:
+- Hybrid Storage: Integrating Apache Kafka to achieve real-time stream 
computation.
+- LSM Structure: For a large amount of data updates and high performance 
queries.
+- Columnar File Format: Use Apache ORC to support efficient querying.
+- Lake Storage: Metadata and data on DFS and Object Store.

Review Comment:
   I think adding some details is okay here since this is the first version and users will still be curious about the implementation, considering that not many people have been involved in the development so far.
   However, the description looks a bit inconsistent. The details for `Hybrid Storage`, `Columnar File Format` and `Lake Storage` explain how we implement them, while the details for `LSM Structure` explain why we need it. Maybe we can use `key-value store` in place of `LSM Structure` and then explain how we implement it.
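   For example, something along these lines (just a sketch of the wording, not the final text):
   ```suggestion
   - Key-Value Store: Uses an LSM structure to support a large amount of data updates and high performance queries.
   ```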



##########
downloads.md:
##########
@@ -162,6 +162,27 @@ This version is compatible with Apache Flink version {{ flink_kubernetes_operato
 
 {% endfor %}
 
+Apache Flink® Table Store {{ site.FLINK_TABLE_STORE_VERSION_STABLE }} is the latest stable release for the [Flink Table Store](https://github.com/apache/flink-table-store).
+
+{% for flink_table_store_release in site.flink_table_store_releases %}
+
+## {{ flink_table_store_release.source_release.name }}
+
+<p>
+<a href="{{ flink_table_store_release.source_release.url }}" id="{{ flink_table_store_release.source_release.id }}">{{ flink_table_store_release.source_release.name }} Source Release</a>
+(<a href="{{ flink_table_store_release.source_release.asc_url }}">asc</a>, <a href="{{ flink_table_store_release.source_release.sha512_url }}">sha512</a>)
+</p>
+<p>
+<a href="{{ flink_table_store_release.binaries_release.url }}" id="{{ flink_table_store_release.binaries_release.id }}">{{ flink_table_store_release.binaries_release.name }} Binaries Release</a>
+(<a href="{{ flink_table_store_release.binaries_release.asc_url }}">asc</a>, <a href="{{ flink_table_store_release.binaries_release.sha1_url }}">sha1</a>)
+</p>
+
+This version is compatible with Apache Flink version {{ flink_table_store_release.source_release.flink_version }}.

Review Comment:
   ```suggestion
   This version is compatible with Apache Flink version(s): {{ flink_table_store_release.source_release.flink_version }}.
   ```



##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title:  "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on 
Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) 
(0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic 
tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for 
high speed
+and a large amount of data update & query capability.
+
+Please check out the full 
[documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for 
detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid 
development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the 
following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on 
distributed computing,
+which brings real-time big data computing. Users need to combine Flink with 
some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in 
streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system 
receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. 
But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate 
kafka tables are not queryable.
+
+Therefore, users use multiple systems. Writing to a lake store like Apache 
Hudi, Apache Iceberg while writing to Queue,
+the lake store keeps historical data at a lower cost.
+
+There are two main issues with doing this:
+- High understanding bar for users: It’s also not easy for users to understand 
all the SQL connectors,
+  learn the capabilities and restrictions for each of those. Users may also 
want to play around with
+  streaming & batch unification, but don't really know how, given the 
connectors are most of the time different
+  in batch and streaming use cases.
+- Increasing architecture complexity: It’s hard to choose the most suited 
external systems when the requirements
+  include streaming pipelines, offline batch jobs, ad-hoc queries. Multiple 
systems will increase the operation
+  and maintenance complexity. Users at least need to coordinate between the 
queue system and file system of each
+  table, which is error-prone.
+
+The Flink Table Store aims to provide a unified storage abstraction:
+- Table Store provides storage of historical data while providing queue 
abstraction.
+- Table Store provides competitive historical storage with lake storage 
capability, using LSM file structure
+  to store data on DFS, providing real-time updates and queries at a lower 
cost.
+- Table Store coordinates between the queue storage and historical storage, 
providing hybrid read and write capabilities.
+- Table Store is a storage created for Flink, which satisfies all the concepts 
of Flink SQL and is the most
+  suitable storage abstraction for Flink.
+
+## Core Features
+
+Flink Table Store supports the following usage:
+- **Streaming Insert**: Write changelog streams, including CDC from the 
database and streams.
+- **Batch Insert**: Write batch data as offline warehouse, including OVERWRITE 
support.
+- **Batch/OLAP Query**: Read the snapshot of the storage, efficient querying 
of real-time data.
+- **Streaming Query**: Read the storage changes, ensure exactly-once 
consistency.

Review Comment:
   If we follow the order of `streaming insert` and then `batch insert`, I think we should also order `streaming query` before `batch query`.
   BTW, should we split reading the snapshot of the storage and efficient querying of real-time data into separate bullets? From my side, they are actually two features.
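   For context, the usage these bullets describe is roughly the following (a minimal Flink SQL sketch; `word_count` and `word_source` are made-up table names, and it assumes a session already configured for table store managed tables as described in the 0.1 docs):
   ```sql
   -- Streaming Insert: continuously maintain a changelog-producing aggregate
   -- in a table store table (upserts by primary key).
   CREATE TABLE word_count (
       word STRING,
       cnt BIGINT,
       PRIMARY KEY (word) NOT ENFORCED
   );

   INSERT INTO word_count
   SELECT word, COUNT(*) FROM word_source GROUP BY word;

   -- Batch/OLAP Query: switch to batch mode and read the latest snapshot.
   SET 'execution.runtime-mode' = 'batch';
   SELECT * FROM word_count;

   -- Streaming Query: read the table's ongoing changes.
   SET 'execution.runtime-mode' = 'streaming';
   SELECT * FROM word_count;
   ```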



##########
_posts/2022-05-01-release-table-store-0.1.0.md:
##########
@@ -0,0 +1,110 @@
+---
+layout: post
+title:  "Apache Flink Table Store 0.1.0 Release Announcement"
+subtitle: "Unified streaming and batch store for building dynamic tables on 
Apache Flink."
+date: 2022-05-01T08:00:00.000Z
+categories: news
+authors:
+- Jingsong Lee:
+  name: "Jingsong Lee"
+
+---
+
+The Apache Flink community is pleased to announce the preview release of the
+[Apache Flink Table Store](https://github.com/apache/flink-table-store) 
(0.1.0).
+
+Flink Table Store is a unified streaming and batch store for building dynamic 
tables
+on Apache Flink. It uses a full Log-Structured Merge-Tree (LSM) structure for 
high speed
+and a large amount of data update & query capability.
+
+Please check out the full 
[documentation]({{site.DOCS_BASE_URL}}flink-table-store-docs-release-0.1/) for 
detailed information and user guides.
+
+Note: Flink Table Store is still in beta status and undergoing rapid 
development,
+we do not recommend that you use it directly in a production environment.
+
+## What is Flink Table Store
+
+Open [Flink official website](https://flink.apache.org/), you can see the 
following line:
+`Apache Flink - Stateful Computations over Data Streams.` Flink focuses on 
distributed computing,
+which brings real-time big data computing. Users need to combine Flink with 
some kind of external storage.
+
+The message queue will be used in both source & intermediate stages in 
streaming pipeline, to guarantee the
+latency stay within seconds. There will also be a real-time OLAP system 
receiving processed data in streaming
+fashion and serving user’s ad-hoc queries.
+
+Everything works fine as long as users only care about the aggregated results. 
But when users start to care
+about the intermediate data, they will immediately hit a blocker: Intermediate 
kafka tables are not queryable.

Review Comment:
   The doc uses `message queue` instead of `Kafka` in the previous part, so we'd better stay consistent and call these `message-queue-based tables` here. Otherwise, it might make users a bit confused.
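   Something like:
   ```suggestion
   about the intermediate data, they will immediately hit a blocker: Intermediate message-queue-based tables are not queryable.
   ```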



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
