[GitHub] [incubator-streampark] lysgithub0302 opened a new issue, #2941: [Proposal] Easy widetable

via GitHub Mon, 14 Aug 2023 03:50:34 -0700


lysgithub0302 opened a new issue, #2941:
URL: https://github.com/apache/incubator-streampark/issues/2941

### Search before asking

- [X] I had searched in the
[feature](https://github.com/apache/incubator-streampark/issues?q=is%3Aissue+label%3A%22Feature%22)
and found no similar feature requirement.

### Description

The English version follows. The original Chinese version is at this
[Chinese link](https://eminent-clarinet-440.notion.site/streampark-Proposal-easy-widetable-7754b49cbf394153851a082c058535fa?pvs=4).
Currently, more than 90% of the requirements in scenarios developed with
StreamPark are table-widening (wide-table) requirements. Implementing them with
Flink SQL joins has several drawbacks:
a. Regular join: the core problem is that the state keeps growing, occupies a
lot of memory, and makes failure recovery slow.
b. Interval join: the core problem is that the business must define a clear
expiration time; otherwise it degenerates into a regular join.
c. Temporal join: the core problem is that only changes on the stream table
trigger output; changes to the dimension table are not propagated.
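To make drawbacks a and b concrete, the following is a toy in-memory model in Python, not Flink code. It assumes each row carries an event timestamp and uses the largest timestamp seen so far as the watermark; the class names `RegularJoin` and `IntervalJoin` are illustrative only. The regular join keeps every row from both inputs, so its state grows with the input, while the interval join can evict rows once they fall outside the time bound.

```python
from collections import defaultdict


class RegularJoin:
    """Toy regular join: every row from both sides is kept forever."""

    def __init__(self):
        self.left = defaultdict(list)   # key -> rows seen on the left
        self.right = defaultdict(list)  # key -> rows seen on the right

    def on_left(self, key, row, ts=None):
        self.left[key].append(row)
        return [(row, r) for r in self.right[key]]

    def on_right(self, key, row, ts=None):
        self.right[key].append(row)
        return [(l, row) for l in self.left[key]]

    def state_size(self):
        return sum(map(len, self.left.values())) + sum(map(len, self.right.values()))


class IntervalJoin:
    """Toy interval join: rows older than (watermark - bound) are evicted."""

    def __init__(self, bound):
        self.bound = bound
        self.watermark = float("-inf")
        self.left = defaultdict(list)   # key -> [(ts, row)]
        self.right = defaultdict(list)

    def _advance(self, ts):
        # Move the watermark forward and drop rows that can no longer match.
        self.watermark = max(self.watermark, ts)
        cutoff = self.watermark - self.bound
        for side in (self.left, self.right):
            for key in list(side):
                side[key] = [(t, r) for t, r in side[key] if t >= cutoff]
                if not side[key]:
                    del side[key]

    def on_left(self, key, row, ts):
        self._advance(ts)
        self.left[key].append((ts, row))
        return [(row, r) for t, r in self.right.get(key, []) if abs(t - ts) <= self.bound]

    def on_right(self, key, row, ts):
        self._advance(ts)
        self.right[key].append((ts, row))
        return [(l, row) for t, l in self.left.get(key, []) if abs(t - ts) <= self.bound]

    def state_size(self):
        return sum(map(len, self.left.values())) + sum(map(len, self.right.values()))
```

Feeding 1,000 rows to each side (timestamps 0 to 999) leaves 2,000 rows in the regular join's state, but only the 22 rows still inside a 10-unit window in the interval join's state. The interval join stays bounded only because the bound is known up front, which is exactly the requirement described in point b.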

As a result, table-widening tasks in Flink can only be completed by experienced
Flink developers who thoroughly understand the business data volume, change
latency, and similar factors. Compared with implementing the same widening
requirement in offline development, Flink SQL development is much less
efficient, and the resulting jobs are also less stable.

StreamPark can provide a widening development module whose development cost
approaches that of offline development. We tentatively call it the widetable
module. It would give Flink developers, and also product managers, analysts,
data-warehouse engineers, and ordinary developers, a more efficient and stable
way to develop real-time wide-table requirements.

Architecture diagram of the widetable implementation:
<img width="721"
alt="WeChatWorkScreenshot_d15ba514-9f28-4dde-acda-a2a85605a20b"
src="https://github.com/apache/incubator-streampark/assets/91652711/43803e7e-2efd-4db7-93e4-719426172e31">
Interaction Layer:

The widetable module supports two ways of building a wide table:
a. Product managers can build the wide table by drag and drop.
b. Data-warehouse engineers, analysts, and business developers can build the
wide table by writing a Hive SQL query.

Parsing Layer:
Parses the user input and generates the Flink jobs that perform the widening.

Deployment Layer:
Uses StreamPark's deployment module to deploy the generated Flink jobs to
Kubernetes or YARN.

Middleware and Storage Layer:
This part is shown in the real-time widening logic diagram below.

Real-time widening logic diagram:
<img width="1046"
alt="WeChatWorkScreenshot_5be5a401-13eb-46d4-a876-a9a1c2e5de2d"
src="https://github.com/apache/incubator-streampark/assets/91652711/3518a9eb-de67-4705-b24b-b8d3fda30bb3">
Core Design Idea:

Extend the temporal join so that it reacts to every change in the dimension
tables and re-widens the affected rows. The benefit is that the join state is
kept externally, so developers no longer need to worry about oversized state,
data ordering, and similar issues. In addition, the latency from a data change
to the updated wide row is within 1 second, so business latency requirements
can be met.
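The core idea can be illustrated with a toy Python model of an extended temporal join. This is a sketch under stated assumptions, not StreamPark's implementation: `ExternalStore` is a hypothetical stand-in for the external middleware/storage layer (plain dicts here), and `ExtendedTemporalJoin`, `dim_index`, and the `fk` parameter are illustrative names. A standard temporal join emits output only when a stream (fact) row arrives; in this sketch a dimension update also re-emits every wide row that references the changed dimension key, using a reverse index kept in the external store.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalStore:
    """Hypothetical stand-in for external storage; the join job keeps no state of its own."""
    facts: dict = field(default_factory=dict)      # fact_id -> latest fact row
    dims: dict = field(default_factory=dict)       # dim_key -> latest dimension row
    dim_index: dict = field(default_factory=dict)  # dim_key -> set of fact_ids referencing it


class ExtendedTemporalJoin:
    """Toy join that widens on fact changes AND on dimension changes."""

    def __init__(self, store, fk):
        self.store = store
        self.fk = fk  # name of the foreign-key column in fact rows

    def _widen(self, fact):
        # Merge the current dimension row (if any) into the fact row, prefixing dim columns.
        dim = self.store.dims.get(fact[self.fk], {})
        return {**fact, **{f"dim_{k}": v for k, v in dim.items()}}

    def on_fact(self, fact_id, fact):
        # If the fact previously pointed at another dimension key, unlink it first.
        old = self.store.facts.get(fact_id)
        if old is not None:
            self.store.dim_index.get(old[self.fk], set()).discard(fact_id)
        self.store.facts[fact_id] = fact
        self.store.dim_index.setdefault(fact[self.fk], set()).add(fact_id)
        return [(fact_id, self._widen(fact))]

    def on_dim(self, dim_key, dim):
        # Unlike a standard temporal join, a dimension change re-emits every affected wide row.
        self.store.dims[dim_key] = dim
        return [(fid, self._widen(self.store.facts[fid]))
                for fid in sorted(self.store.dim_index.get(dim_key, ()))]
```

For example, after two orders for `u1` are joined, calling `on_dim("u1", {"name": "Bob"})` re-emits both wide rows with `dim_name` set to `"Bob"`. Because the fact rows and the reverse index live in the store, the join itself holds no growing state, which matches the proposal's goal of keeping state external.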

A demo video explaining the design is available here:

Link: https://pan.baidu.com/s/1Wv3t5zDiXIio89dA97_3tg (extraction code: r5w6)

### Usage Scenario

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of
Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
