lysgithub0302 opened a new issue, #2941:
URL: https://github.com/apache/incubator-streampark/issues/2941

   ### Search before asking
   
   - [X] I had searched in the 
[feature](https://github.com/apache/incubator-streampark/issues?q=is%3Aissue+label%3A%22Feature%22)
 and found no similar feature requirement.
   
   
   ### Description
   
   
   The English version follows. [Chinese 
version](https://eminent-clarinet-440.notion.site/streampark-Proposal-easy-widetable-7754b49cbf394153851a082c058535fa?pvs=4)
   Currently, more than 90% of the requirements in scenarios developed with 
StreamPark are table-widening (wide table) requirements. Implementing them 
with Flink SQL joins has several disadvantages:
   a. Regular join: the state keeps growing, occupies a lot of memory, and 
makes fault recovery slow.
   b. Interval join: the business must define a clear expiration time; 
otherwise it degrades into a regular join.
   c. Temporal join: only changes on the stream table side are monitored; 
changes in the dimension table do not trigger output.
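   
   To illustrate point (a), here is a minimal Python sketch of a two-sided 
streaming join that keeps every row it has seen. It is a toy model, not Flink 
code: the class `RegularJoin` and its methods are hypothetical names chosen for 
this example. It only shows why the state of a regular join grows with the 
input when nothing ever expires.

```python
from collections import defaultdict


class RegularJoin:
    """Toy two-sided inner join that never expires state (hypothetical)."""

    def __init__(self):
        self.left = defaultdict(list)   # join key -> all left rows seen so far
        self.right = defaultdict(list)  # join key -> all right rows seen so far

    def on_left(self, key, row):
        # Store the row forever, then emit matches against the right side.
        self.left[key].append(row)
        return [(row, r) for r in self.right[key]]

    def on_right(self, key, row):
        # Store the row forever, then emit matches against the left side.
        self.right[key].append(row)
        return [(l, row) for l in self.left[key]]

    def state_size(self):
        # Total number of rows held in state across both sides.
        left_rows = sum(len(rows) for rows in self.left.values())
        right_rows = sum(len(rows) for rows in self.right.values())
        return left_rows + right_rows


join = RegularJoin()
for i in range(1000):
    join.on_left(i, {"order_id": i})
matches = join.on_right(7, {"order_id": 7, "status": "paid"})
print(len(matches), join.state_size())
```

   Every input row stays in state, so memory use and recovery time scale with 
the total history of both streams rather than with the current working set.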
   
   As a result, table widening in Flink requires experienced Flink developers 
who understand the business data volume, the change latency, and similar 
factors well enough to complete the task. Compared with implementing the same 
widening requirements in offline development, Flink SQL development is far 
less efficient and also less stable.
   
   StreamPark could provide a widening development module whose development 
cost is close to that of offline development. For now we call it the widetable 
module. It would give Flink developers, and even product staff, analysts, data 
warehouse engineers, and ordinary developers, a more efficient and stable way 
to implement real-time wide-table requirements.
   
   Architecture diagram of the widetable implementation:
   <img width="721" 
alt="WeChatWorkScreenshot_d15ba514-9f28-4dde-acda-a2a85605a20b" 
src="https://github.com/apache/incubator-streampark/assets/91652711/43803e7e-2efd-4db7-93e4-719426172e31">
   Interaction layer:
   
   A wide table can be developed in two ways:
   a. Product staff can build the wide table by drag and drop.
   b. Data warehouse engineers, analysts, and developers from different 
business teams can write a Hive SQL statement to build the wide table.
   
   Parsing layer:
   Parses the user input and generates the Flink tasks.
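   
   As a rough illustration of what such a parsing step might look like, the 
Python sketch below extracts a task description from a single two-table join 
statement. Everything in it is an assumption made for this example: the 
function `parse_widetable_sql`, the field names of the returned spec, and the 
restriction to one `JOIN ... ON a.x = b.y` clause are hypothetical and are not 
taken from the proposal.

```python
import re

# Matches: FROM <left> [LEFT] JOIN <right> ON <left>.<key> = <right>.<key>
JOIN_PATTERN = re.compile(
    r"from\s+(?P<left>\w+)\s+(?:left\s+)?join\s+(?P<right>\w+)\s+"
    r"on\s+(?P=left)\.(?P<left_key>\w+)\s*=\s*(?P=right)\.(?P<right_key>\w+)",
    re.IGNORECASE,
)


def parse_widetable_sql(sql):
    """Turn a simple two-table join into a hypothetical widening task spec."""
    match = JOIN_PATTERN.search(sql)
    if match is None:
        raise ValueError("unsupported statement: expected one two-table join")
    return {
        "main_table": match["left"],
        "dim_table": match["right"],
        "main_key": match["left_key"],
        "dim_key": match["right_key"],
    }


spec = parse_widetable_sql(
    "SELECT orders.id, users.name "
    "FROM orders LEFT JOIN users ON orders.user_id = users.id"
)
print(spec)
```

   A real parsing layer would need a proper SQL parser rather than a regular 
expression; the sketch only shows the shape of the input and output.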
   
   Deployment layer:
   Uses the StreamPark deployment module to deploy the Flink tasks to k8s or 
YARN.
   
   Middleware and storage layer:
   This part is shown in the real-time widening logic diagram below.
   
   Real-time widening logic diagram:
   <img width="1046" 
alt="WeChatWorkScreenshot_5be5a401-13eb-46d4-a876-a9a1c2e5de2d" 
src="https://github.com/apache/incubator-streampark/assets/91652711/3518a9eb-de67-4705-b24b-b8d3fda30bb3">
   Core design idea:
   
   Extend the temporal join so that it also monitors all changes in the 
dimension table when widening. The benefit is that the state is kept 
externally, so developers no longer need to worry about oversized state, data 
ordering, and similar issues. The delay from a data change to the widened 
result is within 1 second, so there is no longer a concern about failing to 
meet business latency requirements.
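   
   The idea can be sketched in a few lines of Python. This is a simplified 
model under stated assumptions, not the proposed implementation: the class 
`WideTableJoiner` is hypothetical, and plain dictionaries stand in for the 
external store that would hold the state outside the Flink job. The point it 
shows is that both a new fact row and a dimension-table change produce widened 
output.

```python
class WideTableJoiner:
    """Toy widening join driven by changes on both sides (hypothetical)."""

    def __init__(self):
        # These dictionaries stand in for an external key-value store.
        self.dim = {}           # dim_key -> latest dimension row
        self.facts = {}         # fact_id -> (dim_key, latest fact row)
        self.facts_by_dim = {}  # dim_key -> set of fact_ids referencing it

    def on_fact(self, fact_id, dim_key, row):
        # If the fact moved to another dimension key, drop the old index entry.
        old = self.facts.get(fact_id)
        if old is not None and old[0] != dim_key:
            self.facts_by_dim[old[0]].discard(fact_id)
        self.facts[fact_id] = (dim_key, row)
        self.facts_by_dim.setdefault(dim_key, set()).add(fact_id)
        return [self._widen(fact_id)]

    def on_dim(self, dim_key, row):
        # A dimension change re-emits every fact row that references it.
        self.dim[dim_key] = row
        fact_ids = sorted(self.facts_by_dim.get(dim_key, ()))
        return [self._widen(fact_id) for fact_id in fact_ids]

    def _widen(self, fact_id):
        dim_key, row = self.facts[fact_id]
        return {**row, **self.dim.get(dim_key, {})}


joiner = WideTableJoiner()
first = joiner.on_fact("o1", "u1", {"order_id": "o1", "amount": 10})
second = joiner.on_dim("u1", {"user_name": "alice"})
third = joiner.on_dim("u1", {"user_name": "bob"})
print(first, second, third)
```

   Because the latest rows live in the store rather than in operator state, 
the job itself stays small; the trade-off is a lookup against the store for 
each change.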
   
   The demo explainer video is below:
   
   link: https://pan.baidu.com/s/1Wv3t5zDiXIio89dA97_3tg extraction code: r5w6
   
   ### Usage Scenario
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
