lysgithub0302 opened a new issue, #2941: URL: https://github.com/apache/incubator-streampark/issues/2941
### Search before asking

- [X] I had searched in the [feature](https://github.com/apache/incubator-streampark/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Description

The English version follows; a Chinese version is available here: [Chinese link](https://eminent-clarinet-440.notion.site/streampark-Proposal-easy-widetable-7754b49cbf394153851a082c058535fa?pvs=4)

Currently, more than 90% of the requirements in the scenarios developed with StreamPark are table-widening (wide table) requirements. Building them with Flink SQL joins has several disadvantages:

a. Regular join: the core problem is that the state keeps growing, which occupies a lot of memory and makes fault recovery slow.
b. Interval join: the core problem is that the business must define a clear expiration time; otherwise it becomes a regular join.
c. Temporal join: the core problem is that only changes on the stream table are monitored; changes on the dimension table do not trigger any output.

As a result, table widening in Flink has to be done by experienced Flink developers who have a sufficient understanding of the business data volume, the change latency, and so on. Compared with offline development of the same widening requirements, Flink SQL development is far less efficient, and its stability is also relatively poor.

StreamPark could provide a widening module whose development cost is close to that of offline development. For now we call it the widetable module. It would give Flink developers, and even product managers, analysts, data warehouse engineers, and ordinary developers, a more efficient and stable way to develop real-time wide table requirements.
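To make limitations (a) and (c) above concrete, here is a toy Python simulation. It is only a sketch and does not use Flink: the class names `RegularJoin` and `LookupStyleJoin`, their methods, and the sample keys and rows are all hypothetical, and real Flink join operators are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class RegularJoin:
    """Toy model of a regular join: every row from both sides stays in state."""
    left: dict = field(default_factory=dict)   # key -> list of left rows
    right: dict = field(default_factory=dict)  # key -> list of right rows

    def on_left(self, key, row):
        # Store the row, then join it with every right row seen so far.
        self.left.setdefault(key, []).append(row)
        return [(row, right_row) for right_row in self.right.get(key, [])]

    def on_right(self, key, row):
        # Store the row, then join it with every left row seen so far.
        self.right.setdefault(key, []).append(row)
        return [(left_row, row) for left_row in self.left.get(key, [])]

    def state_size(self):
        # Total number of rows retained across both sides.
        rows = list(self.left.values()) + list(self.right.values())
        return sum(len(bucket) for bucket in rows)


@dataclass
class LookupStyleJoin:
    """Toy model of a temporal-style join: only stream events produce output."""
    dim: dict = field(default_factory=dict)  # key -> latest dimension row

    def on_dim_change(self, key, row):
        # The dimension row is updated, but earlier stream rows are not revisited.
        self.dim[key] = row
        return []

    def on_stream(self, key, row):
        # Each stream row is widened with the dimension row known at this moment.
        return [(row, self.dim.get(key))]


# (a) State grows with every input row and nothing ever expires.
regular = RegularJoin()
for i in range(1000):
    regular.on_left(i % 10, f"order-{i}")
print(regular.state_size())

# (c) A later dimension change emits nothing, so the widened row goes stale.
lookup = LookupStyleJoin()
lookup.on_dim_change("u1", "name=Alice")
first = lookup.on_stream("u1", "order-1")
after_update = lookup.on_dim_change("u1", "name=Alicia")
print(first, after_update)
```

In this toy model the already emitted row for `order-1` still carries `name=Alice` after the dimension row changes, which is the staleness that the widetable proposal aims to remove.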
Architecture diagram of the widetable implementation:

<img width="721" alt="WeChatWorkScreenshot_d15ba514-9f28-4dde-acda-a2a85605a20b" src="https://github.com/apache/incubator-streampark/assets/91652711/43803e7e-2efd-4db7-93e4-719426172e31">

Interaction layer: A wide table can be built in two ways:

a. Product staff can build the wide table by drag and drop.
b. Data warehouse engineers, analysts, and developers from different business teams can build the wide table by writing a Hive SQL statement.

Parsing layer: Parses the user input and generates the corresponding Flink tasks.

Deployment layer: Uses the StreamPark deployment module to deploy the Flink tasks to k8s or YARN.

Middleware and storage layer: This part is shown in the real-time widening logic diagram below.

Real-time widening logic diagram:

<img width="1046" alt="WeChatWorkScreenshot_5be5a401-13eb-46d4-a876-a9a1c2e5de2d" src="https://github.com/apache/incubator-streampark/assets/91652711/3518a9eb-de67-4705-b24b-b8d3fda30bb3">

Design core idea: Extend the temporal join so that it monitors all changes in the dimension tables when widening. The benefit is that the state is stored externally, so developers no longer need to worry about oversized state, data ordering, and similar issues. The delay between a data change and the widened result is within 1 second, so failing to meet business latency requirements is no longer a concern.

The demo explainer video is below:
link: https://pan.baidu.com/s/1Wv3t5zDiXIio89dA97_3tg
extraction code: r5w6

### Usage Scenario

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
