neuyilan commented on PR #4423: URL: https://github.com/apache/paimon/pull/4423#issuecomment-2453983352
> Hi @neuyilan , I took a rough look and found that there are many thread safety risks in certain areas. Can we go back to this requirement? Is it necessary for us to do multi-threaded access? Is this effective? Why not increase Flink's parallelism?

Yes, there are too many thread-safety issues in the current design, so access is essentially synchronous. Even if `lookup.async` is set to true in the Flink connector, it has no acceleration effect. I think supporting asynchronous multi-threaded access is necessary, and in our testing it was effective. Compared to increasing parallelism in Flink, asynchronous multi-threaded access has the following benefits:

1. It reduces memory usage. Each Flink subtask occupies additional memory; the more subtasks there are, the more memory is consumed. Assume one 4G TaskManager. In our scenario, enabling asynchronous multi-threaded access improves performance by approximately 7x. Achieving the same speedup through parallelism would require 7 TMs, and those 7 TMs would occupy an additional 4G × 7 = 28G of memory. This is even more valuable with elastic resources: where memory is the bottleneck, my job has a hard memory limit of only 4G (managed by YARN), but CPU can be exceeded (cgroup soft limit). This feature therefore accelerates the job without adding any resources. **No additional cost, higher efficiency.**

2. It reduces cache disk usage, since cached data is currently exclusive to each task. If multiple threads within one task can share the cache, the cache disk footprint shrinks.
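To make the idea concrete, here is a minimal sketch (hypothetical names, not Paimon's actual lookup API) of how a single subtask could fan point lookups out to a fixed thread pool over a thread-safe shared cache, instead of scaling out with more TaskManagers:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical sketch: one subtask issues many lookups concurrently
// through a bounded thread pool; the cache is shared by all threads,
// so the extra throughput costs no extra TaskManager memory or disk.
class AsyncLookupSketch {
    private final ExecutorService pool;
    private final ConcurrentMap<Integer, String> cache = new ConcurrentHashMap<>();

    AsyncLookupSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Stand-in for a synchronous point lookup; a real implementation
    // would read from the table's LSM files on a cache miss.
    private String lookupSync(int key) {
        return cache.computeIfAbsent(key, k -> "value-" + k);
    }

    CompletableFuture<String> lookupAsync(int key) {
        return CompletableFuture.supplyAsync(() -> lookupSync(key), pool);
    }

    // Fan out a batch of keys, then join the results in order.
    List<String> lookupAll(List<Integer> keys) {
        List<CompletableFuture<String>> futures = keys.stream()
                .map(this::lookupAsync)
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    void close() {
        pool.shutdown();
    }
}
```

The key requirement this PR is grappling with is that everything `lookupSync` touches must be thread-safe (here a `ConcurrentHashMap`); the pool size then bounds CPU use, which matches the cgroup-soft-limit scenario above.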
