neuyilan commented on PR #4423:
URL: https://github.com/apache/paimon/pull/4423#issuecomment-2453983352

   > Hi @neuyilan, I took a rough look and found many thread safety risks in certain areas. Can we step back and revisit this requirement? Is multi-threaded access really necessary, and is it actually effective? Why not simply increase Flink's parallelism?
   
   Yes, there are too many thread safety issues in the current design, so access is essentially synchronous. Even if `lookup.async` is set to true in the Flink connector, it has no acceleration effect.
   
   I think supporting asynchronous multi-threaded access is necessary, and in our testing it was effective. Compared with increasing Flink's parallelism, asynchronous multi-threaded access has the following benefits:
   
   1. It reduces memory usage. Each Flink subtask occupies additional memory: the more subtasks there are, the more memory is consumed. Assume each TaskManager (TM) has 4 GB of memory. In our scenario, enabling asynchronous multi-threaded access improves performance by roughly 7x; achieving the same speedup by adding subtasks would require 7 TMs, which occupy an additional 4 GB × 7 = 28 GB of memory. This matters even more with elastic resources. In our environment, memory is the bottleneck: a job has a hard limit of 4 GB of memory (managed by YARN), while CPU can be exceeded (cgroup soft limit). With this feature, asynchronous multi-threaded access accelerates the job without adding any resources. **No additional cost, higher efficiency.**
   
   2. It reduces cache disk usage. Currently, cached data is exclusive to each task; if multiple threads within one task can share the cache, the total cache disk usage goes down.
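   The idea in point 1 can be sketched as follows. This is a minimal illustration, not Paimon's actual API: the class name `SharedLookup`, the `lookup` stub, and the pool size are all hypothetical. It shows a single task fanning key lookups out to a small thread pool against one shared state, instead of adding more Flink subtasks, each with its own TM memory and its own cache copy.

   ```java
   import java.util.List;
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.stream.Collectors;

   // Hypothetical sketch: one lookup task shares a thread pool and a single
   // local cache, rather than scaling out to N subtasks (and N caches, N TMs).
   public class SharedLookup {
       private final ExecutorService pool;

       public SharedLookup(int threads) {
           // One pool of worker threads inside a single task; no extra
           // TaskManager memory is allocated when `threads` grows.
           this.pool = Executors.newFixedThreadPool(threads);
       }

       // Simulated point lookup; a real implementation would read the
       // task's shared local cache / lookup files here.
       private String lookup(String key) {
           return "value-of-" + key;
       }

       // Fan the keys out across the pool and join results in input order.
       public List<String> lookupAll(List<String> keys) {
           List<CompletableFuture<String>> futures = keys.stream()
               .map(k -> CompletableFuture.supplyAsync(() -> lookup(k), pool))
               .collect(Collectors.toList());
           return futures.stream()
               .map(CompletableFuture::join)
               .collect(Collectors.toList());
       }

       public void close() {
           pool.shutdown();
       }

       public static void main(String[] args) {
           SharedLookup l = new SharedLookup(8);
           System.out.println(l.lookupAll(List.of("k1", "k2")));
           l.close();
       }
   }
   ```

   With this shape, only the shared structures (the cache and whatever state `lookup` touches) need to be thread safe, which is exactly where the review comments above point.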


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
