yahoNanJing opened a new issue #1703:
URL: https://github.com/apache/arrow-datafusion/issues/1703


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Currently, all cluster state, such as executor info and task info, is 
stored in the sled db, and a global lock is used to handle concurrency. 
Serialization and deserialization are costly, and the global lock becomes a 
bottleneck when hundreds of thousands of tasks need to be handled.
   
   **Describe the solution you'd like**
   
   A better way is:
   1. First, classify the cluster state:
      - state that is relatively stable, like executor metadata and the 
execution plans for jobs
      - state that changes frequently, like an executor's available task slots 
and task status
   2. Second, handle each kind of cluster state in a way that suits it:
      - Stable info can still be stored in the sled db as the ground truth, 
but it is better to cache it in memory to reduce the serialization and 
deserialization cost.
      - Volatile info is better kept out of the db, with a single copy held 
in memory. With multiple schedulers, state sync would need another mechanism, 
such as optimistic locking with compare-and-set.
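
   To make the two cases concrete, here is a minimal sketch using only the 
Rust standard library. It is not the scheduler's actual code: `BackingStore`, 
`StableState`, `TaskSlots` and `reserve_concurrently` are hypothetical names, 
and `BackingStore` is a plain map standing in for sled. Stable state is read 
through an in-memory cache, and volatile task slots live only in memory and 
are reserved with a compare-and-set retry loop instead of a global lock.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the persistent store (sled in this issue).
/// It counts reads so the effect of the cache is observable.
struct BackingStore {
    data: HashMap<String, String>,
    reads: AtomicU32,
}

impl BackingStore {
    fn get(&self, key: &str) -> Option<String> {
        self.reads.fetch_add(1, Ordering::Relaxed);
        self.data.get(key).cloned()
    }
}

/// Stable state (e.g. executor metadata): the store is the ground truth,
/// and an in-memory map serves repeated reads.
struct StableState {
    store: BackingStore,
    cache: RwLock<HashMap<String, String>>,
}

impl StableState {
    fn new(data: HashMap<String, String>) -> Self {
        StableState {
            store: BackingStore { data, reads: AtomicU32::new(0) },
            cache: RwLock::new(HashMap::new()),
        }
    }

    fn get(&self, key: &str) -> Option<String> {
        // Fast path: the read guard is dropped at the end of this statement.
        let cached = self.cache.read().unwrap().get(key).cloned();
        if let Some(value) = cached {
            return Some(value);
        }
        // Slow path: read the ground truth once, then remember it.
        let value = self.store.get(key)?;
        self.cache.write().unwrap().insert(key.to_string(), value.clone());
        Some(value)
    }

    fn store_reads(&self) -> u32 {
        self.store.reads.load(Ordering::Relaxed)
    }
}

/// Volatile state: available task slots of one executor, memory only.
struct TaskSlots {
    available: AtomicU32,
}

impl TaskSlots {
    fn new(slots: u32) -> Self {
        TaskSlots { available: AtomicU32::new(slots) }
    }

    /// Reserve one slot optimistically: read, compute, compare-and-set,
    /// and retry with the freshly observed value if another thread won.
    fn try_reserve(&self) -> bool {
        let mut current = self.available.load(Ordering::Acquire);
        loop {
            if current == 0 {
                return false;
            }
            match self.available.compare_exchange_weak(
                current,
                current - 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return true,
                Err(observed) => current = observed,
            }
        }
    }
}

/// Spawn `workers` threads that each try to reserve one of `slots` slots.
/// Returns (number of successful reservations, slots left).
fn reserve_concurrently(slots: u32, workers: u32) -> (u32, u32) {
    let slots = Arc::new(TaskSlots::new(slots));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let slots = Arc::clone(&slots);
            thread::spawn(move || slots.try_reserve())
        })
        .collect();
    let granted = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|ok| *ok)
        .count() as u32;
    (granted, slots.available.load(Ordering::Acquire))
}
```

   Calling `reserve_concurrently(4, 8)` from a `main` function grants exactly 
four reservations no matter how the threads interleave. This only covers a 
single scheduler process. With multiple schedulers, the same compare-and-set 
would have to run against state they all share, which this sketch does not 
attempt.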
   
   



