yahoNanJing opened a new issue #1703:
URL: https://github.com/apache/arrow-datafusion/issues/1703
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
Currently, all cluster state, such as executor info and task info, is
stored in the sled db, and a global lock is used to handle concurrency.
This has two costs: serializing and deserializing the state is expensive,
and the global lock becomes a bottleneck when hundreds of thousands of tasks
need to be processed.
**Describe the solution you'd like**
A better approach:
1. First, classify the cluster state:
   - state that is relatively stable, such as executor metadata and the
     execution plans for jobs;
   - state that changes frequently, such as each executor's available
     task slots and task status.
2. Then handle each kind of state in a way that suits it:
   - Stable state can still be stored in the sled db as the ground truth,
     but should also be cached in memory to reduce the serialization and
     deserialization cost.
   - Volatile state is better not stored in the db at all; just keep a
     single copy in memory. When multiple schedulers are used, state
     synchronization should be handled another way, such as optimistic
     locking with compare-and-set.
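
To make the proposal more concrete, here is a minimal Rust sketch of the two kinds of state. It is only an illustration under assumptions, not the scheduler's actual code: `ExecutorMeta`, `MetaCache`, and `SlotCounter` are hypothetical names, and the persistent store (sled in this issue) is abstracted as a `load` closure so that no real storage API is implied. Stable state goes through a read-through in-memory cache, while volatile slot counts live only in memory and are updated with a compare-and-set retry loop instead of a global lock.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::RwLock;

/// Hypothetical stable state: executor metadata that rarely changes.
#[derive(Clone, Debug, PartialEq)]
pub struct ExecutorMeta {
    pub id: String,
    pub host: String,
    pub port: u16,
}

/// Read-through cache placed in front of the persistent store.
/// The store itself is an assumption, passed in as a closure.
pub struct MetaCache {
    entries: RwLock<HashMap<String, ExecutorMeta>>,
}

impl MetaCache {
    pub fn new() -> Self {
        MetaCache { entries: RwLock::new(HashMap::new()) }
    }

    /// Return the cached value, or call `load` once (which would pay the
    /// deserialization cost) and remember the result.
    pub fn get_or_load<F>(&self, id: &str, load: F) -> Option<ExecutorMeta>
    where
        F: FnOnce(&str) -> Option<ExecutorMeta>,
    {
        {
            // Fast path: shared read lock, no store access.
            let guard = self.entries.read().unwrap();
            if let Some(meta) = guard.get(id) {
                return Some(meta.clone());
            }
        }
        // Slow path: consult the ground truth, then populate the cache.
        let loaded = load(id)?;
        self.entries
            .write()
            .unwrap()
            .insert(id.to_string(), loaded.clone());
        Some(loaded)
    }
}

/// Hypothetical volatile state: free task slots of one executor,
/// kept only in memory.
pub struct SlotCounter {
    available: AtomicU32,
}

impl SlotCounter {
    pub fn new(slots: u32) -> Self {
        SlotCounter { available: AtomicU32::new(slots) }
    }

    /// Optimistically reserve `wanted` slots: read the current value,
    /// then compare-and-set; retry if another thread changed it first.
    pub fn try_reserve(&self, wanted: u32) -> bool {
        let mut current = self.available.load(Ordering::Acquire);
        loop {
            if current < wanted {
                return false; // not enough free slots
            }
            match self.available.compare_exchange_weak(
                current,
                current - wanted,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return true,
                Err(observed) => current = observed, // lost the race; retry
            }
        }
    }

    /// Give slots back after tasks finish.
    pub fn release(&self, n: u32) {
        self.available.fetch_add(n, Ordering::AcqRel);
    }

    pub fn available(&self) -> u32 {
        self.available.load(Ordering::Acquire)
    }
}
```

This sketch only covers a single scheduler process. With multiple schedulers, the same read, compare, and retry pattern would have to run against a shared store that offers compare-and-set, which is left open here.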
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]