Hi!

When persistence is enabled, binary metadata is written to disk upon registration. Currently this happens in the discovery thread, which makes processing of the related messages very slow. With a large number of nodes and slow disks, registration of a single binary type can take several minutes. On top of that, it blocks processing of other discovery messages.

I propose starting a separate thread that will be responsible for writing binary metadata to disk. Binary type registration will then be considered finished before information about the type is written to disk on all nodes.

The main concern here is data consistency in cases when a node acknowledges type registration and then fails before writing the metadata to disk. I see two parts of this issue:

1. Nodes will have different metadata after restarting.
2. If we write some data into a persisted cache and shut down nodes faster than a new binary type is written to disk, then after a restart we won't have a binary type to work with.

The first case is similar to the situation when one node fails and, after that, a new type is registered in the cluster. This issue is resolved by the discovery data exchange. All nodes receive information about all binary types in the initial discovery messages sent by other nodes. So, once a node is restarted, it will receive from other nodes the information that it failed to finish writing to disk. If all nodes shut down before finishing writing the metadata to disk, then after a restart the type will be considered unregistered, so another registration will be required.

The second case is a bit more complicated. But it can be resolved by making the discovery thread on every node create a future that will be completed when writing to disk is finished. So, every node will have a future that reflects the current state of persisting the metadata to disk. After that, if some operation needs this binary type, it will have to wait on that future until flushing to disk is finished. This way discovery threads won't be blocked, but other threads that actually need this type will be.

Please let me know what you think about that.

Denis
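
P.S. To make the second part more concrete, here is a rough sketch of the future-based approach. It is only an illustration under simplifying assumptions: AsyncMetadataWriter, onTypeRegistered, awaitWritten and the diskWrite callback are made-up names, not existing Ignite classes or methods, and the real change would have to fit into the existing metadata handling code.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

/** Hypothetical sketch; none of these names are existing Ignite classes. */
class AsyncMetadataWriter {
    /** Single dedicated thread that performs the slow disk writes in FIFO order. */
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "binary-metadata-writer");
        t.setDaemon(true);
        return t;
    });

    /** Latest write future per type ID, completed once that metadata is on disk. */
    private final Map<Integer, CompletableFuture<Void>> writeFuts = new ConcurrentHashMap<>();

    /** Stand-in for the actual disk write: (typeId, serialized metadata). */
    private final BiConsumer<Integer, byte[]> diskWrite;

    AsyncMetadataWriter(BiConsumer<Integer, byte[]> diskWrite) {
        this.diskWrite = diskWrite;
    }

    /** Called from the discovery thread: schedules the write and returns immediately. */
    void onTypeRegistered(int typeId, byte[] meta) {
        CompletableFuture<Void> fut =
            CompletableFuture.runAsync(() -> diskWrite.accept(typeId, meta), writer);

        writeFuts.put(typeId, fut);
    }

    /** Called by any thread that needs the type: blocks until its write has finished. */
    void awaitWritten(int typeId) {
        CompletableFuture<Void> fut = writeFuts.get(typeId);

        if (fut != null)
            fut.join(); // A failed write is rethrown here as CompletionException.
    }

    /** Stops accepting new writes; already queued writes still run. */
    void stop() {
        writer.shutdown();
    }
}
```

With something like this, the discovery thread only calls onTypeRegistered and moves on to the next message, while a cache operation that depends on the type calls awaitWritten first. The sketch does not cover cleanup of completed futures or handling of write failures.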
