Denis Mekhanikov,

Currently metadata is fsync'ed on write. This might be the cause of slowdowns during bursts of metadata writes. I think removing the fsync could help mitigate the performance issues of the current implementation until a proper solution is implemented: moving metadata to the metastore.
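
To illustrate the trade-off, here is a minimal sketch, assuming the metadata file is written through a `FileChannel`. The class `MetadataFileWriter` and its methods are hypothetical and are not taken from the Ignite code base. The `fsync` flag controls whether `FileChannel.force(true)` is called after the write.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper, for illustration only. */
class MetadataFileWriter {
    /** Writes the metadata bytes to the file, optionally forcing them to the storage device. */
    static void write(Path file, byte[] meta, boolean fsync) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(meta);

            // A single write() call may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining())
                ch.write(buf);

            // This is the expensive step on slow disks: it blocks until the
            // content and file metadata are flushed to the device.
            if (fsync)
                ch.force(true);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole file back. */
    static byte[] read(Path file) {
        try {
            return Files.readAllBytes(file);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With `fsync` set to `false` the write returns once the data is in the OS page cache, so a process crash alone does not lose it, but an OS crash or power failure might.
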
Tue, 13 Aug 2019 at 17:09, Denis Mekhanikov <dmekhani...@gmail.com>:

> I would also like to mention that marshaller mappings are written to
> disk even if persistence is disabled.
> So, this issue affects purely in-memory clusters as well.
>
> Denis
>
> > On 13 Aug 2019, at 17:06, Denis Mekhanikov <dmekhani...@gmail.com>
> > wrote:
> >
> > Hi!
> >
> > When persistence is enabled, binary metadata is written to disk upon
> > registration. Currently this happens in the discovery thread, which
> > makes processing of related messages very slow.
> > There are cases where a large number of nodes and slow disks make
> > the registration of each binary type take several minutes. It also
> > blocks processing of other messages.
> >
> > I propose starting a separate thread that will be responsible for
> > writing binary metadata to disk. So, binary type registration will
> > be considered finished before information about it is written to
> > disk on all nodes.
> >
> > The main concern here is data consistency in cases when a node
> > acknowledges type registration and then fails before writing the
> > metadata to disk.
> > I see two parts of this issue:
> > 1. Nodes will have different metadata after restarting.
> > 2. If we write some data into a persisted cache and shut down nodes
> >    faster than a new binary type is written to disk, then after a
> >    restart we won’t have a binary type to work with.
> >
> > The first case is similar to a situation when one node fails, and
> > after that a new type is registered in the cluster. This issue is
> > resolved by the discovery data exchange. All nodes receive
> > information about all binary types in the initial discovery messages
> > sent by other nodes. So, once you restart a node, it will receive
> > from other nodes the information that it failed to finish writing to
> > disk.
> > If all nodes shut down before finishing writing the metadata to
> > disk, then after a restart the type will be considered unregistered,
> > so another registration will be required.
> >
> > The second case is a bit more complicated. But it can be resolved by
> > making the discovery thread on every node create a future that will
> > be completed when writing to disk is finished. So, every node will
> > have such a future, which will reflect the current state of
> > persisting the metadata to disk.
> > After that, if some operation needs this binary type, it will need
> > to wait on that future until flushing to disk is finished.
> > This way discovery threads won’t be blocked, but other threads that
> > actually need this type will be.
> >
> > Please let me know what you think about that.
> >
> > Denis

--
Best regards,
Alexei Scherbakov
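
The future-based scheme from the quoted proposal could look roughly like the sketch below. It is only an illustration under stated assumptions: `AsyncMetadataWriter`, `onTypeRegistered` and `awaitWritten` are hypothetical names, not Ignite APIs, and the disk is replaced by an in-memory map with an artificial delay.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of asynchronous metadata persistence; not Ignite's actual classes. */
class AsyncMetadataWriter {
    /** Single background thread that performs the slow disk writes. */
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "metadata-writer");
        t.setDaemon(true);
        return t;
    });

    /** Per-type future that completes once the metadata has been persisted. */
    private final Map<Integer, CompletableFuture<Void>> writeFuts = new ConcurrentHashMap<>();

    /** Stand-in for the on-disk store (assumption made to keep the sketch self-contained). */
    private final Map<Integer, byte[]> disk = new ConcurrentHashMap<>();

    /** Called from the discovery thread: schedules the write and returns without blocking. */
    void onTypeRegistered(int typeId, byte[] meta) {
        CompletableFuture<Void> fut =
            CompletableFuture.runAsync(() -> writeToDisk(typeId, meta), writer);

        writeFuts.put(typeId, fut);
    }

    /** Called by a thread that needs the type: blocks until the pending write has finished. */
    void awaitWritten(int typeId) {
        CompletableFuture<Void> fut = writeFuts.get(typeId);

        if (fut != null)
            fut.join();
    }

    /** Returns true if the metadata for the type has reached the simulated disk. */
    boolean isOnDisk(int typeId) {
        return disk.containsKey(typeId);
    }

    /** Simulates a slow disk write. */
    private void writeToDisk(int typeId, byte[] meta) {
        try {
            Thread.sleep(50);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        disk.put(typeId, meta);
    }
}
```

In this sketch the discovery thread only pays the cost of submitting a task, while a thread that is about to use the type calls `awaitWritten` first, which matches the idea that only the threads that actually need the type are blocked.
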