Re: Asynchronous registration of binary metadata

Alexei Scherbakov Wed, 14 Aug 2019 01:22:49 -0700

Denis Mekhanikov,

Currently metadata is fsync'ed on write. This might be the cause of
slowdowns during bursts of metadata writes.
I think removing the fsync could help mitigate the performance issues of the
current implementation until a proper solution is implemented: moving
metadata to the metastore.
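
To illustrate the difference, here is a minimal sketch, not the actual
Ignite code. The class and method names are hypothetical; only the
java.nio calls are real. With fsync the writer waits for the device on
every file, without it the data is left in the OS page cache and flushed
later.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only (not an Ignite class).
class MetadataWriteSketch {
    /** Writes the bytes to the file; forces them to the device only if fsync is true. */
    static void write(Path file, byte[] data, boolean fsync) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
            StandardOpenOption.CREATE,
            StandardOpenOption.WRITE,
            StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);

            while (buf.hasRemaining())
                ch.write(buf);

            if (fsync)
                ch.force(true); // Blocks until the device confirms; costly on slow disks.
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("binary_meta");
        byte[] meta = new byte[1024];

        // A burst of metadata writes: one file per (made-up) type id.
        for (boolean fsync : new boolean[] {true, false}) {
            long start = System.nanoTime();

            for (int typeId = 0; typeId < 200; typeId++)
                write(dir.resolve(typeId + ".bin"), meta, fsync);

            System.out.println("fsync=" + fsync + ": "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }
}
```

The price is that a crash right after the write can lose the file, which
is the same consistency concern discussed in the quoted proposal below.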



Tue, 13 Aug 2019 at 17:09, Denis Mekhanikov <[email protected]>:

> I would also like to mention that marshaller mappings are written to disk
> even if persistence is disabled.
> So this issue affects purely in-memory clusters as well.
>
> Denis
>
> > On 13 Aug 2019, at 17:06, Denis Mekhanikov <[email protected]>
> wrote:
> >
> > Hi!
> >
> > When persistence is enabled, binary metadata is written to disk upon
> registration. Currently it happens in the discovery thread, which makes
> processing of related messages very slow.
> > With a lot of nodes and slow disks, registration of a single binary
> type can take several minutes. On top of that, it blocks processing of
> other messages.
> >
> > I propose starting a separate thread that will be responsible for
> writing binary metadata to disk. So, binary type registration will be
> considered finished before information about it is written to disk on
> all nodes.
> >
> > The main concern here is data consistency in cases when a node
> acknowledges type registration and then fails before writing the metadata
> to disk.
> > I see two parts of this issue:
> > 1. Nodes will have different metadata after restarting.
> > 2. If we write some data into a persisted cache and shut down nodes faster
> than a new binary type is written to disk, then after a restart we won’t
> have a binary type to work with.
> >
> > The first case is similar to a situation where one node fails, and after
> that a new type is registered in the cluster. This issue is resolved by the
> discovery data exchange. All nodes receive information about all binary
> types in the initial discovery messages sent by other nodes. So, once you
> restart a node, it will receive from other nodes the information that it
> failed to finish writing to disk.
> > If all nodes shut down before finishing writing the metadata to disk,
> then after a restart the type will be considered unregistered, so another
> registration will be required.
> >
> > The second case is a bit more complicated. But it can be resolved by
> making the discovery threads on every node create a future that will be
> completed when writing to disk is finished. So, every node will have such
> a future, which will reflect the current state of persisting the metadata
> to disk.
> > After that, if some operation needs this binary type, it will need to
> wait on that future until flushing to disk is finished.
> > This way discovery threads won’t be blocked, but other threads that
> actually need this type will be.
> >
> > Please let me know what you think about that.
> >
> > Denis
>
>
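
For reference, the writer thread plus per-type future scheme proposed in
the quoted message could look roughly like the sketch below. This is only
an illustration under my own assumptions: the class, the method names and
the int type id key are hypothetical and do not correspond to existing
Ignite code.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the proposal, not an Ignite class.
class AsyncMetadataWriterSketch implements AutoCloseable {
    /** Single dedicated writer thread, so the discovery thread never touches the disk. */
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    /** Type id -> future that completes once that type's metadata is on disk. */
    private final Map<Integer, CompletableFuture<Void>> writeFuts = new ConcurrentHashMap<>();

    /** Called from the discovery thread: schedules the disk write and returns immediately. */
    CompletableFuture<Void> onTypeRegistered(int typeId, Runnable diskWrite) {
        CompletableFuture<Void> fut = CompletableFuture.runAsync(diskWrite, writer);

        writeFuts.put(typeId, fut);

        return fut;
    }

    /** Called by a thread that actually needs the type: blocks until the write is finished. */
    void awaitWritten(int typeId) {
        CompletableFuture<Void> fut = writeFuts.get(typeId);

        if (fut != null)
            fut.join(); // Rethrows a write failure wrapped in CompletionException.
    }

    @Override public void close() {
        writer.shutdown();
    }

    public static void main(String[] args) {
        try (AsyncMetadataWriterSketch w = new AsyncMetadataWriterSketch()) {
            // The Runnable stands in for the real file write.
            w.onTypeRegistered(42, () -> System.out.println("metadata for type 42 written"));

            // E.g. a cache operation that needs type 42 waits here, not the discovery thread.
            w.awaitWritten(42);
        }
    }
}
```

Only the threads that need the type pay for the disk latency. A real
implementation would also have to decide what to do when the write fails.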

-- 

Best regards,
Alexei Scherbakov
