> >> 1. Yes, only on OS failures. In such a case the data will be received
> >> from the alive nodes later.
>
> What behavior would there be in the case of a single node? I suppose
> someone could obtain cache data without unmarshalling the schema; what
> would happen to grid operability in that case?
> >> 2. Yes, for walmode=FSYNC writes to the metastore will be slow. But
> >> such a mode should not be used if you have more than two nodes in the
> >> grid, because it has a huge impact on performance.
>
> Does the WAL mode affect the metadata store?
>
>> Wed, 14 Aug 2019 at 14:29, Denis Mekhanikov < dmekhani...@gmail.com >:
>>
>>> Folks,
>>>
>>> Thanks for showing interest in this issue!
>>>
>>> Alexey,
>>>
>>>> I think removing fsync could help to mitigate performance issues with
>>>> the current implementation
>>>
>>> Is my understanding correct that if we remove fsync, then discovery
>>> won't be blocked, data will be flushed to disk in the background, and
>>> loss of information will be possible only on an OS failure? It sounds
>>> like an acceptable workaround to me.
>>>
>>> Will moving metadata to the metastore actually resolve this issue?
>>> Please correct me if I'm wrong, but we will still need to write the
>>> information to the WAL before releasing the discovery thread. If the
>>> WAL mode is FSYNC, then the issue will still be there. Or is it planned
>>> to abandon the discovery-based protocol altogether?
>>>
>>> Evgeniy, Ivan,
>>>
>>> In my particular case the data wasn't too big. It was a slow
>>> virtualised disk with encryption that made operations slow. Given that
>>> there are 200 nodes in the cluster, every node writes slowly, and this
>>> process is sequential, one piece of metadata is registered extremely
>>> slowly.
>>>
>>> Ivan, answering your other questions:
>>>
>>>> 2. Do we need persistent metadata for in-memory caches? Or is it
>>>> accidental?
>>>
>>> It needs to be checked whether it's safe to stop writing marshaller
>>> mappings to disk without losing any guarantees.
>>> In any case, I would like to have a property that controls this. If
>>> metadata registration is slow, then the initial cluster warmup may take
>>> a while. So, if we preserve metadata on disk, we will need to warm it
>>> up only once, and further restarts won't be affected.
>>>
>>>> Do we really need a fast fix here?
>>>
>>> I would like a fix that can be implemented now, since the activity of
>>> moving metadata to the metastore doesn't sound like a quick one. Having
>>> a temporary solution would be nice.
>>>
>>> Denis
>>>
>>>> On 14 Aug 2019, at 11:53, Ivan Pavlukhin < vololo...@gmail.com > wrote:
>>>>
>>>> Denis,
>>>>
>>>> Several clarifying questions:
>>>> 1. Do you have an idea why metadata registration takes so long? Such
>>>> poor disks? So much data to write? Contention with disk writes from
>>>> other subsystems?
>>>> 2. Do we need persistent metadata for in-memory caches? Or is it
>>>> accidental?
>>>>
>>>> Generally, I think it is possible to move metadata-saving operations
>>>> out of the discovery thread without losing the required
>>>> consistency/integrity.
>>>>
>>>> As Alex mentioned, using the metastore looks like a better solution.
>>>> Do we really need a fast fix here? (Are we talking about a fast fix?)
>>>>
>>>> Wed, 14 Aug 2019 at 11:45, Zhenya Stanilovsky
>>>> < arzamas...@mail.ru.invalid >:
>>>>>
>>>>> Alexey, but in this case the customer needs to be informed that a
>>>>> crash (power off) of the whole cluster (for example, a one-node one)
>>>>> could lead to partial data unavailability.
>>>>> And maybe to further index corruption.
>>>>> 1. Why does your metadata take up a substantial size? Maybe a context
>>>>> leak?
>>>>> 2. Could the metadata be compressed?
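For reference on the walmode=FSYNC exchange above: the WAL mode is a
per-node data-storage setting. Below is a minimal sketch of configuring it,
assuming the Ignite 2.x configuration API; the class name and the
surrounding setup are illustrative, not taken from the thread.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.configuration.WALMode;

    public class WalModeSketch {
        public static void main(String[] args) {
            DataStorageConfiguration storageCfg = new DataStorageConfiguration()
                // FSYNC flushes every WAL record to disk before the write is
                // acknowledged: the strictest guarantee, but the mode that the
                // thread above says has a huge performance impact.
                .setWalMode(WALMode.FSYNC)
                // LOG_ONLY (the default) acknowledges once the record reaches
                // the OS page cache, so data is lost only on an OS failure:
                // .setWalMode(WALMode.LOG_ONLY)
                .setDefaultDataRegionConfiguration(
                    new DataRegionConfiguration().setPersistenceEnabled(true));

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storageCfg);

            try (Ignite ignite = Ignition.start(cfg)) {
                // With persistence enabled the cluster starts inactive.
                ignite.cluster().active(true);
            }
        }
    }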
>>>>>
>>>>>> Wednesday, 14 August 2019, 11:22 +03:00, from Alexei Scherbakov
>>>>>> < alexey.scherbak...@gmail.com >:
>>>>>>
>>>>>> Denis Mekhanikov,
>>>>>>
>>>>>> Currently metadata is fsync'ed on write. This might be the cause of
>>>>>> the slow-downs in the case of metadata burst writes.
>>>>>> I think removing fsync could help to mitigate the performance issues
>>>>>> with the current implementation until the proper solution is
>>>>>> implemented: moving metadata to the metastore.
>>>>>>
>>>>>> Tue, 13 Aug 2019 at 17:09, Denis Mekhanikov
>>>>>> < dmekhani...@gmail.com >:
>>>>>>
>>>>>>> I would also like to mention that marshaller mappings are written
>>>>>>> to disk even if persistence is disabled.
>>>>>>> So, this issue affects purely in-memory clusters as well.
>>>>>>>
>>>>>>> Denis
>>>>>>>
>>>>>>>> On 13 Aug 2019, at 17:06, Denis Mekhanikov
>>>>>>>> < dmekhani...@gmail.com > wrote:
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> When persistence is enabled, binary metadata is written to disk
>>>>>>>> upon registration. Currently this happens in the discovery thread,
>>>>>>>> which makes processing of the related messages very slow.
>>>>>>>> There are cases when a lot of nodes and slow disks make every
>>>>>>>> binary type take several minutes to register. Plus, it blocks
>>>>>>>> processing of other messages.
>>>>>>>>
>>>>>>>> I propose starting a separate thread that will be responsible for
>>>>>>>> writing binary metadata to disk. So, binary type registration will
>>>>>>>> be considered finished before the information about it is written
>>>>>>>> to disk on all nodes.
>>>>>>>>
>>>>>>>> The main concern here is data consistency in cases when a node
>>>>>>>> acknowledges the type registration and then fails before writing
>>>>>>>> the metadata to disk.
>>>>>>>> I see two parts of this issue:
>>>>>>>> 1. Nodes will have different metadata after restarting.
>>>>>>>> 2. If we write some data into a persisted cache and shut the nodes
>>>>>>>> down faster than the new binary type is written to disk, then
>>>>>>>> after a restart we won't have the binary type to work with.
>>>>>>>>
>>>>>>>> The first case is similar to a situation when one node fails, and
>>>>>>>> after that a new type is registered in the cluster. This issue is
>>>>>>>> resolved by the discovery data exchange. All nodes receive
>>>>>>>> information about all binary types in the initial discovery
>>>>>>>> messages sent by other nodes. So, once you restart a node, it will
>>>>>>>> receive the information that it failed to finish writing to disk
>>>>>>>> from the other nodes.
>>>>>>>> If all nodes shut down before finishing writing the metadata to
>>>>>>>> disk, then after a restart the type will be considered
>>>>>>>> unregistered, so another registration will be required.
>>>>>>>>
>>>>>>>> The second case is a bit more complicated. But it can be resolved
>>>>>>>> by making the discovery thread on every node create a future that
>>>>>>>> is completed when the write to disk finishes. So, every node will
>>>>>>>> have such a future, reflecting the current state of persisting the
>>>>>>>> metadata to disk.
>>>>>>>> After that, if some operation needs this binary type, it will have
>>>>>>>> to wait on that future until flushing to disk is finished.
>>>>>>>> This way the discovery threads won't be blocked, but the other
>>>>>>>> threads that actually need this type will be.
>>>>>>>>
>>>>>>>> Please let me know what you think about that.
>>>>>>>>
>>>>>>>> Denis
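A rough sketch of the future-based scheme proposed above, with hypothetical
names (the real Ignite internals differ): the discovery thread only
schedules the disk write and publishes a per-type future, and only the
threads that actually need the type wait on it.

    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    /** Sketch only: all names are hypothetical, not Ignite internals. */
    public class BinaryMetadataFlusher {
        /** A single writer thread keeps on-disk writes ordered and off
         * the discovery thread. */
        private final ExecutorService writer = Executors.newSingleThreadExecutor();

        /** typeId -> future completed once the metadata hits disk. */
        private final Map<Integer, CompletableFuture<Void>> flushFuts =
            new ConcurrentHashMap<>();

        /** Called from the discovery thread; schedules the write and
         * returns immediately, so discovery processing is not blocked. */
        public void onTypeRegistered(int typeId, byte[] metadata) {
            CompletableFuture<Void> fut = new CompletableFuture<>();
            flushFuts.put(typeId, fut);

            writer.execute(() -> {
                try {
                    writeAndFsync(typeId, metadata); // the slow part
                    fut.complete(null);
                }
                catch (Throwable e) {
                    fut.completeExceptionally(e);
                }
            });
        }

        /** Called by any operation that actually needs the type; blocks
         * only that thread until the flush is finished. */
        public void awaitFlush(int typeId) {
            CompletableFuture<Void> fut = flushFuts.get(typeId);
            if (fut != null)
                fut.join();
        }

        private void writeAndFsync(int typeId, byte[] metadata) {
            // Placeholder for the actual file write + fsync.
        }
    }

A node that fails between acknowledging the registration and completing its
future is covered by the discovery data exchange described above: it
re-receives the metadata from the alive nodes on restart.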
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Alexei Scherbakov
>>>>>
>>>>> --
>>>>> Zhenya Stanilovsky
>>>>
>>>> --
>>>> Best regards,
>>>> Ivan Pavlukhin
>>>
>>
>> --
>> Best regards,
>> Alexei Scherbakov
>
> --
> Zhenya Stanilovsky