> >> 1. Yes, only on OS failures. In such a case the data will be received
> >> from the alive nodes later.
>
> What behavior would there be in the case of a single node? I suppose
> someone could obtain cache data without unmarshalling the schema; what
> would happen to grid operability in that case?
> >> 2. Yes, for walmode=FSYNC writes to the metastore will be slow. But
> >> such a mode should not be used if you have more than two nodes in the
> >> grid, because it has a huge impact on performance.
>
> Does the WAL mode affect the metadata store?
>
>> Wed, 14 Aug 2019 at 14:29, Denis Mekhanikov < dmekhani...@gmail.com >:
>>
>>> Folks,
>>>
>>> Thanks for showing interest in this issue!
>>>
>>> Alexey,
>>>
>>>> I think removing fsync could help to mitigate performance issues with
>>>> the current implementation
>>>
>>> Is my understanding correct that if we remove fsync, then discovery
>>> won't be blocked, data will be flushed to disk in the background, and
>>> loss of information will be possible only on an OS failure? It sounds
>>> like an acceptable workaround to me.
>>>
>>> Will moving metadata to the metastore actually resolve this issue?
>>> Please correct me if I'm wrong, but we will still need to write the
>>> information to the WAL before releasing the discovery thread. If the
>>> WAL mode is FSYNC, then the issue will still be there. Or is it planned
>>> to abandon the discovery-based protocol altogether?
>>>
>>> Evgeniy, Ivan,
>>>
>>> In my particular case the data wasn't too big. It was a slow
>>> virtualised disk with encryption that made operations slow. Given that
>>> there are 200 nodes in the cluster, every node writes slowly, and this
>>> process is sequential, one piece of metadata is registered extremely
>>> slowly.
>>>
>>> Ivan, answering your other questions:
>>>
>>>> 2. Do we need persistent metadata for in-memory caches? Or is it
>>>> accidental?
>>>
>>> It needs to be checked whether it's safe to stop writing marshaller
>>> mappings to disk without losing any guarantees.
>>> In any case, I would like to have a property that controls this. If
>>> metadata registration is slow, then the initial cluster warmup may take
>>> a while. So, if we preserve metadata on disk, we will need to warm it
>>> up only once, and further restarts won't be affected.
>>>
>>>> Do we really need a fast fix here?
>>>
>>> I would like a fix that can be implemented now, since the activity of
>>> moving metadata to the metastore doesn't sound like a quick one. Having
>>> a temporary solution would be nice.
>>>
>>> Denis
>>>
>>>> On 14 Aug 2019, at 11:53, Ivan Pavlukhin < vololo...@gmail.com > wrote:
>>>>
>>>> Denis,
>>>>
>>>> Several clarifying questions:
>>>> 1. Do you have an idea why metadata registration takes so long? Such
>>>> poor disks? So much data to write? Contention with disk writes from
>>>> other subsystems?
>>>> 2. Do we need persistent metadata for in-memory caches? Or is it
>>>> accidental?
>>>>
>>>> Generally, I think it is possible to move metadata-saving operations
>>>> out of the discovery thread without losing the required
>>>> consistency/integrity.
>>>>
>>>> As Alex mentioned, using the metastore looks like a better solution.
>>>> Do we really need a fast fix here? (Are we talking about a fast fix?)
>>>>
>>>> Wed, 14 Aug 2019 at 11:45, Zhenya Stanilovsky
>>>> < arzamas...@mail.ru.invalid >:
>>>>>
>>>>> Alexey, but in this case the customer needs to be informed that a
>>>>> crash (power off) of the whole cluster (for example, a one-node one)
>>>>> could lead to partial data unavailability.
>>>>> And maybe to further index corruption.
>>>>> 1. Why does your metadata take up a substantial size? Maybe a context
>>>>> leak?
>>>>> 2. Could the metadata be compressed?
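For reference on the walmode=FSYNC exchange above: the WAL mode is a
per-node data-storage setting. Below is a minimal sketch of configuring it,
assuming the Ignite 2.x configuration API; the class name and the
surrounding setup are illustrative, not taken from the thread.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.configuration.WALMode;

    public class WalModeSketch {
        public static void main(String[] args) {
            DataStorageConfiguration storageCfg = new DataStorageConfiguration()
                // FSYNC flushes every WAL record to disk before the write is
                // acknowledged: the strictest guarantee, but the mode that the
                // thread above says has a huge performance impact.
                .setWalMode(WALMode.FSYNC)
                // LOG_ONLY (the default) acknowledges once the record reaches
                // the OS page cache, so data is lost only on an OS failure:
                // .setWalMode(WALMode.LOG_ONLY)
                .setDefaultDataRegionConfiguration(
                    new DataRegionConfiguration().setPersistenceEnabled(true));

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storageCfg);

            try (Ignite ignite = Ignition.start(cfg)) {
                // With persistence enabled the cluster starts inactive.
                ignite.cluster().active(true);
            }
        }
    }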
>>>>>
>>>>>> Wednesday, 14 August 2019, 11:22 +03:00, from Alexei Scherbakov
>>>>>> < alexey.scherbak...@gmail.com >:
>>>>>>
>>>>>> Denis Mekhanikov,
>>>>>>
>>>>>> Currently metadata is fsync'ed on write. This might be the cause of
>>>>>> the slow-downs in the case of metadata burst writes.
>>>>>> I think removing fsync could help to mitigate the performance issues
>>>>>> with the current implementation until the proper solution is
>>>>>> implemented: moving metadata to the metastore.
>>>>>>
>>>>>> Tue, 13 Aug 2019 at 17:09, Denis Mekhanikov
>>>>>> < dmekhani...@gmail.com >:
>>>>>>
>>>>>>> I would also like to mention that marshaller mappings are written
>>>>>>> to disk even if persistence is disabled.
>>>>>>> So, this issue affects purely in-memory clusters as well.
>>>>>>>
>>>>>>> Denis
>>>>>>>
>>>>>>>> On 13 Aug 2019, at 17:06, Denis Mekhanikov
>>>>>>>> < dmekhani...@gmail.com > wrote:
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> When persistence is enabled, binary metadata is written to disk
>>>>>>>> upon registration. Currently this happens in the discovery thread,
>>>>>>>> which makes processing of the related messages very slow.
>>>>>>>> There are cases when a lot of nodes and slow disks make every
>>>>>>>> binary type take several minutes to register. Plus, it blocks
>>>>>>>> processing of other messages.
>>>>>>>>
>>>>>>>> I propose starting a separate thread that will be responsible for
>>>>>>>> writing binary metadata to disk. So, binary type registration will
>>>>>>>> be considered finished before the information about it is written
>>>>>>>> to disk on all nodes.
>>>>>>>>
>>>>>>>> The main concern here is data consistency in cases when a node
>>>>>>>> acknowledges the type registration and then fails before writing
>>>>>>>> the metadata to disk.
>>>>>>>> I see two parts of this issue:
>>>>>>>> 1. Nodes will have different metadata after restarting.
>>>>>>>> 2. If we write some data into a persisted cache and shut the nodes
>>>>>>>> down faster than the new binary type is written to disk, then
>>>>>>>> after a restart we won't have the binary type to work with.
>>>>>>>>
>>>>>>>> The first case is similar to a situation when one node fails, and
>>>>>>>> after that a new type is registered in the cluster. This issue is
>>>>>>>> resolved by the discovery data exchange. All nodes receive
>>>>>>>> information about all binary types in the initial discovery
>>>>>>>> messages sent by other nodes. So, once you restart a node, it will
>>>>>>>> receive the information that it failed to finish writing to disk
>>>>>>>> from the other nodes.
>>>>>>>> If all nodes shut down before finishing writing the metadata to
>>>>>>>> disk, then after a restart the type will be considered
>>>>>>>> unregistered, so another registration will be required.
>>>>>>>>
>>>>>>>> The second case is a bit more complicated. But it can be resolved
>>>>>>>> by making the discovery thread on every node create a future that
>>>>>>>> is completed when the write to disk finishes. So, every node will
>>>>>>>> have such a future, reflecting the current state of persisting the
>>>>>>>> metadata to disk.
>>>>>>>> After that, if some operation needs this binary type, it will have
>>>>>>>> to wait on that future until flushing to disk is finished.
>>>>>>>> This way the discovery threads won't be blocked, but the other
>>>>>>>> threads that actually need this type will be.
>>>>>>>>
>>>>>>>> Please let me know what you think about that.
>>>>>>>>
>>>>>>>> Denis
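A rough sketch of the future-based scheme proposed above, with hypothetical
names (the real Ignite internals differ): the discovery thread only
schedules the disk write and publishes a per-type future, and only the
threads that actually need the type wait on it.

    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    /** Sketch only: all names are hypothetical, not Ignite internals. */
    public class BinaryMetadataFlusher {
        /** A single writer thread keeps on-disk writes ordered and off
         * the discovery thread. */
        private final ExecutorService writer = Executors.newSingleThreadExecutor();

        /** typeId -> future completed once the metadata hits disk. */
        private final Map<Integer, CompletableFuture<Void>> flushFuts =
            new ConcurrentHashMap<>();

        /** Called from the discovery thread; schedules the write and
         * returns immediately, so discovery processing is not blocked. */
        public void onTypeRegistered(int typeId, byte[] metadata) {
            CompletableFuture<Void> fut = new CompletableFuture<>();
            flushFuts.put(typeId, fut);

            writer.execute(() -> {
                try {
                    writeAndFsync(typeId, metadata); // the slow part
                    fut.complete(null);
                }
                catch (Throwable e) {
                    fut.completeExceptionally(e);
                }
            });
        }

        /** Called by any operation that actually needs the type; blocks
         * only that thread until the flush is finished. */
        public void awaitFlush(int typeId) {
            CompletableFuture<Void> fut = flushFuts.get(typeId);
            if (fut != null)
                fut.join();
        }

        private void writeAndFsync(int typeId, byte[] metadata) {
            // Placeholder for the actual file write + fsync.
        }
    }

A node that fails between acknowledging the registration and completing its
future is covered by the discovery data exchange described above: it
re-receives the metadata from the alive nodes on restart.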
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Alexei Scherbakov
>>>>>
>>>>> --
>>>>> Zhenya Stanilovsky
>>>>
>>>> --
>>>> Best regards,
>>>> Ivan Pavlukhin
>>>
>>
>> --
>> Best regards,
>> Alexei Scherbakov
>
> --
> Zhenya Stanilovsky