Hello,

A couple of days ago I encountered a strange phenomenon in our application, which is based on Apache Ignite.NET 2.14 with persistence enabled (3 nodes, 1 backup per cache). Data in a cache started disappearing for no apparent reason, and the number of records could be halved overnight (from 220K to 108K).

I spent a couple of days looking for a problem in the application and went through hundreds of megabytes of application logs, but found nothing that points to the application as the cause. No retention/TTL is set for the cache. The Apache Ignite logs, with the option -DIGNITE_QUIET=false, don't reveal any anomalies either (or I don't know what to look for). The data shares are expected to be durable (they are based on Azure Disk), and we have never had any issues with them. RAM utilisation is normal and there is plenty of available RAM. The Ignite cluster is hosted in a 3-node Kubernetes cluster on Azure.
The question is: how would you recommend investigating issues like this? What metrics and logs can I check? Is it possible to log and track individual Remove() operations, as well as SQL queries, at the Ignite engine level?

The application has been running on Ignite for years, and we have not encountered data loss on this scale before. It's possible that the app was not used as extensively before as it is now, so the problem went unnoticed.

My best,
Alex Avrutin