Is there a pattern to the lost records? Are they old records? Records for a particular customer? Records stored on a specific node or partition?
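One way to answer these questions is to compare a snapshot of keys taken before the loss with the keys that are still present, then compute the loss rate per attribute. The sketch below assumes you can export each key with some metadata (customer, age bucket, partition). How that export is done and the field names used here are hypothetical. For the partition, Ignite's affinity API (for example `Affinity.partition(key)` in Java) can map a key to its partition. A loss rate near 1.0 for one value and near 0.0 for the others suggests a pattern. Roughly equal rates everywhere suggest it does not.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Mapping, Set


def missing_keys(before: Iterable[Hashable], after: Iterable[Hashable]) -> Set[Hashable]:
    """Keys present in the earlier snapshot but absent from the later one."""
    return set(before) - set(after)


def loss_breakdown(
    snapshot: Mapping[Hashable, Mapping[str, object]],
    current_keys: Iterable[Hashable],
    attribute: str,
) -> Dict[object, float]:
    """For each value of `attribute`, the fraction of its records that disappeared.

    `snapshot` maps key -> metadata captured before the loss (hypothetical
    fields such as "customer" or "partition"); `current_keys` is the key set
    read back later.
    """
    lost = missing_keys(snapshot.keys(), current_keys)
    totals = Counter(meta[attribute] for meta in snapshot.values())
    losses = Counter(snapshot[k][attribute] for k in lost)
    # Counter returns 0 for values with no losses.
    return {value: losses[value] / totals[value] for value in totals}


snapshot = {
    1: {"customer": "A", "partition": 0},
    2: {"customer": "A", "partition": 1},
    3: {"customer": "B", "partition": 0},
    4: {"customer": "B", "partition": 1},
}
current = [2, 4]
print(loss_breakdown(snapshot, current, "partition"))  # {0: 1.0, 1: 0.0}
print(loss_breakdown(snapshot, current, "customer"))   # {'A': 0.5, 'B': 0.5}
```

In this toy data, every record in partition 0 was lost and none in partition 1, while both customers lost half their records. That would point at the partition rather than the customer.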
On Thu, 22 Feb 2024 at 21:14, Aleksej Avrutin <alexavru...@gmail.com> wrote:
> Jeremy,
>
> Thank you for the response. I reviewed the cache properties using GG Control
> Center, and nothing in them suggests that an expiry policy or TTL is set up
> for the cache. It wasn't set at the operation level, either.
>
> I decided to delete the cache entirely and re-create it. Tomorrow I'll
> check whether that helps.
>
> My best,
> Alex Avrutin
>
>
> On Thu, Feb 22, 2024 at 3:56 AM Jeremy McMillan <
> jeremy.mcmil...@gridgain.com> wrote:
>
>> First, logging should be configured to at least WARN level, if not INFO.
>>
>> Ignite manages data internally at the page level. If you see errors about
>> pages, those are very low-level Ignite problems. The next level up is
>> partitions; errors involving partitions are mid-to-low-level problems.
>> The next level up is caches; errors at the cache level are mid-to-high-level
>> problems. The next level is cache records; errors in cache record handling
>> are at a high level of abstraction, and the level above that is client
>> application operations.
>>
>> The lower the level of abstraction at which errors appear, the less likely
>> it is that operations in general will succeed. Since the cache appears to
>> operate mostly as expected, and there are no obvious errors in the Ignite
>> logs, most likely some client-side logic is deleting records, and Ignite
>> does not consider this behavior to be an error.
>>
>> I would recommend fine-tuning log coverage of the cache delete methods.
>> First, identify whether the deletion happens on a client connection thread
>> pool or on a thread for server-initiated operations.
>>
>> My guess is that a client is connecting, getting a cache object, and then
>> setting expiration on that cache object, so that all entries added through
>> it have expiration applied to them.
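The mechanism in this guess can be illustrated with a toy in-memory model. This is a sketch, not Ignite's API, and all class and method names below are hypothetical. A plain handle writes entries that never expire. `with_expiry_policy` returns a new handle that stamps a TTL on every put made through it. The cache's own settings never change, so in this model the policy leaves no trace in the cache-level configuration. Per the documentation Jeremy links, Ignite likewise attaches such a policy to the returned cache instance.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class FakeClock:
    """Manually advanced clock so that expiry can be simulated deterministically."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class ToyCache:
    """In-memory stand-in for a cache (NOT Ignite's API).

    Stores key -> (value, expires_at); expires_at is None for entries that
    never expire. The cache itself has no expiry setting at all.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.data: Dict[Hashable, Tuple[object, Optional[float]]] = {}

    def handle(self) -> "CacheHandle":
        # A plain handle, analogous to a cache object obtained from the client.
        return CacheHandle(self, ttl=None)

    def purge_expired(self) -> None:
        now = self.clock()
        for key in [k for k, (_, exp) in self.data.items() if exp is not None and exp <= now]:
            del self.data[key]

    def size(self) -> int:
        self.purge_expired()
        return len(self.data)


class CacheHandle:
    """A handle onto ToyCache. A non-None ttl is stamped on every put made
    through this handle, mimicking an instance decorated with an expiry policy."""

    def __init__(self, cache: ToyCache, ttl: Optional[float]) -> None:
        self.cache = cache
        self.ttl = ttl

    def with_expiry_policy(self, ttl: float) -> "CacheHandle":
        # Returns a NEW handle; self and the cache's own settings are unchanged.
        return CacheHandle(self.cache, ttl)

    def put(self, key: Hashable, value: object) -> None:
        expires_at = None if self.ttl is None else self.cache.clock() + self.ttl
        self.cache.data[key] = (value, expires_at)

    def get(self, key: Hashable) -> Optional[object]:
        self.cache.purge_expired()
        entry = self.cache.data.get(key)
        return None if entry is None else entry[0]


clock = FakeClock()
cache = ToyCache(clock)
plain = cache.handle()
expiring = plain.with_expiry_policy(60)  # e.g. a 60-second TTL

plain.put("a", 1)
expiring.put("b", 2)
clock.now = 61
print(cache.size())    # 1: only "a" survived
print(plain.get("b"))  # None, although nobody called remove()
```

The record written through the decorated handle disappears even though no code removed it and the cache itself carries no expiry setting.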
>>
>>
>> https://ignite.apache.org/docs/2.14.0/configuring-caches/expiry-policies#configuration
>>
>> "You can also change or set Expiry Policy for individual cache
>> operations. This policy is used for each operation invoked on the returned
>> cache instance."
>>
>>
>> https://ignite.apache.org/releases/latest/dotnetdoc/api/Apache.Ignite.Core.Client.Cache.ICacheClient-2.html?q=withExpiryPolicy#Apache_Ignite_Core_Client_Cache_ICacheClient_2_WithExpiryPolicy_Apache_Ignite_Core_Cache_Expiry_IExpiryPolicy_
>>
>> On Wed, Feb 21, 2024, 19:17 Aleksej Avrutin <alexavru...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> A couple of days ago I encountered a strange phenomenon in our
>>> application based on Apache Ignite .NET 2.14 with persistence (3 nodes,
>>> 1 backup per cache).
>>> Data in a cache started disappearing for no apparent reason, and the
>>> number of records could halve (220K to 108K) overnight. I spent a couple
>>> of days trying to find a problem in the application and crunched hundreds
>>> of megabytes of application logs, but found no reason to blame the
>>> application. Retention/TTL is not set for the cache. The Apache Ignite
>>> logs with the option -DIGNITE_QUIET=false also don't reveal any anomalies
>>> (or I don't know what to look for). The data shares are expected to be
>>> durable (based on Azure Disk), and we have never had any issues with them.
>>> RAM utilisation is normal and there's plenty of available RAM.
>>> The Ignite cluster is hosted in a 3-node Kubernetes cluster on Azure.
>>>
>>> The question is: how would you recommend investigating issues like this?
>>> What metrics and logs can I check? Is it possible to log and track
>>> individual Remove() operations, as well as SQL queries, at the Ignite
>>> engine level?
>>>
>>> The application has been running on Ignite for years, and we haven't
>>> encountered data loss at this scale before.
>>> It's possible that the app wasn't used as extensively before as it is
>>> now, and the problem went unnoticed.
>>>
>>> My best,
>>> Alex Avrutin
>>>
>>
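On the question of tracking individual removals, and Jeremy's suggestion to find which thread performs them, one application-side option is to route every delete through a wrapper. The wrapper would log the key, the calling thread's name, and the call site. The sketch below is hypothetical. `AuditedCache` is not an Ignite class, and the thread names in the usage are made up. It only sees removals made through application code. If records keep vanishing with no logged remove(), that points toward expiry or something outside the application. On the Ignite side, separate cache event types exist for explicit removals (`EVT_CACHE_OBJECT_REMOVED`) and expirations (`EVT_CACHE_OBJECT_EXPIRED`). Events are disabled by default and must be enabled in the node configuration before they can be listened for.

```python
import logging
import threading
import traceback
from typing import Dict, Hashable, List, MutableMapping, Tuple

log = logging.getLogger("cache.audit")
_MISSING = object()  # sentinel so that stored None values are handled correctly


class AuditedCache:
    """Hypothetical application-side wrapper (not an Ignite class) that logs
    every removal together with the calling thread and call site."""

    def __init__(self, backing: MutableMapping[Hashable, object]) -> None:
        self._backing = backing
        self._lock = threading.Lock()
        # (key, thread name) for every remove() call, kept for later analysis.
        self.removals: List[Tuple[Hashable, str]] = []

    def put(self, key: Hashable, value: object) -> None:
        self._backing[key] = value

    def remove(self, key: Hashable) -> bool:
        thread = threading.current_thread().name
        # extract_stack(limit=2) returns [caller frame, this frame]; take the caller.
        caller = traceback.extract_stack(limit=2)[0]
        log.warning("remove key=%r thread=%s caller=%s:%s",
                    key, thread, caller.filename, caller.lineno)
        with self._lock:
            self.removals.append((key, thread))
            return self._backing.pop(key, _MISSING) is not _MISSING


def removals_by_thread(cache: AuditedCache) -> Dict[str, int]:
    """Count removals per thread name, e.g. client pool vs. server threads."""
    counts: Dict[str, int] = {}
    for _, thread in cache.removals:
        counts[thread] = counts.get(thread, 0) + 1
    return counts


cache = AuditedCache({})
cache.put(1, "x")
cache.put(2, "y")
# Hypothetical thread names standing in for a client-connection pool thread
# and a server-side worker thread.
for key, name in [(1, "client-pool-0"), (2, "server-op-0")]:
    t = threading.Thread(target=cache.remove, args=(key,), name=name)
    t.start()
    t.join()
print(removals_by_thread(cache))  # {'client-pool-0': 1, 'server-op-0': 1}
```

Keeping the per-thread tally alongside the log lines makes it easy to see at a glance whether deletions cluster on one kind of thread.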