Re: Data loss in an Ignite application

Stephen Darlington Fri, 23 Feb 2024 01:38:49 -0800

Is there a pattern to the lost records? Is it old records? Records for a
particular customer? Records stored on a specific node or partition?
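
One way to look for such a pattern is to compare a snapshot of the cache taken
before the loss with one taken after it, then tally the missing records by age,
customer and partition. The following is only a minimal sketch with invented
data: the field names (`customer`, `created`, `partition`) and the helper
`lost_record_patterns` are hypothetical, and the partition number is assumed to
have been recorded when the snapshot was exported.

```python
from collections import Counter


def lost_record_patterns(before, after, classifiers):
    """Tally records present in `before` but missing from `after`.

    before, after: dicts mapping cache key -> record (plain dicts here).
    classifiers: dict mapping a label -> function(key, record) -> bucket.
    Returns {label: Counter(bucket -> number of lost records)}.
    """
    lost_keys = before.keys() - after.keys()
    report = {label: Counter() for label in classifiers}
    for key in lost_keys:
        record = before[key]
        for label, classify in classifiers.items():
            report[label][classify(key, record)] += 1
    return report


# Hypothetical snapshots; in practice these would be exported from the cache.
before = {
    1: {"customer": "acme", "created": "2023-11-02", "partition": 17},
    2: {"customer": "acme", "created": "2024-02-20", "partition": 17},
    3: {"customer": "globex", "created": "2023-12-15", "partition": 402},
    4: {"customer": "globex", "created": "2024-02-21", "partition": 9},
}
after = {2: before[2], 4: before[4]}

# Each classifier maps a lost record to the bucket it is counted under.
classifiers = {
    "customer": lambda key, rec: rec["customer"],
    "created_month": lambda key, rec: rec["created"][:7],
    "partition": lambda key, rec: rec["partition"],
}

report = lost_record_patterns(before, after, classifiers)
for label, counts in report.items():
    print(label, dict(counts))
```

If one bucket dominates (a single partition, or only records older than some
date), that narrows the search considerably; an even spread points more
towards application-level deletes.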


On Thu, 22 Feb 2024 at 21:14, Aleksej Avrutin <alexavru...@gmail.com> wrote:

> Jeremy,
>
> Thank you for the response. I reviewed cache properties using GG Control
> Center and there was nothing in the cache props that would lead me to the
> conclusion that any expiry policy/TTL is set up for the cache. It wasn't
> set on the operation level, either.
>
> I decided to delete the cache entirely and re-create it. Tomorrow I'll
> check if it helps.
>
> My best,
> Alex Avrutin
>
>
> On Thu, Feb 22, 2024 at 3:56 AM Jeremy McMillan <
> jeremy.mcmil...@gridgain.com> wrote:
>
>> First, logging should be configured to at least WARN level, if not INFO.
>>
>> Ignite manages data internally at the page level. If you see errors about
>> pages, those are very low-level Ignite problems. The next level up is
>> partitions; errors involving partitions are mid-to-low-level Ignite
>> problems. The next level up is caches; errors at the cache level are
>> mid-to-high-level problems. The next level is cache records; errors in
>> cache record handling are at a high level of abstraction. The level above
>> that is client application operations.
>>
>> The lower the level of abstraction at which errors appear, the less chance
>> operations in general will succeed. Since the cache appears to operate
>> mostly as expected, and there are no obvious errors in the Ignite logs,
>> most likely there is some client-side logic which is deleting records, and
>> Ignite does not consider this behavior to be an error.
>>
>> I would recommend fine-tuning log coverage of the cache delete methods.
>> First identify whether the deletion is happening on a client connection
>> thread pool or on a thread for server-initiated operations.
>>
>> My guess is that a client is connecting, getting a cache object, and then
>> setting expiration on that cache connection so that all cache adds under
>> that cache connection will have expiration applied to them.
>>
>>
>> https://ignite.apache.org/docs/2.14.0/configuring-caches/expiry-policies#configuration
>>
>> "You can also change or set Expiry Policy for individual cache
>> operations. This policy is used for each operation invoked on the returned
>> cache instance."
>>
>>
>> https://ignite.apache.org/releases/latest/dotnetdoc/api/Apache.Ignite.Core.Client.Cache.ICacheClient-2.html?q=withExpiryPolicy#Apache_Ignite_Core_Client_Cache_ICacheClient_2_WithExpiryPolicy_Apache_Ignite_Core_Cache_Expiry_IExpiryPolicy_
>>
>> On Wed, Feb 21, 2024, 19:17 Aleksej Avrutin <alexavru...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> A couple of days ago I encountered a strange phenomenon in our
>>> application based on Apache Ignite .Net 2.14 with persistence (3 nodes, 1
>>> backup per cache).
>>> Data in a cache started disappearing for seemingly no reason and the
>>> number of records could be halved (220K to 108K) overnight. I spent a
>>> couple of days trying to find a problem in the application, crunched
>>> hundreds of megabytes of application logs but didn't manage to find a
>>> reason to blame the application. Retention/TTL is not set for the cache. Apache
>>> Ignite logs with the option -DIGNITE_QUIET=false also don't reveal any
>>> anomalies (or I don't know what to look for). The data shares are expected
>>> to be durable (based on Azure Disk) and we never had any issues with them.
>>> RAM utilisation is normal and there's plenty of available RAM.
>>> The Ignite cluster is hosted in a 3 node Kubernetes cluster on Azure.
>>>
>>> The question is: how would you recommend investigating issues like this?
>>> What metrics and logs can I check? Is it possible to log and track
>>> individual Remove() operations as well as SQL queries at Ignite engine
>>> level?
>>>
>>> The application has been running on Ignite for years and we
>>> haven't encountered data loss on this scale before. It's possible that the
>>> app wasn't used as extensively before as it is now and the problem went
>>> unnoticed.
>>>
>>> My best,
>>> Alex Avrutin
>>>
>>
