Do you still want me to reproduce the issue with the DEBUGs mentioned, or have
you already identified the root cause?

Thanks,
Faseela

-----Original Message-----
From: Robert Varga [mailto:n...@hq.sk] 
Sent: Friday, November 23, 2018 9:06 PM
To: Tom Pantelis <tompante...@gmail.com>
Cc: Faseela K <faseel...@ericsson.com>; controller-dev@lists.opendaylight.org; 
genius-...@lists.opendaylight.org
Subject: Re: [mdsal-dev] Unknown history for purged transaction 
member-2-datastore-operational-fe-0-chn-12-txn-0-0, ignoring

On 23/11/2018 16:11, Robert Varga wrote:
> On 23/11/2018 15:29, Faseela K wrote:
>> https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/rpkgenius
>> -csit-3node-gate-only-neon/15/odl_2/odl2_karaf.log.gz
>>
>> [sandbox log, will get deleted in another 24hours, I guess]
> This looks weird, as if we lost some journal records for create, or 
> got some early closures.

Tom,

I just dug around the code, and it seems CloseTransactionChain is being 
broadcast to all nodes, which dates back to 
https://git.opendaylight.org/gerrit/#/c/10833/.

On the backend side, we treat it the same on all shards, i.e. by calling into
ShardDataTree, which closes whatever it finds and then calls into
Shard.replicatePayload(). I am pretty sure that is wrong: I am not seeing a
guard ensuring this happens only on the leader, so we are probably screwing
up journal consistency.

So if journal index accounting and payload ID tracking are screwed up, we can
end up firing wrong events when consensus is reached...
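
To make the concern concrete, here is a minimal sketch of the kind of leader
guard I would expect to see. It is a toy model only: ToyShard,
closeChainUnguarded() and closeChainGuarded() are hypothetical names made up
for illustration, not the actual Shard or ShardDataTree code, and the real fix
may well look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names exist in the controller code base.
public class LeaderGuardSketch {

    /** Stand-in for one shard replica with a local journal. */
    static final class ToyShard {
        private final boolean leader;
        private final List<String> openChains = new ArrayList<>();
        private final List<String> journal = new ArrayList<>();

        ToyShard(boolean leader) {
            this.leader = leader;
        }

        void openChain(String chainId) {
            openChains.add(chainId);
        }

        /** Unguarded handling: any replica receiving the broadcast appends to its journal. */
        void closeChainUnguarded(String chainId) {
            if (openChains.remove(chainId)) {
                journal.add("close:" + chainId);
            }
        }

        /**
         * Guarded handling: only the leader originates a journal entry.
         * Returns false when this replica is not the leader and ignored the request.
         */
        boolean closeChainGuarded(String chainId) {
            if (!leader) {
                return false;
            }
            if (openChains.remove(chainId)) {
                journal.add("close:" + chainId);
            }
            return true;
        }

        List<String> journal() {
            return journal;
        }
    }

    public static void main(String[] args) {
        ToyShard unguarded = new ToyShard(false);
        unguarded.openChain("chn-12");
        unguarded.closeChainUnguarded("chn-12");
        System.out.println("unguarded follower journal: " + unguarded.journal());

        ToyShard guarded = new ToyShard(false);
        guarded.openChain("chn-12");
        guarded.closeChainGuarded("chn-12");
        System.out.println("guarded follower journal: " + guarded.journal());
    }
}
```

In this model the unguarded follower ends up with a journal entry it originated
itself, while the guarded follower leaves its journal untouched and would only
learn about the close through normal replication from the leader.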

Can you take a look?

Thanks,
Robert

_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev
