Do you still want me to reproduce the issue with the DEBUGs mentioned, or have you already identified the root cause?
Thanks,
Faseela

-----Original Message-----
From: Robert Varga [mailto:n...@hq.sk]
Sent: Friday, November 23, 2018 9:06 PM
To: Tom Pantelis <tompante...@gmail.com>
Cc: Faseela K <faseel...@ericsson.com>; controller-dev@lists.opendaylight.org; genius-...@lists.opendaylight.org
Subject: Re: [mdsal-dev] Unknown history for purged transaction member-2-datastore-operational-fe-0-chn-12-txn-0-0, ignoring

On 23/11/2018 16:11, Robert Varga wrote:
> On 23/11/2018 15:29, Faseela K wrote:
>> https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/rpkgenius-csit-3node-gate-only-neon/15/odl_2/odl2_karaf.log.gz
>>
>> [sandbox log, will get deleted in another 24hours, I guess]
> This looks weird, as if we lost some journal records for create, or
> got some early closures.

Tom,

I just dug around the code, and it seems CloseTransactionChain is being broadcast to all nodes, which dates back to https://git.opendaylight.org/gerrit/#/c/10833/.

On the backend side, we treat the same on all shards -- i.e. by call into ShardDataTree, which closes stuff it finds and then calls into Shard.replicatePayload().

I am pretty sure that is wrong, as I am not seeing a guard for that happening only on the leader and thus we are probably screwing up journal consistency. So if journal index accounting and payload ID tracking is screwed up, we can end up firing wrong events when consensus is reached...

Can you take a look?

Thanks,
Robert
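
To make the suspected failure mode concrete, here is a minimal toy model in Python. It is not OpenDaylight code: ToyShard, close_chain and broadcast_close are hypothetical names invented for illustration, and the sketch only assumes what the quoted message suspects, namely that a broadcast close is handled identically on every member and appends a journal entry without checking for leadership.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Hypothetical stand-in for one shard replica; not the real Shard class."""
    name: str
    is_leader: bool
    journal: list = field(default_factory=list)
    open_chains: set = field(default_factory=set)

    def append_from_leader(self, entry):
        # Follower path: accept an entry replicated by the leader.
        self.journal.append(entry)


def close_chain(shard, chain_id, followers, guarded):
    """Handle a close request that was broadcast to this member."""
    shard.open_chains.discard(chain_id)  # local cleanup happens everywhere
    if guarded and not shard.is_leader:
        return  # with the guard, followers wait for the leader's entry
    entry = ("close-chain", chain_id)
    shard.journal.append(entry)
    if shard.is_leader:
        for follower in followers:
            follower.append_from_leader(entry)


def broadcast_close(chain_id, guarded):
    """Send one close request to all three members and return their journals."""
    leader = ToyShard("member-1", True)
    followers = [ToyShard("member-2", False), ToyShard("member-3", False)]
    members = [leader] + followers
    for shard in members:
        shard.open_chains.add(chain_id)
    for shard in members:
        close_chain(shard, chain_id, followers, guarded)
    return {shard.name: list(shard.journal) for shard in members}


if __name__ == "__main__":
    print("unguarded:", broadcast_close(12, guarded=False))
    print("guarded:  ", broadcast_close(12, guarded=True))
```

In the unguarded run each follower ends up with one more journal entry than the leader, so journal indexes no longer line up across members; with the leader-only guard all three journals match. Whether the real backend behaves like the unguarded case is exactly the open question in the message above.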