On Thu, Aug 24, 2017 at 12:17 PM, Michael Vorburger <vorbur...@redhat.com> wrote:
> On Thu, Aug 24, 2017 at 9:06 AM, Michael Vorburger <vorbur...@redhat.com> >> wrote: >> >>> On Wed, Aug 23, 2017 at 8:54 PM, Robert Varga <n...@hq.sk> wrote: >>> >>>> On 23/08/17 20:21, Michael Vorburger wrote: >>>> > On Wed, Aug 23, 2017 at 2:11 PM, Robert Varga <n...@hq.sk >>>> > <mailto:n...@hq.sk>> wrote: >>>> > >>>> > On 23/08/17 13:48, Michael Vorburger wrote: >>>> > > >>>> > > Robert> Actually no. The backend side of things looks okay, as >>>> chains >>>> > > are being >>>> > > both closed and purged when requested from the frontend. I >>>> suspect >>>> > > somebody is forgetting to close their transaction chains... >>>> > > >>>> > > Michael> Is a "transaction chain" the same as a transaction - >>>> the >>>> > > objects returned from DataBroker's newReadWriteTransaction / >>>> > > newReadWriteTransaction / newWriteOnlyTransaction? I did >>>> stumble upont >>>> > > something - could https://git.opendaylight.org/g >>>> errit/#/c/62196/ >>>> > <https://git.opendaylight.org/gerrit/#/c/62196/> fix >>>> > > this problem? Or is that unlikely to fix this OOM, but still a >>>> Good >>>> > > Idea? Or doesn't matter to do cancel() if not submit() ? >>>> > >>>> > 'the same' in the sense that both are resources which need to be >>>> closed. >>>> > So yes, it does matter that they are properly closed. >>>> > >>>> > >>>> > Thanks. I spent the afternoon shooting in a couple of directions >>>> related >>>> > to this (details in https://bugs.opendaylight.org/ >>>> show_bug.cgi?id=9034#). >>>> > >>>> > But the problem is what I've found so far is all just "stabs in the >>>> > dark". What we really need is a way to more reliably find the origin >>>> of >>>> > code where transactions (or TransactionChain) are created but never >>>> > closed... >>>> > >>>> > You mentioned on IRC that a datastore.cfg may have some option for >>>> that, >>>> > but I could not find this - does anyone know any details about that? >>>> >>>> nite@nitebug : ~$ cd distribution-karaf-0.6.2-SNAPSHOT/ >>>> nite@nitebug : ~/distribution-karaf-0.6.2-SNAPSHOT$ ./bin/karaf >>>> opendaylight-user@root>feature:install odl-mdsal-broker >>>> nite@nitebug : ~/distribution-karaf-0.6.2-SNAPSHOT$ ls -l >>>> etc/org.opendaylight.controller.cluster.datastore.cfg >>>> -rw-rw-r--. 1 nite nite 4896 Aug 23 20:47 >>>> etc/org.opendaylight.controller.cluster.datastore.cfg >>>> >>>> for some reason debug-transactions is not documented. >>>> https://git.opendaylight.org/gerrit/62224 fixes that. >>>> >>> >>> Tom, I've tried to dig into this (supposed) debug-transactions flag, >>> but am starting to have doubts if this feature actually really exists / >>> works (anymore? or ever did?) ... do you know anything more about the >>> history of this, or can take a moment to help us find out more? Cauz Robert >>> on IRC said: "yeah ... I don't know the exact story // the yang models was >>> definitely used hen we use CSS // what the exact state is these days ... >>> best ping tpantelis". So here's what I've found so far: >>> >>> sal-distributed-datastore/src/main/yang/distributed-datastore-provider.yang >>> is where this "debug-transactions" should be, because that is the model for >>> datastore.cfg. That YANG model does have all the other properties used in >>> that datastore.cfg, just not debug-transactions. >>> >> >> It's defined in the yang as transaction-debug-context-enabled so the >> .cfg file must have the same name. >> > > Duh! I should have seen that.. :-( sorry & thanks a lot. > > So with that I'm starting to see the related stuff in the code. What I > don't get yet is how this was this intended to be "surfaced" - the idea > probably is just to find the throwable during forensics of an hprof heap > dump after an OOM, if one finds a large number of ClientBackedTransaction > (subclass) instances? There's no CLI command (or JMX, or whatever) to "list > all unclosed transactions", is there? > > More importantly, as far as I can see so far by going through the code > (not all of which makes a whole lot of sense so far, I'll admit!), this > flag as is won't actually help us to find the root cause of > https://bugs.opendaylight.org/show_bug.cgi?id=9034 yet - which is what > this thread is really about. Because the ClientBackedDataStore keeps the > allocationContext() for newRead/Write[Only]Transaction(), but for > createTransactionChain() it just passes on debugAllocation() - so that the > Tx that the chain creates can or not have their respective > allocationContext preserved... but if I understand Rboert's analysis of > this Bug 9034 correctly, he's saying that overflowing Map I'm seeing there > is due to (many) TransactionChain themselves (not their Txs) not being > closed - am I getting this right? So to get to the bottom of that, we'll > need to (also) track the allocationContext at createTransactionChain() > itself, don't we? Shall I try to have a go at proposing a change doing this? > > Now, what I don't get and need more education from you guys on is that > even if we do above, that overflown Map in the ShardDataTree contains > ShardDataTreeTransactionChain - but above is about > ClientBackedTransactionChain. I'm a bit lost what's what here, but even if > we had the allocationContext preserved in a ClientBackedTransactionChain, > how do we then get that into a ShardDataTreeTransactionChain to see it > there? > The flag is used by TransactionProxy on the front-end to capture the user call site on allocation. It is then used by the DebugThreePhaseCommitCohort to print a warning: log.warn("Transaction {} failed with error \"{}\" - was allocated in the following context", transactionId, failure, debugContext); when a transaction operations fails on the front-end. So, as is, it won't with this issue. BTW - the ClientBackedDataStore is for the new tell-based protocol which isn't yet enabled by default. > Also, this gives me another (separate, new) idea: Wouldn't it be cool if > we had something similar to this in the testable data broker which we use > in tests to see where the code under test has not close'd its > Transaction[Chain] properly... but that require something similar to above > in the DataBroker used in tests, not the CDS. > > Or maybe forget about JUnits - if we had something with which we could > externally trigger "list all unclosed transactions" (from the CDS) say at > the end of CSIT, that could be interesting... thoughts on this idea? Just > for the future - in the immediate short term I need to sort out above, > first. > > >> sal-inmemory-datastore/src/main/yang/opendaylight-inmemory-datastore-provider.yang >>> does have a "debug-transactions" - so its ...inmemory.datastore.provider >>> .rev140617.DatastoreConfiguration has an isDebugTransactions() - but I >>> cannot find any usage of it, even in inmemory-datastore. >>> >>> So it seems to be that something got lost here somewhere along the way - >>> is that possible? Or am I just too dumb, and not understanding what's where >>> in all this controller mdsal code?? ;-) >>> >>> BTW: What's the story for what is used and what is not used - >>> inmemory-datastore is history? Want me to raise a Gerrit to remove it? >>> Or still use for the test DataBroker (which I myself am I user of in our >>> test in genius and netvirt) ? >>> >> >> It's not used in production but still used in unit tests. It will be >> removed once we switch to mdsal project. >> >> >>> >>> >>>> > Otherwise, an idea I just had in >>>> > https://bugs.opendaylight.org/show_bug.cgi?id=9034#c7 would be to >>>> see if >>>> > I could bolt something onto that mdsal-trace we have (originally >>>> > contributed by Josh) to keep track of opened-but-not-yet-closed >>>> > transactions. See the idea? Could be very useful to get to the bottom >>>> of >>>> > this kind of problem, no? >>>> >>>> No need, as mentioned above. >>>> >>> >>> Well, until proven otherwise, it would seem there is a need after all >>> then? ;-) >>> >>> Let me spend some more time to see if I could build this feature - I'll >>> try to start rebuilding it from scratch; if there's existing code for this >>> (which I cannot find), please point me to it so I don't redo what already >>> exists somewhere! >>> >>> >>>> > > > As far as I can see, with my still very limited >>>> understanding of mdsal >>>> > > > internals, this does not seem to be the same as our >>>> earlier >>>> > > > https://bugs.opendaylight.org/show_bug.cgi?id=8941 >>>> > <https://bugs.opendaylight.org/show_bug.cgi?id=8941> >>>> > > <https://bugs.opendaylight.org/show_bug.cgi?id=8941 >>>> > <https://bugs.opendaylight.org/show_bug.cgi?id=8941>> raised by >>>> > > Stephen and >>>> > > > fixed by Robert (which is being follow-up on in >>>> > > > https://bugs.opendaylight.org/show_bug.cgi?id=9028 >>>> > <https://bugs.opendaylight.org/show_bug.cgi?id=9028> >>>> > > <https://bugs.opendaylight.org/show_bug.cgi?id=9028 >>>> > <https://bugs.opendaylight.org/show_bug.cgi?id=9028>> by Robert >>>> and >>>> > > Tom) - >>>> > > > does this initial quick analysis seem accurate to you? >>>> > > >>>> > > 9034 looks like 8941, except it's not transactions, but >>>> chains. I'll >>>> > > cook up a prototype. >>>> > > >>>> > > Thanks a lot !!! We'd love to out anything you come up with. >>>> > >>>> > As mentioned above, this is something else, so no patch coming >>>> from me >>>> > righ tnow ;) >>>> > >>>> > OK, misunderstood because you wrote "I'll cook up a prototype" ... ? >>>> >>>> Yeah, I checked the code out and responded with "Actually no...", which >>>> you quoted above ;-) >>>> >>>> Bye, >>>> Robert >>>> >>>> >>> >> >
_______________________________________________ controller-dev mailing list controller-dev@lists.opendaylight.org https://lists.opendaylight.org/mailman/listinfo/controller-dev