On Thu, Aug 24, 2017 at 6:38 PM, Tom Pantelis <tompante...@gmail.com> wrote:
> On Thu, Aug 24, 2017 at 12:17 PM, Michael Vorburger <vorbur...@redhat.com> > wrote: > >> On Thu, Aug 24, 2017 at 9:06 AM, Michael Vorburger <vorbur...@redhat.com> >>> wrote: >>> >>>> On Wed, Aug 23, 2017 at 8:54 PM, Robert Varga <n...@hq.sk> wrote: >>>> >>>>> On 23/08/17 20:21, Michael Vorburger wrote: >>>>> > On Wed, Aug 23, 2017 at 2:11 PM, Robert Varga <n...@hq.sk >>>>> > <mailto:n...@hq.sk>> wrote: >>>>> > >>>>> > On 23/08/17 13:48, Michael Vorburger wrote: >>>>> > > >>>>> > > Robert> Actually no. The backend side of things looks okay, as >>>>> chains >>>>> > > are being >>>>> > > both closed and purged when requested from the frontend. I >>>>> suspect >>>>> > > somebody is forgetting to close their transaction chains... >>>>> > > >>>>> > > Michael> Is a "transaction chain" the same as a transaction - >>>>> the >>>>> > > objects returned from DataBroker's newReadWriteTransaction / >>>>> > > newReadWriteTransaction / newWriteOnlyTransaction? I did >>>>> stumble upont >>>>> > > something - could https://git.opendaylight.org/g >>>>> errit/#/c/62196/ >>>>> > <https://git.opendaylight.org/gerrit/#/c/62196/> fix >>>>> > > this problem? Or is that unlikely to fix this OOM, but still a >>>>> Good >>>>> > > Idea? Or doesn't matter to do cancel() if not submit() ? >>>>> > >>>>> > 'the same' in the sense that both are resources which need to be >>>>> closed. >>>>> > So yes, it does matter that they are properly closed. >>>>> > >>>>> > >>>>> > Thanks. I spent the afternoon shooting in a couple of directions >>>>> related >>>>> > to this (details in https://bugs.opendaylight.org/ >>>>> show_bug.cgi?id=9034#). >>>>> > >>>>> > But the problem is what I've found so far is all just "stabs in the >>>>> > dark". What we really need is a way to more reliably find the origin >>>>> of >>>>> > code where transactions (or TransactionChain) are created but never >>>>> > closed... >>>>> > >>>>> > You mentioned on IRC that a datastore.cfg may have some option for >>>>> that, >>>>> > but I could not find this - does anyone know any details about that? >>>>> >>>>> nite@nitebug : ~$ cd distribution-karaf-0.6.2-SNAPSHOT/ >>>>> nite@nitebug : ~/distribution-karaf-0.6.2-SNAPSHOT$ ./bin/karaf >>>>> opendaylight-user@root>feature:install odl-mdsal-broker >>>>> nite@nitebug : ~/distribution-karaf-0.6.2-SNAPSHOT$ ls -l >>>>> etc/org.opendaylight.controller.cluster.datastore.cfg >>>>> -rw-rw-r--. 1 nite nite 4896 Aug 23 20:47 >>>>> etc/org.opendaylight.controller.cluster.datastore.cfg >>>>> >>>>> for some reason debug-transactions is not documented. >>>>> https://git.opendaylight.org/gerrit/62224 fixes that. >>>>> >>>> >>>> Tom, I've tried to dig into this (supposed) debug-transactions flag, >>>> but am starting to have doubts if this feature actually really exists / >>>> works (anymore? or ever did?) ... do you know anything more about the >>>> history of this, or can take a moment to help us find out more? Cauz Robert >>>> on IRC said: "yeah ... I don't know the exact story // the yang models was >>>> definitely used hen we use CSS // what the exact state is these days ... >>>> best ping tpantelis". So here's what I've found so far: >>>> >>>> sal-distributed-datastore/src/main/yang/distributed-datastore-provider.yang >>>> is where this "debug-transactions" should be, because that is the model for >>>> datastore.cfg. That YANG model does have all the other properties used in >>>> that datastore.cfg, just not debug-transactions. >>>> >>> >>> It's defined in the yang as transaction-debug-context-enabled so the >>> .cfg file must have the same name. >>> >> >> Duh! I should have seen that.. :-( sorry & thanks a lot. >> >> So with that I'm starting to see the related stuff in the code. What I >> don't get yet is how this was this intended to be "surfaced" - the idea >> probably is just to find the throwable during forensics of an hprof heap >> dump after an OOM, if one finds a large number of ClientBackedTransaction >> (subclass) instances? There's no CLI command (or JMX, or whatever) to "list >> all unclosed transactions", is there? >> >> More importantly, as far as I can see so far by going through the code >> (not all of which makes a whole lot of sense so far, I'll admit!), this >> flag as is won't actually help us to find the root cause of >> https://bugs.opendaylight.org/show_bug.cgi?id=9034 yet - which is what >> this thread is really about. Because the ClientBackedDataStore keeps the >> allocationContext() for newRead/Write[Only]Transaction(), but for >> createTransactionChain() it just passes on debugAllocation() - so that the >> Tx that the chain creates can or not have their respective >> allocationContext preserved... but if I understand Rboert's analysis of >> this Bug 9034 correctly, he's saying that overflowing Map I'm seeing there >> is due to (many) TransactionChain themselves (not their Txs) not being >> closed - am I getting this right? So to get to the bottom of that, we'll >> need to (also) track the allocationContext at createTransactionChain() >> itself, don't we? Shall I try to have a go at proposing a change doing this? >> >> Now, what I don't get and need more education from you guys on is that >> even if we do above, that overflown Map in the ShardDataTree contains >> ShardDataTreeTransactionChain - but above is about >> ClientBackedTransactionChain. I'm a bit lost what's what here, but even if >> we had the allocationContext preserved in a >> ClientBackedTransactionChain, how do we then get that into a >> ShardDataTreeTransactionChain to see it there? >> > > The flag is used by TransactionProxy on the front-end to capture the user > call site on allocation. It is then used by the DebugThreePhaseCommitCohort > to print a warning: > > log.warn("Transaction {} failed with error \"{}\" - was allocated in > the following context", > transactionId, failure, debugContext); > > when a transaction operations fails on the front-end. So, as is, it won't > with this issue. > Right, so that gets me back to square one... > BTW - the ClientBackedDataStore is for the new tell-based protocol which > isn't yet enabled by default. > > Also, this gives me another (separate, new) idea: Wouldn't it be cool if >> we had something similar to this in the testable data broker which we use >> in tests to see where the code under test has not close'd its >> Transaction[Chain] properly... but that require something similar to above >> in the DataBroker used in tests, not the CDS. >> > ... so this (^^^) .... Or maybe forget about JUnits - if we had something with which we could >> externally trigger "list all unclosed transactions" (from the CDS) say at >> the end of CSIT, that could be interesting... thoughts on this idea? Just >> for the future - in the immediate short term I need to sort out above, >> first. >> > ... and this (^^^) ... > sal-inmemory-datastore/src/main/yang/opendaylight-inmemory-datastore-provider.yang >>>> does have a "debug-transactions" - so its ...inmemory.datastore.provider >>>> .rev140617.DatastoreConfiguration has an isDebugTransactions() - but I >>>> cannot find any usage of it, even in inmemory-datastore. >>>> >>>> So it seems to be that something got lost here somewhere along the way >>>> - is that possible? Or am I just too dumb, and not understanding what's >>>> where in all this controller mdsal code?? ;-) >>>> >>>> BTW: What's the story for what is used and what is not used - >>>> inmemory-datastore is history? Want me to raise a Gerrit to remove it? >>>> Or still use for the test DataBroker (which I myself am I user of in our >>>> test in genius and netvirt) ? >>>> >>> >>> It's not used in production but still used in unit tests. It will be >>> removed once we switch to mdsal project. >>> >>> >>>> > Otherwise, an idea I just had in >>>>> > https://bugs.opendaylight.org/show_bug.cgi?id=9034#c7 would be to >>>>> see if >>>>> > I could bolt something onto that mdsal-trace we have (originally >>>>> > contributed by Josh) to keep track of opened-but-not-yet-closed >>>>> > transactions. See the idea? Could be very useful to get to the >>>>> bottom of >>>>> > this kind of problem, no? >>>>> >>>>> No need, as mentioned above. >>>>> >>>> >>>> Well, until proven otherwise, it would seem there is a need after all >>>> then? ;-) >>>> >>> ... if we can merge https://git.opendaylight.org/gerrit/#/c/62294/, we'll have a wrapper which tracks all non-closed transaction allocation, both read[Write][Only] as well as TransactionChain... That should be a useful tool to get to the bottom of https://bugs.opendaylight.org/show_bug.cgi?id=9034, and future OOM issues like it. And with just a little effort, once above has been reviewed and merged, I can build something that will help us to even see all such problems already during tests in the future (I'm thinking a JUnit Rule which fails a test if the CloseTrackedRegistry are not empty at the end of each test). > Let me spend some more time to see if I could build this feature - I'll >>>> try to start rebuilding it from scratch; if there's existing code for this >>>> (which I cannot find), please point me to it so I don't redo what already >>>> exists somewhere! >>>> >>>> >>>>> > > > As far as I can see, with my still very limited >>>>> understanding of mdsal >>>>> > > > internals, this does not seem to be the same as our >>>>> earlier >>>>> > > > https://bugs.opendaylight.org/show_bug.cgi?id=8941 >>>>> > <https://bugs.opendaylight.org/show_bug.cgi?id=8941> >>>>> > > <https://bugs.opendaylight.org/show_bug.cgi?id=8941 >>>>> > <https://bugs.opendaylight.org/show_bug.cgi?id=8941>> raised by >>>>> > > Stephen and >>>>> > > > fixed by Robert (which is being follow-up on in >>>>> > > > https://bugs.opendaylight.org/show_bug.cgi?id=9028 >>>>> > <https://bugs.opendaylight.org/show_bug.cgi?id=9028> >>>>> > > <https://bugs.opendaylight.org/show_bug.cgi?id=9028 >>>>> > <https://bugs.opendaylight.org/show_bug.cgi?id=9028>> by Robert >>>>> and >>>>> > > Tom) - >>>>> > > > does this initial quick analysis seem accurate to you? >>>>> > > >>>>> > > 9034 looks like 8941, except it's not transactions, but >>>>> chains. I'll >>>>> > > cook up a prototype. >>>>> > > >>>>> > > Thanks a lot !!! We'd love to out anything you come up with. >>>>> > >>>>> > As mentioned above, this is something else, so no patch coming >>>>> from me >>>>> > righ tnow ;) >>>>> > >>>>> > OK, misunderstood because you wrote "I'll cook up a prototype" ... ? >>>>> >>>>> Yeah, I checked the code out and responded with "Actually no...", which >>>>> you quoted above ;-) >>>>> >>>>> Bye, >>>>> Robert >>>> >>>>
_______________________________________________ controller-dev mailing list controller-dev@lists.opendaylight.org https://lists.opendaylight.org/mailman/listinfo/controller-dev