On Thu, Aug 24, 2017 at 6:38 PM, Tom Pantelis <tompante...@gmail.com> wrote:

> On Thu, Aug 24, 2017 at 12:17 PM, Michael Vorburger <vorbur...@redhat.com>
> wrote:
>
>> On Thu, Aug 24, 2017 at 9:06 AM, Michael Vorburger <vorbur...@redhat.com>
>>> wrote:
>>>
>>>> On Wed, Aug 23, 2017 at 8:54 PM, Robert Varga <n...@hq.sk> wrote:
>>>>
>>>>> On 23/08/17 20:21, Michael Vorburger wrote:
>>>>> > On Wed, Aug 23, 2017 at 2:11 PM, Robert Varga <n...@hq.sk
>>>>> > <mailto:n...@hq.sk>> wrote:
>>>>> >
>>>>> >     On 23/08/17 13:48, Michael Vorburger wrote:
>>>>> >     >
>>>>> >     > Robert> Actually no. The backend side of things looks okay, as
>>>>> chains
>>>>> >     > are being
>>>>> >     > both closed and purged when requested from the frontend. I
>>>>> suspect
>>>>> >     > somebody is forgetting to close their transaction chains...
>>>>> >     >
>>>>> >     > Michael> Is a "transaction chain" the same as a transaction -
>>>>> the
>>>>> >     > objects returned from DataBroker's newReadWriteTransaction /
>>>>> >     > newReadWriteTransaction / newWriteOnlyTransaction? I did
>>>>> stumble upont
>>>>> >     > something - could https://git.opendaylight.org/g
>>>>> errit/#/c/62196/
>>>>> >     <https://git.opendaylight.org/gerrit/#/c/62196/> fix
>>>>> >     > this problem? Or is that unlikely to fix this OOM, but still a
>>>>> Good
>>>>> >     > Idea? Or doesn't matter to do cancel() if not submit() ?
>>>>> >
>>>>> >     'the same' in the sense that both are resources which need to be
>>>>> closed.
>>>>> >     So yes, it does matter that they are properly closed.
>>>>> >
>>>>> >
>>>>> > Thanks. I spent the afternoon shooting in a couple of directions
>>>>> related
>>>>> > to this (details in https://bugs.opendaylight.org/
>>>>> show_bug.cgi?id=9034#).
>>>>> >
>>>>> > But the problem is what I've found so far is all just "stabs in the
>>>>> > dark". What we really need is a way to more reliably find the origin
>>>>> of
>>>>> > code where transactions (or TransactionChain) are created but never
>>>>> > closed...
>>>>> >
>>>>> > You mentioned on IRC that a datastore.cfg may have some option for
>>>>> that,
>>>>> > but I could not find this - does anyone know any details about that?
>>>>>
>>>>> nite@nitebug : ~$ cd distribution-karaf-0.6.2-SNAPSHOT/
>>>>> nite@nitebug : ~/distribution-karaf-0.6.2-SNAPSHOT$ ./bin/karaf
>>>>> opendaylight-user@root>feature:install odl-mdsal-broker
>>>>> nite@nitebug : ~/distribution-karaf-0.6.2-SNAPSHOT$ ls -l
>>>>> etc/org.opendaylight.controller.cluster.datastore.cfg
>>>>> -rw-rw-r--. 1 nite nite 4896 Aug 23 20:47
>>>>> etc/org.opendaylight.controller.cluster.datastore.cfg
>>>>>
>>>>> for some reason debug-transactions is not documented.
>>>>> https://git.opendaylight.org/gerrit/62224 fixes that.
>>>>>
>>>>
>>>> Tom, I've tried to dig into this (supposed) debug-transactions flag,
>>>> but am starting to have doubts if this feature actually really exists /
>>>> works (anymore? or ever did?) ... do you know anything more about the
>>>> history of this, or can take a moment to help us find out more? Cauz Robert
>>>> on IRC said: "yeah ... I don't know the exact story // the yang models was
>>>> definitely used hen we use CSS // what the exact state is these days ...
>>>> best ping tpantelis". So here's what I've found so far:
>>>>
>>>> sal-distributed-datastore/src/main/yang/distributed-datastore-provider.yang
>>>> is where this "debug-transactions" should be, because that is the model for
>>>> datastore.cfg. That YANG model does have all the other properties used in
>>>> that datastore.cfg, just not debug-transactions.
>>>>
>>>
>>> It's defined in the yang as transaction-debug-context-enabled so the
>>> .cfg file must have the same name.
>>>
>>
>> Duh! I should have seen that.. :-( sorry & thanks a lot.
>>
>> So with that I'm starting to see the related stuff in the code. What I
>> don't get yet is how this was this intended to be "surfaced" - the idea
>> probably is just to find the throwable during forensics of an hprof heap
>> dump after an OOM, if one finds a large number of ClientBackedTransaction
>> (subclass) instances? There's no CLI command (or JMX, or whatever) to "list
>> all unclosed transactions", is there?
>>
>> More importantly, as far as I can see so far by going through the code
>> (not all of which makes a whole lot of sense so far, I'll admit!), this
>> flag as is won't actually help us to find the root cause of
>> https://bugs.opendaylight.org/show_bug.cgi?id=9034 yet - which is what
>> this thread is really about. Because the ClientBackedDataStore keeps the
>> allocationContext() for newRead/Write[Only]Transaction(), but for
>> createTransactionChain() it just passes on debugAllocation() - so that the
>> Tx that the chain creates can or not have their respective
>> allocationContext preserved... but if I understand Rboert's analysis of
>> this Bug 9034 correctly, he's saying that overflowing Map I'm seeing there
>> is due to (many) TransactionChain themselves (not their Txs) not being
>> closed - am I getting this right? So to get to the bottom of that, we'll
>> need to (also) track the allocationContext at createTransactionChain()
>> itself, don't we? Shall I try to have a go at proposing a change doing this?
>>
>> Now, what I don't get and need more education from you guys on is that
>> even if we do above, that overflown Map in the ShardDataTree contains
>> ShardDataTreeTransactionChain - but above is about
>> ClientBackedTransactionChain. I'm a bit lost what's what here, but even if
>> we had the allocationContext preserved in a
>> ClientBackedTransactionChain, how do we then get that into a
>> ShardDataTreeTransactionChain to see it there?
>>
>
> The flag is used by TransactionProxy on the front-end to capture the user
> call site on allocation. It is then used by the DebugThreePhaseCommitCohort
> to print a warning:
>
>     log.warn("Transaction {} failed with error \"{}\" - was allocated in
> the following context",
>                         transactionId, failure, debugContext);
>
> when a transaction operations fails on the front-end. So, as is, it won't
> with this issue.
>

Right, so that gets me back to square one...


> BTW - the ClientBackedDataStore is for the new tell-based protocol which
> isn't yet enabled by default.
>
> Also, this gives me another (separate, new) idea: Wouldn't it be cool if
>> we had something similar to this in the testable data broker which we use
>> in tests to see where the code under test has not close'd its
>> Transaction[Chain] properly... but that require something similar to above
>> in the DataBroker used in tests, not the CDS.
>>
>
... so this (^^^) ....

Or maybe forget about JUnits - if we had something with which we could
>> externally trigger "list all unclosed transactions" (from the CDS) say at
>> the end of CSIT, that could be interesting... thoughts on this idea? Just
>> for the future - in the immediate short term I need to sort out above,
>> first.
>>
>
... and this (^^^) ...


> sal-inmemory-datastore/src/main/yang/opendaylight-inmemory-datastore-provider.yang
>>>> does have a "debug-transactions" - so its ...inmemory.datastore.provider
>>>> .rev140617.DatastoreConfiguration has an isDebugTransactions() - but I
>>>> cannot find any usage of it, even in inmemory-datastore.
>>>>
>>>> So it seems to be that something got lost here somewhere along the way
>>>> - is that possible? Or am I just too dumb, and not understanding what's
>>>> where in all this controller mdsal code?? ;-)
>>>>
>>>> BTW: What's the story for what is used and what is not used -
>>>> inmemory-datastore is history? Want me to raise a Gerrit to remove it?
>>>> Or still use for the test DataBroker (which I myself am I user of in our
>>>> test in genius and netvirt) ?
>>>>
>>>
>>> It's not used in production but still used in unit tests. It will be
>>> removed once we switch to mdsal project.
>>>
>>>
>>>> > Otherwise, an idea I just had in
>>>>> > https://bugs.opendaylight.org/show_bug.cgi?id=9034#c7 would be to
>>>>> see if
>>>>> > I could bolt something onto that mdsal-trace we have (originally
>>>>> > contributed by Josh) to keep track of opened-but-not-yet-closed
>>>>> > transactions. See the idea? Could be very useful to get to the
>>>>> bottom of
>>>>> > this kind of problem, no?
>>>>>
>>>>> No need, as mentioned above.
>>>>>
>>>>
>>>> Well, until proven otherwise, it would seem there is a need after all
>>>> then? ;-)
>>>>
>>>
... if we can merge https://git.opendaylight.org/gerrit/#/c/62294/, we'll
have a wrapper which tracks all non-closed transaction allocation, both
read[Write][Only] as well as TransactionChain...

That should be a useful tool to get to the bottom of
https://bugs.opendaylight.org/show_bug.cgi?id=9034, and future OOM issues
like it. And with just a little effort, once above has been reviewed and
merged, I can build something that will help us to even see all such
problems already during tests in the future (I'm thinking a JUnit Rule
which fails a test if the CloseTrackedRegistry are not empty at the end of
each test).


> Let me spend some more time to see if I could build this feature - I'll
>>>> try to start rebuilding it from scratch; if there's existing code for this
>>>> (which I cannot find), please point me to it so I don't redo what already
>>>> exists somewhere!
>>>>
>>>>
>>>>> >     >     > As far as I can see, with my still very limited
>>>>> understanding of mdsal
>>>>> >     >     > internals, this does not seem to be the same as our
>>>>> earlier
>>>>> >     >     > https://bugs.opendaylight.org/show_bug.cgi?id=8941
>>>>> >     <https://bugs.opendaylight.org/show_bug.cgi?id=8941>
>>>>> >     >     <https://bugs.opendaylight.org/show_bug.cgi?id=8941
>>>>> >     <https://bugs.opendaylight.org/show_bug.cgi?id=8941>> raised by
>>>>> >     >     Stephen and
>>>>> >     >     > fixed by Robert (which is being follow-up on in
>>>>> >     >     > https://bugs.opendaylight.org/show_bug.cgi?id=9028
>>>>> >     <https://bugs.opendaylight.org/show_bug.cgi?id=9028>
>>>>> >     >     <https://bugs.opendaylight.org/show_bug.cgi?id=9028
>>>>> >     <https://bugs.opendaylight.org/show_bug.cgi?id=9028>> by Robert
>>>>> and
>>>>> >     >     Tom) -
>>>>> >     >     > does this initial quick analysis seem accurate to you?
>>>>> >     >
>>>>> >     >     9034 looks like 8941, except it's not transactions, but
>>>>> chains. I'll
>>>>> >     >     cook up a prototype.
>>>>> >     >
>>>>> >     > Thanks a lot !!! We'd love to out anything you come up with.
>>>>> >
>>>>> >     As mentioned above, this is something else, so no patch coming
>>>>> from me
>>>>> >     righ tnow ;)
>>>>> >
>>>>> > OK, misunderstood because you wrote "I'll cook up a prototype" ... ?
>>>>>
>>>>> Yeah, I checked the code out and responded with "Actually no...", which
>>>>> you quoted above ;-)
>>>>>
>>>>> Bye,
>>>>> Robert
>>>>
>>>>
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev

Reply via email to