Assert should always be on when we're QAing the system, including BVT. I've alerted many people who work on QA about that. Unfortunately, that's a deployment time setting so it's up the deployer who's running the BVT to set that. Perhaps, we can write a script to perform QA deployment and it has all of these things set already. That way QA deployment is always the same.
--Alex > -----Original Message----- > From: Marcus [mailto:shadow...@gmail.com] > Sent: Monday, March 31, 2014 3:15 PM > To: dev@cloudstack.apache.org > Subject: Re: [DESIGN] No agent calls within database transactions.... > > Yeah, that assert issue has bitten us once or twice, and I know Ryan > squawked about it at some point. Do we have any point where enforcement > will occur (BVT or some other tests)? > > On Mon, Mar 31, 2014 at 4:04 PM, Alex Huang <alex.hu...@citrix.com> > wrote: > > Hi All, > > > > I was alerted to this problem recently and it's something that affects > developers so I want to bring it up. It is a design principle in CloudStack > that > we do not make agent calls within database transactions. The reason is > because when you make a call to an external system, there's no guarantee > on how long the call takes or even whether the call returns. When a call > takes a long time, several bad things can happen: > > - The MySQL DB Connection held opened due to the DB transaction > goes into idle. Eventually, a timeout in MySQL hits and the connection gets > severed and the transaction is rolled back. By default, this timeout is 45 > seconds but can be changed via a parameter in my.cnf. So it's problem that > the agent call completes just fine but the DB transaction rolls back and > changes are undone. > > - The rows locked in that transaction before the remote agent call > > could > be holding up other foreign key checks into the table. MySQL runs foreign > key checks in transactions to make sure the data modification and the checks > are done atomically. Therefore, these checks must wait for other > transactions to complete. Hence, an agent call that takes sometime can > severely slow down the system, particularly under scale. > > > > We have two solutions to this: > > - Drive agent interactions with states. There are many examples of > > this > in VM, Volume, etc. > > - When the above cannot be done, acquire a lock in the lock table > > via a > DAO method call. Locks do not maintain DB transactions and therefore will > not run into this problem. However, you are responsible for releasing locks. > It used to be that if you forget to release the locks, the @DB annotation > automatically releases locks once it went out of the scope and asserts to > alert > the developer. However, the @DB annotation has been removed in the > Spring work so I'm not sure if it's still done. > > > > This is a tough problem to solve because > > 1. It usually works just fine during functional testing. During > > scale > testing, this problem surfaces and often in unexpected places due to the > foreign key check problem. > > 2. For developers, it is difficult for them to know if a method that > they're calling within a transaction ends up in an agent call. > > > > There is an assert in AgentManager to ensure that there are no db > transactions before making a agent call. Apparently, since the conversion to > Maven, no one actually runs with assert on any more. Due to that, this > design principle has been lost in CloudStack and we're finding more and more > calls being made in DB transactions. To counter that, I decided to add a > global parameter that turns the assert to an actual exception. It is advised > that all developers set this global parameter, > check.txn.before.sending.agent.commands, during their own testing to > make sure it doesn't call agent calls in transactions. > > > > --Alex > > > >