Asserts should always be on when we're QAing the system, including BVT.  I've
alerted many people who work on QA about that.  Unfortunately, that's a
deployment-time setting, so it's up to the deployer who's running the BVT to
set it.  Perhaps we can write a script that performs the QA deployment with
all of these things already set.  That way QA deployment is always the same.
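
As a rough illustration of how a QA deployment could verify the setting
itself: Java assertions are off unless the JVM is started with -ea, and code
can detect that at startup.  The sketch below is only an assumption of what
such a check might look like; the class name AssertionCheck and the
qaDeployment flag are hypothetical and not part of CloudStack.

```java
final class AssertionCheck {
    private AssertionCheck() {}

    /** Returns true only when this JVM runs with assertions enabled (-ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert is evaluated only when
        // assertions are on, so 'enabled' stays false otherwise.
        assert enabled = true;
        return enabled;
    }

    /**
     * Hypothetical startup hook: a QA deployment refuses to start
     * when the JVM was launched without -ea.
     */
    static void requireAssertions(boolean qaDeployment) {
        if (qaDeployment && !assertionsEnabled()) {
            throw new IllegalStateException(
                "QA deployment must run the JVM with -ea");
        }
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

A deployment script would still have to pass -ea to the JVM; a check like
this only makes a missing flag fail loudly instead of silently.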

--Alex

> -----Original Message-----
> From: Marcus [mailto:shadow...@gmail.com]
> Sent: Monday, March 31, 2014 3:15 PM
> To: dev@cloudstack.apache.org
> Subject: Re: [DESIGN] No agent calls within database transactions....
> 
> Yeah, that assert issue has bitten us once or twice, and I know Ryan
> squawked about it at some point.  Do we have any point where enforcement
> will occur (BVT or some other tests)?
> 
> On Mon, Mar 31, 2014 at 4:04 PM, Alex Huang <alex.hu...@citrix.com>
> wrote:
> > Hi All,
> >
> > I was alerted to this problem recently and it's something that affects
> > developers, so I want to bring it up.  It is a design principle in
> > CloudStack that we do not make agent calls within database transactions.
> > The reason is that when you make a call to an external system, there's no
> > guarantee on how long the call takes or even whether the call returns.
> > When a call takes a long time, several bad things can happen:
> >         - The MySQL DB connection held open by the DB transaction goes
> >           idle.  Eventually, a timeout in MySQL hits, the connection gets
> >           severed, and the transaction is rolled back.  By default, this
> >           timeout is 45 seconds but can be changed via a parameter in
> >           my.cnf.  So the problem is that the agent call completes just
> >           fine but the DB transaction rolls back and the changes are
> >           undone.
> >         - The rows locked in that transaction before the remote agent
> >           call could be holding up other foreign key checks into the
> >           table.  MySQL runs foreign key checks in transactions to make
> >           sure the data modification and the checks are done atomically.
> >           Therefore, these checks must wait for other transactions to
> >           complete.  Hence, an agent call that takes some time can
> >           severely slow down the system, particularly at scale.
> >
> > We have two solutions to this:
> >         - Drive agent interactions with states.  There are many examples
> >           of this in VM, Volume, etc.
> >         - When the above cannot be done, acquire a lock in the lock table
> >           via a DAO method call.  Locks do not maintain DB transactions
> >           and therefore will not run into this problem.  However, you are
> >           responsible for releasing locks.  It used to be that if you
> >           forgot to release the locks, the @DB annotation automatically
> >           released them once it went out of scope and asserted to alert
> >           the developer.  However, the @DB annotation has been removed in
> >           the Spring work, so I'm not sure if it's still done.
> >
> > This is a tough problem to solve because
> >         1. It usually works just fine during functional testing.  During
> >            scale testing, this problem surfaces, often in unexpected
> >            places due to the foreign key check problem.
> >         2. It is difficult for developers to know if a method that
> >            they're calling within a transaction ends up in an agent call.
> >
> > There is an assert in AgentManager to ensure that there are no DB
> > transactions open before making an agent call.  Apparently, since the
> > conversion to Maven, no one actually runs with assert on any more.  Due
> > to that, this design principle has been lost in CloudStack and we're
> > finding more and more calls being made in DB transactions.  To counter
> > that, I decided to add a global parameter that turns the assert into an
> > actual exception.  It is advised that all developers set this global
> > parameter, check.txn.before.sending.agent.commands, during their own
> > testing to make sure their code doesn't make agent calls in transactions.
> >
> > --Alex
> >
> >
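
The guard described in the original message (an assert that the global
parameter check.txn.before.sending.agent.commands promotes to a real
exception) could be sketched roughly as follows.  This is not the actual
AgentManager code: the class AgentCallGuard, the thread-local transaction
counter, and the strict argument standing in for the global parameter are
all hypothetical, chosen only to show the idea.

```java
final class AgentCallGuard {
    private AgentCallGuard() {}

    // Hypothetical stand-in for "is a DB transaction open on this thread?"
    private static final ThreadLocal<Integer> OPEN_TXNS =
        ThreadLocal.withInitial(() -> 0);

    static void beginTxn() { OPEN_TXNS.set(OPEN_TXNS.get() + 1); }

    static void endTxn() { OPEN_TXNS.set(Math.max(0, OPEN_TXNS.get() - 1)); }

    static boolean inTxn() { return OPEN_TXNS.get() > 0; }

    /**
     * Call right before sending a command to an agent.
     * 'strict' plays the role of the global parameter: when true the
     * violation is an exception, otherwise only an assert (which is a
     * no-op unless the JVM runs with -ea).
     */
    static void checkBeforeSend(boolean strict) {
        if (!inTxn()) {
            return; // no transaction open, safe to call the agent
        }
        if (strict) {
            throw new IllegalStateException(
                "agent command sent inside a DB transaction");
        }
        assert false : "agent command sent inside a DB transaction";
    }
}
```

With strict set, the violation surfaces in any developer's test run whether
or not the JVM was started with assertions enabled, which is the point of
turning the assert into an exception.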
