Hi Sergei,

yes, there are many reasons why a cohort may fail during the commit phase.
Spider has a lot of reasons too. In this particular case (the test case
provided by Elena) it fails with the following error:
ERROR 42S02: Table 'mysql.spider_xa' doesn't exist

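To make the failure mode concrete, here is a toy sketch in Python. It has nothing to do with the actual server or Spider code: the names Cohort, CommitError, fail_on_commit and two_phase_commit are all made up for illustration. It assumes every cohort votes "yes" during prepare, and it arbitrarily keeps committing the remaining cohorts after a failure. That is only one of the possible policies, not a recommendation.

```python
from dataclasses import dataclass


class CommitError(Exception):
    """Raised by a toy cohort that fails after it has voted to commit."""


@dataclass
class Cohort:
    name: str
    fail_on_commit: bool = False  # hypothetical switch to simulate a late failure
    state: str = "active"

    def prepare(self) -> bool:
        self.state = "prepared"
        return True  # assumption: every toy cohort votes "yes"

    def commit(self) -> None:
        if self.fail_on_commit:
            # The cohort stays in the "prepared" state.
            raise CommitError(f"{self.name}: commit failed after prepare")
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled back"


def two_phase_commit(cohorts):
    """Run both phases and return a {cohort name: outcome} map."""
    # Phase 1: commit-request. A "no" vote can still be handled safely here.
    if not all(c.prepare() for c in cohorts):
        for c in cohorts:
            c.rollback()
        return {c.name: "rolled back" for c in cohorts}

    # Phase 2: commit. A failure here cannot be undone globally anymore,
    # so this sketch keeps going and records each cohort's outcome.
    outcome = {}
    for c in cohorts:
        try:
            c.commit()
            outcome[c.name] = "committed"
        except CommitError:
            outcome[c.name] = "failed (still prepared)"
    return outcome


cohorts = [
    Cohort("cohort1"),
    Cohort("cohort2", fail_on_commit=True),
    Cohort("cohort3", fail_on_commit=True),
    Cohort("cohort4"),
]
result = two_phase_commit(cohorts)
for name, status in result.items():
    print(name, status)
```

With two of the four cohorts failing, the result is a mixed outcome: some cohorts are committed and some are still prepared, and the coordinator has no way to bring them back in line on its own.
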
Anyway, it is not clear how to handle a cohort commit failure properly. Let's
say we have 4 cohorts participating in an XA transaction. Cohorts 2 and 3
fail. Cohort 1 can't roll back (because it has already committed). What
should we do with cohort 4 (commit/rollback/nothing)? Should we remove this
transaction from xid_cache? Should we indicate clearly which cohorts failed?
Should it be an error or a warning? Should we hold the whole system (all
cohorts + manager) until the failure is resolved?

Thanks,
Sergey

On Fri, Oct 04, 2013 at 06:02:51PM +0200, Sergei Golubchik wrote:
> Hi, Sergey!
>
> On Oct 04, Sergey Vojtovich wrote:
> > Hi Kentoku,
> >
> > I just reviewed one of your revisions, specifically
> > bzr diff -c3829 lp:~kentokushiba/maria/10.0.4-spider-3.0/
> >
> > I believe things are a bit more complex: 2PC protocol doesn't seem to permit
> > cohorts to fail during commit phase:
> > http://en.wikipedia.org/wiki/Two-phase_commit_protocol#Commit_phase
> >
> > <quot>
> > If the coordinator received an agreement message from all cohorts during the
> > commit-request phase:
> > 1. The coordinator sends a commit message to all the cohorts.
> > 2. Each cohort completes the operation, and releases all the locks and
> >    resources held during the transaction.
> > 3. Each cohort sends an acknowledgment to the coordinator.
> > 4. The coordinator completes the transaction when all acknowledgments have
> >    been received.
> > </quot>
> >
> > I read the above as: the only problem coordinator may experience is missing
> > acknowledgement. What shall coordinator do if some cohorts acknowledged
> > commit, but some did not? Probably spider should detect it earlier?
> >
> > Sergei, what's your opinion?
>
> Let me see, if I understood the problem correctly.
> The crash happens because spider uses my_error() in the 2pc commit step,
> and the error status is lost up the stack, so Diagnostic_area::ok()
> fires an asserts on redefining the statement status. Is that right?
>
> The server should know that the error has happened on commit and should
> not trigger an assert, it should report the error to the user.
> The error at the commit step should normally never happen, it means
> inconsistent data, because some participants might've already committed
> the transaction and they cannot roll it back anymore. Still, the commit
> method *might* return an error status and we shouldn't ignore it.
> Hardware failures are a good example of what can cause a commit error.
>
> Anyway, Spider should be fixed to not error out in 2pc commits, because
> such a commit means inconsistent data, it's a bad error, it breaks ACID.
> An engine is expected to check all preconditions during prepare, and if
> prepare succeeds, it is basically a guarantee that the commit will
> succeed, it is not allowed to fail anymore.
>
> Regards,
> Sergei
>
_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

