Hi,
Alex Yurchenko wrote:
On Fri, 24 Jul 2009 10:56:20 +0200, Paul McCullagh
<[email protected]> wrote:
On Jul 23, 2009, at 3:15 PM, Stewart Smith wrote:
On Tue, Jul 21, 2009 at 09:28:54PM -0700, MARK CALLAGHAN wrote:
How is the serial log to be kept in sync with a storage engine given
the Applier interface? MySQL uses two phase commit, but the Applier
interface has one method, ::apply(). In addition to methods for
performing 2PC, keeping a storage engine and the serial log in sync
requires additional methods for crash recovery to support commit or
rollback of transactions in state PREPARED in the storage engine
depending on the outcome recorded in the serial log.
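For concreteness, the recovery pass Mark describes might look like the following sketch (all names here are hypothetical, not an actual Drizzle or MySQL API): at startup, every transaction the engine left in PREPARED state is resolved against the outcomes durably recorded in the serial log.

```python
# Sketch: resolving PREPARED engine transactions against the serial log
# at crash recovery. All names here are hypothetical.

def recover(engine, serial_log):
    """Commit each PREPARED transaction whose xid has a commit record
    in the serial log; roll back the rest."""
    committed_xids = set(serial_log)  # xids whose commit was durably logged
    resolved = {}
    for xid in engine.prepared_xids():
        if xid in committed_xids:
            engine.commit_by_xid(xid)
            resolved[xid] = "commit"
        else:
            engine.rollback_by_xid(xid)
            resolved[xid] = "rollback"
    return resolved

class FakeEngine:
    """Toy engine that just records what recovery decided."""
    def __init__(self, prepared):
        self._prepared = list(prepared)
        self.actions = {}
    def prepared_xids(self):
        return list(self._prepared)
    def commit_by_xid(self, xid):
        self.actions[xid] = "commit"
    def rollback_by_xid(self, xid):
        self.actions[xid] = "rollback"
```

The point of the sketch is only that crash recovery needs more than `::apply()`: it needs a way to enumerate PREPARED transactions and to commit or roll them back by xid.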
The bit that keeps banging around in my head in regard to this is storing it
in the same engine as part of the transaction, and so avoiding 2PC.
We discussed this on Drizzle Day, and that was my recommendation.
This would mean, after a transaction has committed, the replication
system asks the engine for a "list of operations" that were performed
by the transaction.
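A rough sketch of that flow, assuming the engine already records row operations in its own transaction log (the interface below is made up for illustration, not a real engine API):

```python
# Sketch: the engine records row operations as part of the transaction,
# and replication pulls the list back after commit. Hypothetical API.

class Transaction:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.operations = []   # e.g. ("insert", table, row)
        self.committed = False

class Engine:
    def __init__(self):
        self._log = {}         # txn_id -> Transaction
    def begin(self, txn_id):
        self._log[txn_id] = Transaction(txn_id)
        return self._log[txn_id]
    def write_row(self, txn, table, row):
        txn.operations.append(("insert", table, row))
    def commit(self, txn):
        txn.committed = True   # operations become durable with the commit
    def operations_for(self, txn_id):
        """What replication asks for after the commit."""
        txn = self._log[txn_id]
        if not txn.committed:
            raise ValueError("transaction not committed")
        return list(txn.operations)
```

Because the operation list is written as part of the engine's own transaction, it is committed or rolled back atomically with the data, which is exactly what lets replication skip 2PC in the single-engine case.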
I welcome the idea of a meaningful conversation between the replication
system and the engine. Mark's crash-recovery challenge could probably be
solved by asking the engine to store a small piece of data in its redo log;
then, in the course of normal engine crash recovery, the engine would report
it back to replication, so replication knows exactly what to replay.
One thing I'm confident cannot be solved without it is the problem where the
application 'optimizes' redo log flushing, as is done with
innodb_flush_log_at_trx_commit.
OTOH, I hope that 'ask for a list of operations after the commit' is just a
description of the algorithm, not the actual implementation. I think the
communication could be made more incremental, especially in the regime where
the engine is normally asked to perform row-by-row operations.
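One way to read "more incremental" is a push model: the engine streams each row operation to the replication service as it executes, and the buffered stream is published at commit or discarded at rollback. A minimal sketch, again with made-up names:

```python
# Sketch: push model -- operations stream to replication as they happen,
# and are published on commit or discarded on rollback. Hypothetical names.

class ReplicationService:
    def __init__(self):
        self._pending = {}     # txn_id -> buffered operations
        self.published = []    # committed streams, in commit order
    def on_operation(self, txn_id, op):
        self._pending.setdefault(txn_id, []).append(op)
    def on_commit(self, txn_id):
        self.published.append((txn_id, self._pending.pop(txn_id, [])))
    def on_rollback(self, txn_id):
        self._pending.pop(txn_id, None)
```

The trade-off versus the after-commit pull model is that replication sees operations as they happen, at the cost of buffering work for transactions that may roll back.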
For engines that have this information in their transaction log, it is
a relatively simple task.
This is absolutely the way to go. From the replication perspective, picking
log events straight from the storage engine's transactional log would save
quite a lot of IO and CPU. In addition, those log events can be in the
engine's native form, so a blast to apply on a slave.
Not necessarily a blast: depending on the engine's internals, the same redo
wouldn't necessarily apply easily on another node. Redo generation logic is
usually optimized for the immediate task at hand (crash recovery on the same
system) without slowing down normal operation too much, so adapting it for
replication tends to lag behind. Oracle's logical replication is an example
of how complex this can become. Oracle's physical replication is an example
of it becoming a blast, as you say, but then the slave tends to have limited
functionality, which takes effort to overcome.
Thanks,
Michael
Then, in 99.9% of cases, when there are no cross-engine
transactions, we
never need 2PC.
Although I haven't given this intense, deep thought as to various
corner cases...
--
Stewart Smith
_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help : https://help.launchpad.net/ListHelp
--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com