So, this issue is more complex than my original email described. I
chatted with Mark a bit yesterday about the Google patches, and what
Mark describes as "unique to the scope" is correct: in Google's version
of global transaction identifiers, the identifier for a group of events
in the binlog (which is where the term "group" comes from) is unique
within the set of nodes serving a specific set of tables/schemas.
In other words, there is a consistent hierarchical relationship enforced
in the Google group_id replication system which ensures there is always
a single master for a slave.
In Drizzle's replication system, this restriction does not exist.
Multi-source replication is perfectly acceptable and this introduces a
larger "scope" in which this identifier must be unique.
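To make the scope problem concrete, here is a minimal sketch of a slave tracking applied transactions from two masters. The class and method names are made up for illustration and are not Drizzle code. It assumes each master numbers its groups independently, so a group ID on its own can collide while the (server_id, group_id) pair cannot.

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical types for illustration only.
typedef uint32_t ServerId;
typedef uint64_t GroupId;
typedef std::pair<ServerId, GroupId> GlobalTransactionId;

// Tracks which transactions a slave has already applied.
// Illustrative only; not an actual Drizzle class.
class AppliedSet
{
public:
  // Keyed on group ID alone: returns true if the ID was new.
  bool markByGroupOnly(GroupId group_id)
  {
    return by_group_.insert(group_id).second;
  }

  // Keyed on the (server_id, group_id) pair: returns true if the ID was new.
  bool markByPair(ServerId server_id, GroupId group_id)
  {
    return by_pair_.insert(GlobalTransactionId(server_id, group_id)).second;
  }

private:
  std::set<GroupId> by_group_;
  std::set<GlobalTransactionId> by_pair_;
};
```

With a single master, markByGroupOnly() is enough. Once a second master feeds the same slave, group 42 from server 1 and group 42 from server 2 look identical to it, whereas markByPair() keeps them apart.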
Everyone interested in this should fully read the Google FAQ linked in
the original mailing list post and pay attention to the parts where
Justin writes about possible solutions for multi-source/multi-master
replication.
This is basically where I stand right now, but I'm going to use the
holidays to think this through and put more ideas on the wiki:
1) Decide on a tuple format that Drizzle will use internally for the
global identifier for a Transaction message. This could be:
(server_id, group_id)
or
(server_id, timestamp, other_identifier)
or
UUID
or something completely different...
2) Focus on the interfaces
Standardize the interface where logging mechanisms and replication
plugins can ask a publisher for a global identifier representing its
last consistent state.
Standardize the interface where a plugin can map Drizzle's internal
global identifier to its own type of global identifier. For instance,
let's say Drizzle's global identifier type is defined as:
typedef uint32_t ServerId;
typedef uint64_t GroupId;
typedef std::pair<ServerId, GroupId> GlobalTransactionId;
but Tungsten's replication system uses a UUID as its global
transaction identifier. There needs to be an interface for
translating/mapping a value of one type to the other.
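As a rough sketch of what those two interfaces might look like, assuming the typedefs above: every class and method name below is hypothetical, and the string encoding is a stand-in, not a real UUID and not how Tungsten represents its identifiers. A real plugin might well need a lookup table instead of a reversible encoding.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>

typedef uint32_t ServerId;
typedef uint64_t GroupId;
typedef std::pair<ServerId, GroupId> GlobalTransactionId;

// Hypothetical interface: a log or replication plugin asks a publisher
// for the global identifier of its last consistent state.
class TransactionIdPublisher
{
public:
  virtual ~TransactionIdPublisher() {}
  virtual GlobalTransactionId lastConsistentId() const = 0;
};

// Hypothetical interface: translate between Drizzle's identifier and
// whatever identifier type a plugin uses.
template <typename ExternalId>
class TransactionIdMapper
{
public:
  virtual ~TransactionIdMapper() {}
  virtual ExternalId toExternal(const GlobalTransactionId &id) const = 0;
  // Returns false if the external value cannot be mapped back.
  virtual bool toInternal(const ExternalId &external,
                          GlobalTransactionId &id) const = 0;
};

// Example mapper: encodes the pair as "ssssssss-gggggggggggggggg" in hex.
// This is an assumed placeholder format, NOT an RFC 4122 UUID.
class HexStringMapper : public TransactionIdMapper<std::string>
{
public:
  std::string toExternal(const GlobalTransactionId &id) const
  {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%08x-%016llx",
                  static_cast<unsigned int>(id.first),
                  static_cast<unsigned long long>(id.second));
    return std::string(buf);
  }

  bool toInternal(const std::string &external, GlobalTransactionId &id) const
  {
    unsigned int server = 0;
    unsigned long long group = 0;
    // 8 hex digits + '-' + 16 hex digits = 25 characters.
    if (external.size() != 25 ||
        std::sscanf(external.c_str(), "%8x-%16llx", &server, &group) != 2)
      return false;
    id = GlobalTransactionId(static_cast<ServerId>(server),
                             static_cast<GroupId>(group));
    return true;
  }
};
```

The point of templating the mapper on ExternalId is that Drizzle's kernel would only ever see GlobalTransactionId, while each plugin supplies its own mapping to its native identifier type.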
Anyway, like I said, over the holidays I'll be working on putting all of
these disparate thoughts onto the Drizzle wiki. I'll post to the
mailing list when I have a good, clean wiki describing the problems and
possible interfaces and solutions.
Thanks!
Jay
Jobin Augustine wrote:
If I get it right, replacing a "globally unique id" with a "local id" is
a good move.
My vote is for you.
++
Why? Because it is forward-looking.
Eric Day had a blog post regarding eventually consistent databases.
Even if Drizzle is strictly consistent internally, that may not hold
once we think about geographically distributed databases (say, many
independent Drizzle instances) talking to each other.
In a highly distributed environment, eventual consistency is
unavoidable.
In my humble opinion, a globally unique transaction ID does not carry
much meaning in that setting, and this move is in the right direction.
By the way, the name "group id" is again confusing. The question
automatically arises: "group of what?"
Any better name for it?
Thank you,
Jobin.
On Wed, Dec 23, 2009 at 8:46 PM, Jay Pipes <[email protected]> wrote:
Hi all,
I'd like to get some consensus votes to solidify the terminology
around something that is soon to hit Drizzle's replication system:
A way to uniquely identify a specific Transaction in a global
replication environment.
There are two different sets of terms in use regarding the above
functionality, and I'd like to be able to settle on one set or the
other.
Indeed, if one looks at Google's implementation of the above
functionality for MySQL 5.0, the terms "group id" and "global
transaction ID" seem to be freely intermingled. Even the URL and
title of the Google FAQ on the subject/patch have contradicting terms!:
http://code.google.com/p/google-mysql-tools/wiki/GlobalTransactionIds
Note the URL says "Global Transaction IDs" and the page title says
"Global Group IDs". Very confusing to me. Anyone else?
I'd like to settle this confusion and just start referring to this
functionality by a single term: "Group ID"
The reason is that the group ID is actually *not* a global
identifier. The global identifier is actually the server ID *plus*
the group ID, and therefore referring to the group ID as the global
transaction ID is a bit of a misnomer.
I would like to change the TransactionContext message format from this:
message TransactionContext
{
  required uint32 server_id = 1; /* Unique identifier of a server */
  required uint64 transaction_id = 2; /* Globally-unique transaction ID */
  required uint64 start_timestamp = 3;
  required uint64 end_timestamp = 4;
}
to this:
message TransactionContext
{
  required uint32 server_id = 1; /* Unique identifier of a server */
  required uint64 group_id = 2; /* Unique ID of trx on this server */
  required uint64 start_timestamp = 3;
  required uint64 end_timestamp = 4;
}
Please let me know if this is OK with folks. Thanks!
Jay
_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help : https://help.launchpad.net/ListHelp