On Wed, 29 Oct 2003 20:11:49 +0100 Brad Knowles <[EMAIL PROTECTED]> wrote: > At 1:30 PM -0500 2003/10/29, J C Lawrence wrote:
>> 2) Message IDs are not guaranteed globally unique, but the collision >> rate can be manageable/acceptable in a large number of deployment >> cases. > Outside of a database, this may be something you can decide whether or > not to live with. Within the confines of a database, this simply is > not possible. Of course, and that's the point. We are in violent agreement. > The ANSI SQL specification has some hard requirements for a primary > index key: I know, but that's not what I'm asserting. I'll also ignore the DB types which don't require primary keys of any form, as that's essentially what we have now and we're assuming an indexed store instead. >> We don't have to guarantee key uniqueness for all messages BEFORE >> they are submitted to the message store. > All other keys could potentially be non-unique, or null, but not > the primary index key. Ahh, I think I see the disjoint. We're using "key" in two contexts without distinguishing between them: 1) The property of a message which identifies that message with a high probability of uniqueness. This can be a Message ID, MD5SUM, whatever, but it is not guaranteed unique, it merely is unique most of the time for large definitions of "most". 2) The primary key as used in an indexed DB or other store which is guaranteed unique for all cases. Between the two there's a conflict. One requires perfect uniqueness. The other delivers merely a good Best Effort. The assertion is that we don't always have to solve that mismatch. We can elect to live with the collisions. > This is why many applications have the database assign the primary > index key itself on insertion into the table, so that all the > necessary requirements can be met. Sure, except that doing that in our case requires that storage be a synchronous operation (otherwise we don't know the key at rewrite/delivery time). That would a significant change from the current model and rather unfriendly to a wide range of deployment cases. Keeping the storage procedure asynchronous with an a-priori key (for whatever guarantee of uniqueness) makes for a more interesting system. >> I'm neither an idiot or a neophyte in this game. Yes, a database >> needs a primary unique key. > Then you must realize that we could not possibly use message-id as the > primary index key, unless this is a field that we generate ourselves > in such a way that all the necessary requirements are met. No, I don't realise that because it is false. We can use Message IDs as the primary key right now, today. In fact, I am, right now, this minute, today. You are assuming that every message submitted to the store must be accepted by the store. That is an assumption that hasn't been defined as a requirement and which some evidence suggests isn't a hard requirement. A very small percentage of the messages I submit to my store don't make it. They have duplicate Message IDs. They run through Mailman just fine. They never reach my list archives. I know, expect, accept this. The primary key has to be unique for every message IN THE STORE. Accepted. That does not dictate that the primary key for every message SUBMITTED to the store has to be unique (not that key assignment is occurring before collision check), or that the store has to ACCEPT every message which is submitted to it. Guaranteeing perfect uniqueness of the keys prior to submission to the store is fragile and expensive. It is tempting to do some form of very good approximation (cg Chuq's MD5SUM). Without perfect synchrony with the store's keys. if we calculate keys prior to insertion, or merely accept the keys that are given us in the form of Message IDs we're going to get occasional collisions. The question is how to handle messages whose a priori assigned keys collide with keys already in the store. We can handle the collision case in several ways: 1) Ignore it and discard messages bearing colliding keys. 2) Best Effort attempt to guarantee uniqueness within a window, with collisions outside the window discarded. 3) Fully guarantee uniqueness. The first is easy. The second is fairly easy. The third isn't trivial. In all three cases the population of key values in the store remains unique. Its just that the population of keys submitted to the store may or may not be unique. Lossage at the insertion layer can be acceptable. >> Let's at least be on the same page. > Agreed. Cool. -- J C Lawrence ---------(*) Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live. _______________________________________________ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers