Re: multi-column primary keys (like in lots of join tables/habtm)

Albert Siersema Thu, 10 Aug 2006 01:42:28 -0700

(first off: this isn't meant as criticism or a way for me to point out
there are shortcomings in the cakephp (RoR/ORM/... ) way, it's more of
a 'live, discuss & learn' conversation for me)


> Well, the first one was quite interesting in the way it condemned the
> use of primary keys altogether.  I think it's way off base, but still
> interesting.
I think you misread it a bit.
(and I should have put the links in reverse order.)
The links to the three articles of postgresql's Josh are very interesting:
 http://blogs.ittoolbox.com/database/soup/archives/primary-keyvil-part-i-7327
 http://blogs.ittoolbox.com/database/soup/archives/primary-keyvil-part-ii-7345
 http://blogs.ittoolbox.com/database/soup/archives/primary-keyvil-part-iii-7365
I think Josh writes in a very informative and moreover neutral way.
He's not spouting any fan-boyism or whatever. And he's very knowledgable in the 
sql area.

In SQL theory & practice there's no such thing as a 'primary' key.
There's just keys.
And there's nothing 'bad' about composite keys. There are even very good 
reasons why
you should use them instead of surrogate keys like an autonumber/serial id 
field.

His article(s) is about just that: the abuse of 'primary' keys and auto 
increment/serial
'id' keys specifically.
I don't think anyone can honestly say that what's being analyzed in those 
articles is bollocks.

And to put you at ease :)
He sums up reasons why you could be sticking to Autonumber Surrogate Keys.
One of them is:
 "External Requirements
 Usually Good. The RAD, DB Abstraction and Programmer Demands arguments all 
amount to external
 requirements to use integer keys. Certainly a degree of genericization is 
necessary for
 any multi-purpose tool or code set. This is the most common reason for me when 
I succumb
 to using autonumber IDs. However, this should be a compelling reason only 
after you've
 evaluated the RAD tool, DB abstraction library and/or the staff involved to 
make sure that
 integer keys are a real requirement and that the tool/person will actually 
push your project
 forwards instead of becoming an obstacle."

> The bottom line for me is that (a) it introduces needless complexity
> into the system
In what way ? And what system ? The rdbms ? The mvc framework ?
Introducing a superfluous surrogate ("id") key to your table(s) not
related to the data sounds to me like needless complexity.
Putting the complexity into the application (+ extra chances for bugs)
doesn't sound like a very good idea to me either. Whereas having the rdbms
enforcing consistancy checks (cascades, etc) in the very database design makes 
sense.
Using an 'id' column doesn't necessarily mean speed improvements.
Quite the opposite can happen. See here for example:
http://microjet.ath.cx/WebWiki/2005.12.11_PreferringNaturalToSurrogateKey.html
If you have a table where e.g. name, address, location (and or ssn) needs to be
UNIQUE it basically means you have a (natural) key. Putting a UNIQUE constraint
on them as well probably means extra strain on the rdbms. But let's not go 
there too much :)

> and (b) not only does the practice of using join table
> foriegn keys as composite primary keys smack of referential integrity
> badness, (c) it also confuses the purpose of the keys and introduces
> dangerous dependencies into the system.  If the join record links to
> another record, you lose your reference to the original join record.
> This is not how you normalize a database.
I'm not quite sure if I follow you here, especially the part;
"If the join record links to another record,
 you lose your reference to the original join record"

Lets go back to my example of a join table:
 CREATE TABLE authors_posts (
   author_id INTEGER REFERENCES authors(id)
             ON DELETE CASCADE,
   post_id   INTEGER REFERENCES posts(id)
             ON DELETE CASCADE,
   type      INTEGER DEFAULT 0,
   PRIMARY KEY(author_id, post_id, type)
 );
I could have put a 'ON UPDATE CASCADE' in there, but by referencing 'id'
columns of other tables I prefer that the id column in question is never
changed. Mind you: immutable keys are not a necessity in rdbms design,
but using serial id columns one can assume it is and 'on update' can be
omitted.
Choosing a key is a process to define what columns define the uniqueness
of the record in question. In this case there can be only one (1) record
in the join table with the same author_id, post_id and type. There can
be more types for the same author_id+post_id combination and so on.

If we DELETE an author all records in the join table (automagically)
will perish as well. Same goes for the posts.
Sounds good to me.

If I change any of the *_id columns in the join record it just 'points'
to another set of author/post records. I don't see where that would
'lose my reference to the original record'. The original record is
not needed anymore: i changed it to point to other author/post records.

The goal of join tables is just to define a "n <> m" relation between
two tables.
I can query all post records that 'belong to' author_id
(preferably specifying type in the query as well, as this is the index)

Albert

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/cake-php
-~----------~----~----~----~------~----~------~--~---

Re: multi-column primary keys (like in lots of join tables/habtm)

Reply via email to