Re: Are primary keys essential?

Kenneth Wagner Wed, 21 Dec 2005 17:27:15 -0800

Hi James,

An internal, unique, auto-increment PK is a good idea on large, dynamic tables. Smaller, mostly static tables can often do without a PK. In a back-end DB it's a godsend. What would happen if I depended on the programming in the client-side code to handle the PK? It would be impossible.

Example: Customers need a PK. An internal, auto-increment, unique integer works best on DBs. They are optimized for it, especially SQL tables.
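A minimal sketch of the customer case, using Python's built-in sqlite3 module as a stand-in for MySQL (that substitution is an assumption; in MySQL the column would be declared something like INT UNSIGNED AUTO_INCREMENT PRIMARY KEY). The table and column names are made up for illustration. The point is only that the database hands out the key, not the client code:

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # key assigned by the DB
    " name TEXT NOT NULL)"
)

def add_customer(name):
    """Insert a customer and return the key the database generated."""
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
    return cur.lastrowid

first = add_customer("Wagner, Kenneth O.")
second = add_customer("Wagner, Kenneth O.")  # same name, still a distinct row
```

Two customers with identical names still get different keys, which is exactly what a name-based key cannot promise.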

Example: Counties. These names hardly ever change and the table is rather small. So, you could do without the internal, integer PK. BUT-- suppose every invoice must show the county? Then I would need a county field (e.g., varchar 25) on the invoice to hold a name like Yoknapatawphahootchee. If the customer has thousands of invoices with us then county starts to take up space. And a fair amount of it. Every million invoices would take up about 21 million bytes of storage for that 21-character name. Just multiply the number of customers by the number of invoices. If the invoice instead uses a 1-byte field for the county, with an FK into the county file, we automatically save about 20 million bytes for every million invoices. Since no state has more than 255 counties, a tiny int will work.
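The arithmetic can be checked with a few lines of Python. This is a rough sketch under stated assumptions: one byte per character, one million invoices, and no allowance for the VARCHAR length prefix or for index overhead. The variable names are mine.

```python
# County name from the example above; 21 characters, one byte each in ASCII.
NAME_BYTES = len("Yoknapatawphahootchee")
KEY_BYTES = 1  # a TINYINT-sized county key stored on the invoice instead

def county_bytes(invoices, bytes_per_row):
    """Total bytes spent on the county column across all invoices."""
    return invoices * bytes_per_row

invoices = 1_000_000  # assumed volume, for illustration only
as_name = county_bytes(invoices, NAME_BYTES)  # county stored as text
as_key = county_bytes(invoices, KEY_BYTES)    # county stored as a 1-byte FK
saved = as_name - as_key
```

The saving scales linearly, so the per-invoice difference of 20 bytes is the number to carry around.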

Example: States. Hmmm. Let's see, the abbreviation for Minnesota is MN. Only two bytes needed there. OK, that will work for a PK. We'll do without the integer, auto-increment key here. And we will violate consistency in the DB. No big deal in this case. I'll go either way here.

Another good reason is that YOU control the PK. Social Security numbers and ZIP codes are already being prepared for changes. Why go bonkers 7 years from now? And who wants to back-code and back-fill all that code & data?

Truncated, damaged file? What invoice does the row point to? What is the date? If it's the last row in the PK then that's where the fixing begins, by date and by PK#.

Since a PK guarantees uniqueness, I can breathe easily if the file gets clobbered. Example: Someone with BIG access rights accidentally deletes invoices from 1996. But they should have done it for 1995. (We keep 10 years of history on hand in an archive file.) Easy to fix. What's the last # in 1995? The first # in 1997? Restore only the rows whose numbers fall between those two. Suppose now the Customer file is keyed by name, ZIP and first 4 digits in the address? (Or something like that. Many of my junk mail addresses show "WAGNEKO64054A". If another Wagner, Kenneth O. arrives in zip code 64054, presumably he becomes "WAGNEKO64054B". No middle initial? Then it's "WAGNEK_64054".) How could I know which customers to restore without going thru a lot of queries?
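Here is roughly what that repair looks like, again sketched with Python's sqlite3 as a stand-in for MySQL. The table names, column names and sample rows are invented for illustration, and the sketch assumes invoice numbers were handed out in date order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT NOT NULL);
    CREATE TABLE invoice_archive (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT NOT NULL);
""")
rows = [(1, "1995-03-01"), (2, "1995-11-30"),
        (3, "1996-01-15"), (4, "1996-08-09"),
        (5, "1997-02-02")]
conn.executemany("INSERT INTO invoice VALUES (?, ?)", rows)
conn.executemany("INSERT INTO invoice_archive VALUES (?, ?)", rows)

# The accident: 1996 is deleted when 1995 was meant.
conn.execute("DELETE FROM invoice WHERE invoice_date LIKE '1996-%'")

# Last surviving number in 1995 and first surviving number in 1997.
(last_1995,) = conn.execute(
    "SELECT MAX(invoice_id) FROM invoice WHERE invoice_date < '1996-01-01'"
).fetchone()
(first_1997,) = conn.execute(
    "SELECT MIN(invoice_id) FROM invoice WHERE invoice_date >= '1997-01-01'"
).fetchone()

# Restore only the archived rows strictly between those two numbers.
conn.execute(
    "INSERT INTO invoice SELECT * FROM invoice_archive"
    " WHERE invoice_id > ? AND invoice_id < ?",
    (last_1995, first_1997),
)

all_ids = [r[0] for r in conn.execute(
    "SELECT invoice_id FROM invoice ORDER BY invoice_id")]
```

Two lookups and one range insert do the whole job; nothing depends on names, ZIP codes or address fragments.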

What if I have to break up a table because it's getting too large? Has too many columns or rows added to it in the last year? The integer PK works really well. And it is small and simple to eyeball. And I can still have any other unique index I want on columns in the new related 2nd table.

Mostly, I like the int, auto-increment, unique, unsigned PK because it's used everywhere, always looks the same and is very, very fast because the indexes are small, optimized for SQL usage and very parsimonious about RAM. Intuitively, the numbers also tell me about size and activity levels.

Best of all, they are inviolate. I can trust them. Well, OK, maybe once a decade a gamma ray hits the oxide layer on the disk drive and changes 3,212,434,334 to 3,712,434,334. Still easy to fix. Drop the key and re-create it. The damage will automatically show up, either as a gap or a duplicate. But it WILL show up as something.
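To make the "gap or duplicate" point concrete, here is a small sketch in plain Python. The function name and the sample key lists are hypothetical, and it assumes the keys were meant to be one unbroken run of integers. It walks the whole range, so it is meant for eyeballing small samples, not for a multi-billion-row table:

```python
from collections import Counter

def audit_keys(keys):
    """Return (gaps, duplicates) for a list of integer keys."""
    if not keys:
        return [], []
    counts = Counter(keys)
    # A key that appears more than once is a duplicate.
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    present = set(counts)
    # Any number missing between the smallest and largest key is a gap.
    gaps = [k for k in range(min(present), max(present) + 1) if k not in present]
    return gaps, duplicates

# Key 4 was corrupted into 7: it leaves gaps behind it.
gaps_a, dups_a = audit_keys([1, 2, 3, 7, 5])
# Key 4 was corrupted into 5: it leaves a gap and a duplicate.
gaps_b, dups_b = audit_keys([1, 2, 3, 5, 5])
```

Either way the corrupted row cannot hide: the sequence itself shows where to look.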

Bottom line: uniqueness, stability and order are the "sine qua non"* of good data organization. (*Means "without which, nothing.")


HTH,

Ken

----- Original Message -----
From: "James Harvard" <[EMAIL PROTECTED]>
To: <mysql@lists.mysql.com>
Sent: Wednesday, December 21, 2005 6:01 PM
Subject: Are primary keys essential?

The PK thread has reminded me of a question I had but never resolved when designing the table structure of the big data warehouse app I was droning on about just now in the aforementioned thread. As I need to import some hundreds of millions of rows in the next week, I think now would be a good time to get a definite answer!
The core of the app is a mass of data, broken into many tables that I normally only need to query individually. Because I felt uneasy not including a primary key, and needed to get a proof-of-concept db running, I ended up putting an auto_increment int column in the data tables. (Yes, I know, an extra 4 bytes per row when I was talking about saving every byte possible in my last post. <blush>) But the PK column is never used either as a foreign key or in app code for the table itself. And I couldn't put a PK on a combination of other columns, because I don't think I can be sure of uniqueness. Can I just drop the PK column?
BTW I'm sure this is addressed in all those good books on database design and theory I should have, but never have, read. But I'm a bit short of time, and it's quicker just to pick the brains of you folks! Quicker for me, that is - sorry!
TIA,
James Harvard

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]