Hi James,

An internal, unique, auto-increment PK is a good idea on dynamic, large tables. Smaller, mostly static, tables can often do without a PK. In a backend DB it's a godsend. What would happen if I depended on the programming in the client-side code to handle the PK?? It would be impossible.

Example: Customers need a PK. An internal, auto-increment, unique, integer works best on DBs. They are optimized for it, especially SQL tables.

Example: Counties. These names hardly ever change and are rather small. So, you could do without the internal, integer PK. BUT-- suppose every invoice must show the county? Then I would need a county (e.g., varchar 25) field for the county name of Yoknapatawphahootchee. If the customer has thousands of invoices with us then county starts to take up space. And a fair amount of it. Millions of invoices would take up 21 millions of bytes of storage. Just multiply number of customers by numbers of invoices. If the invoice uses a 1 byte field for the county with an FK in the county file we automatically save 40 millions of bytes of storage. Since no state has more than 255 counties, a tiny int will work.

Example: States. Hmmm. Let's see, abbreviation for Minnesota is MN. Only two bytes needed there. OK, That will work for a PK. We'll do without the integer, auto-increment, key here. And we will violate consistency in the DB. No big deal in this case. I'll go either way here.

Another good reason is YOU control the PK. Social Security numbers and ZipCodes are already being prepared for changes. Why go bonkers 7 years from now? And who wants to back-code and back-fill all that code & data?

Truncated, damaged file? What invoice does the row point to? What is the date? If it's the last row in the PK then that's where the fixing begins by date and by PK#.

Since a PK guarantees uniqueness, I can breathe easily if the file gets clobbered. Example: Someone with BIG access rights accidentally deletes invoices from 1996. But they should have done it for 1995. (We keep 10 years history on hand in an archive file.) Easy to fix. What's the last# in 1995? The first # in 1997? Restore only those rows with those numbers, inclusive. Suppose now the Customer file is by name, ZIP and first 4 digits in the address? (Or something like that. Many of my junk mail addresses show "WAGNEKO64054A". If another Wagner, Kenneth O. arrives in zip code 64054, presumably, he becomes "WAGNEKO64054B." No middle initial? The it's "WAGNEK_64054.") How could I know which customers to restore without going thru a lot of queries?

What if I have to break up a table because it's getting too large? Has too many columns or rows added to it in the last year? The integer PK works really well. And it is small and simple to eyeball. And I can still have any other unique index I want on columns in the new related 2nd table.

Mostly, I like the int, auto-increment, unique, unsigned PK because it's used everywhere, always looks the same and is very, very fast because the indexes are small, optimized for SQL usage and very parsimonious about RAM. Intuitively, the numbers also tell me about size and activity levels.

Best of all, they are inviolate. I can trust them. Well, OK, maybe once a decade a gamma ray hits the oxide layer on the disk drive and changes 3,212,434,334 to 3,712,434,334. Still easy to fix. Drop the key and re-create it. It will automatically show up. Either as a gap or a duplicate. But it WILL show up as something.

Bottom line, uniqueness, stability and order are the "sine qua non"* of good data organization. (*Means without which nothing.)

HTH,

Ken






----- Original Message ----- From: "James Harvard" <[EMAIL PROTECTED]>
To: <mysql@lists.mysql.com>
Sent: Wednesday, December 21, 2005 6:01 PM
Subject: Are primary keys essential?


The PK thread has reminded me of a question I had but never resolved when designing the table structure of the big data warehouse app I was droning on about just now in the aforementioned thread. As need to import some hundreds of millions of rows in the next week, I think now would be a good idea to get a definite answer!

The core of the app is a mass of data, broken into many tables that I normally only need to query individually. Because I felt uneasy not including a primary key and need to get a proof-of-concept db running I ended up putting an auto_increment int column in the data tables. (Yes, I know, an extra 4 bytes per row when I was talking about saving every byte possible in my last post. <blush>) But the PK column is never used either as a foreign key or in app code for the table itself. But I couldn't put a PK on a combination of other columns, because I don't think I can be sure of uniqueness. Can I just drop the PK column?

BTW I'm sure this is addressed in all those good books on database design and theory I should have, but never have, read. But I'm a bit short of time, and it's quicker just to pick the brains of you folks! Quicker for me, that is - sorry!

TIA,
James Harvard

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to