"Peter Normann" <[EMAIL PROTECTED]> wrote on 06/16/2005 02:15:34 PM:
> Sorry guys,
> I just learned that ctrl-return would send the email you are composing,
so I
> got cut off short, so let me try again :-/
> It appears to be usual practice from what I can see in this list to use
a
> primary key that actually contain somewhat meaningful data.
> I have always used primary keys solely for one purpose only: To identify
a
> table row uniquely, using auto incremental integers. Actually, if I were
to
> use a unique number in an application, I would add a column to the table
for
> just that. I would never use the primary key. The table may contain
foreign
> keys and those may hold an entirely different value than this primary -
even
> though the other table would share a one to one relationship.
> I have always felt, that if you would rely on a primary key for holding
> somewhat meaningful data, somewhere down the road it could spell
trouble.
> Anyway, I just wanted to hear if anybody would share their thoughts on
any
> advantages or drawbacks as to having the primary key contain meaningful
> data.
> Peter Normann
Primary Keys (PKs) have special significance in the realm of database
theory and operation. A PK value or tuple (for multi-column PKs) will
uniquely identify each and every row of data within a table. Because of
this unique distinction, many database engines (including MySQL's InnoDB
engine) will store their data in PK order (called clustering). Many (if
not most) engines use the PK as one half of the "pointers" table used to
store data in the actual files of the database. Because pointers are
usually stored as offsets (a fixed size) the size of the PK is strongly
influenced on the choice of the column(s) that construct the PK. If no PK
is declared for a table, some databases will use EVERY column (all of
them) and hash them together to make the pointers table. That means you
can have duplicate rows in your database and if you wanted to update just
one of those duplicates, you couldn't.
The reason that many of us use "real" data in our primary keys is because
the PK is also an index. It's two uses for the price of one: data
integrity, faster lookups.
However, it's more common to assign auto-inc numbers to rows so that it
becomes possible to refer to those rows with a numeric value rather than
to duplicate the actual "unique data".
Numbers take up either 2,4, or 8 bytes and are compared MUCH faster than
string values. Those facts form part of the basis of the theory of
database normalization. When you are trying to normalize a database, you
want to replace commonly used values with a reference of those values.
That way the values are stored only once (less space = faster searches).
However, it would still be "correct" (in a textbook-kind-of-way) to use
the full "unique data" tuple for referencing a row in one table from
another ,a foreign key(FK). Under most circumstances, tables are arranged
in a parent-child relationship where the parent can have several children
but each child can have only one parent. That means that the FK stored on
the child table must uniquely identify a single row of the parent table.
That leaves us with only two options: use the auto-inc value or use the
"unique data" tuple. It takes much less space (usually) to store just the
auto-inc value.
Using a numeric PK to store, sort, and lookup records will be faster than
using the data itself so most people declare their auto-inc columns as the
PK of their table and assign a UNIQUE index to their actual data. That way
their data values are both indexed and protected against duplication.
Would changing the "meaningful" data (the unique tuple) break a FK (if it
was the value used)? Maybe. In MySQL, you can declare FKs for cascading
updates. That means that if I had was using the tuple {STEVE,989984} for a
FK and STEVE wants to be called STEVEN then updating the PK to {STEVEN,
989984) would automatically update the FKs on the tables that used it. If
you didn't have cascading updates, and you wanted to change the value, you
would need to:
a) start a transaction
b) change all child tables to point to the new FK
c) change the PK on the parent table
d) commit the transaction
By wrapping the process within a transactional boundary, we prevent the
database from becoming "inconsistent" (changed in one place but not in
another). So while it's possible to use "real" data as FK tuples, it's
takes more administration if you need to change the PK value it
references. That is another argument in favor of the auto-inc value being
the PK. Because it has no significance to the data (except to identify a
row) changing significant fields will not break relational integrity.
I know I have rambled but you asked a very "wide" question and it's late
in my work day. Thanks for bearing with me.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine