Rod Heyd <[EMAIL PROTECTED]> wrote on 12/09/2005 11:01:38 AM:

> Hi Everyone,
> 
> I've got a little debate that maybe you can all help me settle.
> 
> I have three tables that I need to join, but there are performance problems
> with the joins due to some misunderstandings of my predecessor about what's
> needed to join the tables efficiently.
> 
> Here's the setup.
> 
> t1 has a unique key defined on it, call it command_id; t1 has a 1-to-many
> relationship with t2.
> t2 has t1's command_id as a foreign key constraint, plus a set of one or
> more instrument_id's.
> Each command results in one or more instruments taking data.
> 
> The commanding elements defined here are then sent to our instruments to
> acquire the data.
> When the data comes back it is organized into t3 by command_id and
> instrument_id.
> So the primary key on t3 looks like this: command_id_instrument_id.
> 
> So, now I need to write a query that looks at what was commanded in t1 and
> t2 and then looks for any missing data that has not yet been received.
> 
> So, I've got a query that looks something like this:
> 
> SELECT  stuff
> 
> FROM
>             t1
> JOIN
>             t2
> ON
>            t1.command_id = t2.command_id
> 
> LEFT JOIN
>           t3
> ON
>         t3.data_id = concat(t1.command_id,'_',t2.instrument_id)
> 
> Now, I think everyone is going to see immediately that the join on t3 is
> going to have absolutely horrible performance. The join condition here
> can't take advantage of any indexes, since the string function pretty much
> destroys any hope of that. To make matters worse, the left join is a total
> killer.
> 
> 
> So my suggestion to solve the performance bottleneck is to add two columns
> to t3, command_id and instrument_id, and create a combined index on the
> command_id and instrument_id columns.
> 
> the join condition on t3 then becomes:
> 
> LEFT JOIN
>          t3
> ON
>         t3.command_id = t1.command_id
> AND
>         t3.instrument_id = t2.instrument_id
> 
> This works beautifully!  The performance of this new join condition is
> about 480 times faster than the original.
> 
> Here's the rub.  Since there is a unique data_id that already exists which
> combines the information in both command_id and instrument_id keys, I'm
> being told by our seasoned software developer that I am violating "classic"
> database design rules against redundant data.
> 
> In my opinion, this is a minor "violation" at best.  We have a good reason
> for wanting to identify each data segment with our originally defined
> data_id, but this is not strictly a requirement on the database; it's more
> of an agreed-upon convention that we are going to use to identify the data
> segments and distribute them to our consumers.  From a database standpoint,
> the only requirement is that the data_id be unique.  It could be anything
> as far as the database is concerned; as long as the data_id remains unique,
> it doesn't matter that it may be overloaded with some "duplicate"
> information.
> Do any DBAs more experienced than I am have an opinion on this?
> 
> Thanks!

You are not duplicating data by referring to objects by their id values. 
The false normalization through the use of the composite key on t3 was a 
mistake. By obscuring the actual relationships between t1, t2, and t3, 
your original designer broke one of the cardinal rules of designing a 
relational database: each foreign key should point to at least (and 
usually at most) one row in a single table. His foreign key pointed to 
rows in two tables at once. That was a bad design decision that created 
serious performance bottlenecks. If you have the chance to fix his design 
of t3 to keep separate columns for each FK, please do it. If you want to 
keep the composite key for historical reasons, I don't see why that can't 
happen, but you should stop using it as the primary key for the table.
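A minimal sketch of that fix in MySQL syntax (the column types are 
assumptions; match them to the actual types of t1.command_id and 
t2.instrument_id, and the index and constraint names are just placeholders):

```sql
-- Assumed types; match these to t1.command_id and t2.instrument_id.
ALTER TABLE t3
    ADD COLUMN command_id    INT NOT NULL,
    ADD COLUMN instrument_id INT NOT NULL;

-- Combined index so a join on (command_id, instrument_id) can use it.
ALTER TABLE t3
    ADD INDEX idx_cmd_instr (command_id, instrument_id);

-- Optional: a real foreign key (InnoDB) so the server enforces that
-- every t3 row refers to an existing command.
ALTER TABLE t3
    ADD CONSTRAINT fk_t3_t1
    FOREIGN KEY (command_id) REFERENCES t1 (command_id);
```

The old data_id column can stay for your consumers; it just stops doing 
double duty as the join key.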

The drawback to concatenated keys is that they can refer to more than one 
thing. Assume for a moment that you are creating a composite key out of 
two text-type fields, and that you are joining those fields with a _ 
character (as in your example). The pair ('testme_a', 'secondhalf') and 
the pair ('testme', 'a_secondhalf') would both evaluate to 
'testme_a_secondhalf'. We lose the 1:1 correspondence between key and 
parent.
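The collision above takes only a few lines to demonstrate (a toy 
illustration, not tied to any particular database):

```python
# Two distinct (command_id, instrument_id)-style pairs of text values.
pair_a = ("testme_a", "secondhalf")
pair_b = ("testme", "a_secondhalf")

# Concatenating with '_' collapses both pairs into the same key...
key_a = "_".join(pair_a)
key_b = "_".join(pair_b)
assert key_a == key_b == "testme_a_secondhalf"

# ...while keeping the columns separate preserves the distinction.
assert pair_a != pair_b
```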

Obviously, my example was contrived, and using just numbers can avoid this 
kind of behavior, but it illustrates why it is just not correct to 
construct foreign keys by concatenating values. Foreign keys are SUPPOSED 
to contain values that actually match something in the other table. Using 
separate foreign key columns does not violate the "no duplicates" rule 
your colleague describes, and it lets the database enforce relational 
integrity.

Just so you have it on record (IMHO): you are correct, and the other DBA 
should read up on modern relational databases and modern theories of 
normalization.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
