James Tu wrote:
Hi:
Let's say I want to store the following information.
Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)
In general 'age' is a bad column, because you need to know what year the
data was entered to calculate the current age. It is often better to
store year of birth or date of birth. This may not be relevant to your
application, I just wanted to mention it.
Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT
I would be basing my queries on all columns _except_ the Data column. I.e. I
would be using WHERE's with all except the Data column.
You are not telling us how much data you are planning to maintain. How
big will the "Data" column be, on average, and how many rows/persons are
we talking about? Hundreds, thousands or millions?
My question is...which design would perform better?
(Design A) Put all in one table...index all the columns that I will use
WHERE with.
-TABLE_ALL-
Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT
Indices - Unique ID, First Name, Last Name, Age, Date, Activity
You will probably not need to index all columns. If you have few rows,
you don't need indexes at all, except for the primary key on the unique
ID. A primary key automatically works as an index.
I would probably start with only the primary key, and add indexes only
when I find that some queries are too slow.
SELECT First_Name, Last_Name, Data
FROM TABLE_ALL
WHERE
Activity = 'draw' AND Age > 24;
(Design B) Put the Data in its own separate table.
-TABLE_A-
Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data_ID - INT(10)
Indices - Unique ID, First Name, Last Name, Age, Date, Activity
-TABLE_B-
Data_ID - INT(10)
Data - TEXT
Index - Data_ID
This will be faster if your Data column is relatively big (several K on
average, I don't know. depends on your HW, of course).
I would suggest using the unique ID from TABLE_A as a primary key in
TABLE_B, and drop Data_ID from TABLE_A.
If there are millions of rows I would normalize these tables to the
extreme, something like this:
Person: P_Id,Born
FName: FN_Id,FirstName
LName: LN_Id,LastName
FN_P: FN_Id,P_id
LN_P: LN_Id,P_id
Activity: A_Id,Activity
Act_P: A_id,P_Id
Data:P_Id,Data
FN_P and LN_P are so-called "link tables", linking names to persons in a
many-to-many relation. Even further normalization would have been
achieved with an additional counter column. It would be used in these
tables to maintain the order of the names when a person have multiple
first names or last names, so that you would have one FName row for each
unique name, "Mary Jane" would be split in "Mary" and "Jane".
You could query this schema like this:
SELECT FirstName,LastName,Data
FROM Person,FName,LName,Data,Activity,FN_P,LN_P,Act_P
WHERE
Person.P_Id = Data.P_Id AND
Person.P_Id = FN_P.P_Id AND
Person.P_Id = LN_P.P_Id AND
Person.P_Id = Act_P.P_Id AND
FName.FN_Id = FN_P.FN_Id AND
LName.LN_Id = LN_P.LN_Id AND
Activity.A_Id = Act_P.A_Id AND
Activity = 'draw' and Born < year(now()) - 24
...or with more explicit formulated joins, like this:
SELECT FirstName,LastName,Data
FROM Person
NATURAL JOIN Act_P NATURAL JOIN Activity
INNER JOIN FN_P ON FN_P.P_Id=Person.P_Id NATURAL JOIN FName
INNER JOIN LN_P ON LN_P.P_Id=Person.P_Id NATURAL JOIN LName,
LEFT JOIN Data ON Data.P_Id = Person.P_Id
WHERE
Activity = 'draw' and Born < year(now()) - 24
The NATURAL JOINS are joins based on columns with the same name in the
two joined tables, see the manual. The LEFT JOIN is used in this case
because some Persons may not have a corresponding row in the Data table,
in this case the Data column of the result table will contain NULL. If
you used an INNER join in place of the LEFT join in this case, Persons
without a Data record would be omitted from the result.
SELECT TABLE_A.First_Name, TABLE_A.Last_Name, TABLE_B.Data
FROM TABLE_A, TABLE_B
WHERE
Activity = 'draw' AND Age > 24 AND TABLE_A.Data_ID = TABLE_B.Data_ID;
(Aside: Would this query give me the same results as the above query?)
Yes, I think so, if all rows in TABLE_A have a corresponding row in TABLE_B.
--
Roger
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]