After a day of looking, I answered my own questions, and I'll post those answers here in case anyone else was interested in the answer.


First, "LOAD DATA" + "ALTER TABLE ADD INDEX"... seems to be slower than a "mysqldump" + "ANALYZE TABLE". Of course, you don't always have a mysql dump file.

After importing a mysql dump file, it's wise to analyze all tables imported. I found that SHOW INDEX FROM table; would show the cardinality (the number of unique values) of an index. The more unique the data being indexed, the faster the index is. Another way to put it, if you set up an equation like,

(# of rows in table) divided by (cardinality of an index)

you would want a number that is as close to 1 as possible (there will never be more unique values in a table than there are rows). The lower that ratio is (the closer to 1), the more efficient the index becomes.


Here's an example of the cardinality after a dump but before an ANALYZE TABLE, and after an ANALYZE TABLE (these two indexes are on the same table) from our database; there are 502055 rows in this table. Index names have been changed to protect the innocent:


Before the ANALYZE,

index1 has a cardinality of 81214
index2 has a cardinality of 81214

After the ANALYZE
index1 has a cardinality of 97192
index2 has a cardinality of 20248

If no analyze was done, and someone did an equi-join on the column indexed by index1 and a second join on the column indexed by index2, the optimizer would use some other criteria for selecting an index other than the cardinality (perhaps the data-type of the column, the alphabetical order of the column, etc - anyone know what that criteria would be?).

After the analyze, you can see that the cardinality of index1 has gone up, while index2 has gone down. Index1 is now a far better choice for the optimizer to use (remember, MySQL can only use one index per table per query, so it has to pick the most efficient one), and should result in faster results.

Hope that provides some insight for anyone interested.

David.



David Griffiths wrote:


We have a somewhat large database, with a snapshot of the data that we import into the database. Normally, we create a database, create the tables, import the data with "LOAD DATA", add the indexes, and then the foreign keys (we are using InnoDB, 4.0.18). We call this the load-data-method.


Sometimes we dump the data with mysqldump, create a new database, and pipe the dump file into the new database. We call this the mysqldump-method.

I was wondering about the time it takes to import either one. The mysqldump-method takes about 45 minutes. Tables, indexes and foreign keys are created. The load-data-method takes two to three hours.

Using LOAD DATA is supposed to be the fastest way to get data into MySQL. We use the ALTER TABLE ADD INDEX (), ADD INDEX() ... method, which is supposed to be the fastest way to add indexes to a table.

So I'm curious as to why it takes so much longer. I suspect that the mysqldump-method needs an "ANALYZE TABLE ..." (for each table in the database) at the end of it, to rebuild the statistics for the tables, and if that was done, then the load-data-method would be comparitively faster. Is this correct, or am I off-base with this? Are the statistics for the indexes in InnoDB correct or incorrect after a mysqldump file is imported into the db.

This brings my next question - as I was looking around, trying to get some insight, I was reading the mysql documentation on "SHOW INDEX FROM tablename", which is supposed to show key-distribution across a table (http://dev.mysql.com/doc/mysql/en/ANALYZE_TABLE.html).

I've run it, and I'm a bit confused about inferring anything from the results of the "SHOW INDEX..." statement (http://dev.mysql.com/doc/mysql/en/SHOW_INDEX.html). The most interesting column in the result set seems to be the "cardinality" column, which stores the number of unique values in an index, but on large tables, it's tough to know if that's accurate if the index is not unique. Is there any way to read the results of a SHOW INDEX to be able to figure out if a table needs analyzing?

MySQL (Paul DuBois) was written before ANALYZE TABLE was relevant for InnoDB, and High Performance MySQL (Jeremy Zawodny and Derek Balling) doesn't delve into the details about the circumstances where tables (and indexes) need to be re-analyzed.

Thanks in advance for any replies.

David.








--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to