A good decision on wether to use multi-column indexes will take into
account the cardinality of the data. Sometimes a multi-column index
is simply the right answer, particularly if the values in the first
column of the index do not limit the rows sufficiently. Also in
where exists clauses the rdbms will not even load the row if the
primary key is the composite index, but it will have to. Indexes are
best thought of like caches, You use the right caching mechanism for
the job, and you only do it when you can demonstrate performance
improvement.
The mysql reference is true but not a matter of concern in a large
number of cases when composite keys are being used. Analyising
complex queries is a good idea, but hunting out and eliminating
multi-column indexes doesn't seem helpful.
On Aug 26, 2009, at 6:40 AM, Jean-Michel Pouré wrote:
Le mercredi 26 août 2009 à 08:30 -0400, Jamie Holly a écrit :
When writing queries they should be targeted for performance on all
supported RDBMS, yet the emphasis should be put on MySQL. The simple
fact that a vast (vast) majority of Drupal sites out there run MySQL
is
reason enough for that, and until there is some major cosmic shift in
what hosting companies offer that's going to remain the fact.
Dear Jamie,
I have been rewriting a generic guide for both MySQL and PostrgeSQL:
Guidelines for writing efficient SQL code http://drupal.org/node/
559302
These are very generic guidelines explaining why you should always log
server-side queries, use Devel module and ANALYSE your queries.
I also explain the notion of sequential scan and CPU time needed.
Then I analyse simple facts using EXPLAIN.
Also on indexes you say "Therefore the proposed solution to enhance
Drupal is to hunt for multi-column indexes and replace them with
single
indexes.". MySQL makes very effecient use of multi column indexes. On
countless queries in Drupal changing to only single column indexes
would
result in full table scans, temporary tables and filesorts in
numerous
places.
About dual-field indexing: PostgreSQL and MySQL work the same way.
Some databases claim that a dual index works equally right and left.
This is not portable and probably not true, as multi-column
indexing may
result in a sequential scan in complex queries:
Reference:
According to
http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html:
MySQL cannot use an index if the columns do not form a
leftmost
prefix of the index.
http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html
index could in principle be used for queries that have
constraints on b and/or c with no constraint on a — but the
entire index would have to be scanned, so in most cases the
planner would prefer a sequential table scan over using the
index.
Therefore, when needed, it could be needed to add single indexes AND
keep dual indexes.
Do LEFT JOINS outperform INNER JOINS? Yes
Do INNER JOINS outperform LEFT JOINS? Yes
I don't believe so because of the nature of INNER JOINs which remove
NULL rows and thus consume more CPU time. They only work faster on
indexes on dual-fields when used properly.
This needs to be studied using ANALYSE on very large SQL servers, not
simple installs. Drupal website database probably is a good testing
system.
I hope that you can comment my code and improve the English and
technical aspects.
Kind regards,
Jean-Michel