Re: [fw-general] Question about SQL (not ZF related, sorry)

Bill Karwin Wed, 30 Jun 2010 13:46:52 -0700


On Jun 30, 2010, at 12:28 PM, Aurimas Likas wrote:

I have 3 DB tables: products, tags and products_tags. products_tagshas 2 columns product_id and tag_id (for many-to-many relationship).So the question is: how can I select products which have tag1 ANDtag2 and tag3 and .... tagN? I need a best way in terms ofperformance. The only way I can think of now is to create a querywith self joins. The query joins number would equal tags number - 1.

Using N-1 self-joins is the best solution for performance, especiallyif you use MySQL. It appears to require a lot of joins, and commonwisdom is that you should avoid joins.


SELECT p.* FROM products p
JOIN products_tags pt1 ON p.product_id = pt1.product_id
JOIN tags t1 ON pt1.tag_id = t1.tag_id AND t1.tag = 'medical'
JOIN products_tags pt2 ON p.product_id = pt2.product_id
JOIN tags t2 ON pt1.tag_id = t2.tag_id AND t2.tag = 'performance'
JOIN products_tags pt3 ON p.product_id = pt3.product_id
JOIN tags t3 ON pt3.tag_id = t3.tag_id AND t3.tag = 'reporting';

If you use natural keys for the tags table instead of a pseudokey,then your products_tags table contains the value you need, and you aneliminate half of those joins. For example:


CREATE TABLE tags (
  tag VARCHAR(20) PRIMARY KEY   -- natural key, not pseudokey
);

CREATE TABLE products_tags (
  product_id INT NOT NULL,
  tag VARCHAR(20) NOT NULL,  -- natural key, not pseudokey
  PRIMARY KEY (product_id, tag),
  FOREIGN KEY (product_id) REFERENCES products(product_id),
  FOREIGN KEY (tag) REFERENCES tags(tag)
);

SELECT p.* FROM products p

JOIN products_tags pt1 ON p.product_id = pt1.product_id AND pt1.tag ='medical'JOIN products_tags pt2 ON p.product_id = pt2.product_id AND pt2.tag ='performance'JOIN products_tags pt3 ON p.product_id = pt3.product_id AND pt3.tag ='reporting';

The primary key index in products_tags gives this good performance.Your EXPLAIN report should say "Using index" which means it cansatisfy the query simply by reading the index, without reading thedata for the products_tags table at all. If your index is cached indatabase memory, this runs very fast.

A different solution that some people prefer is to use GROUP BY andinclude groups that have as many rows as the number of tags:


SELECT p.*
FROM products p
JOIN products_tags pt ON p.product_id = pt.product_id
WHERE pt.tag IN ('medical', 'performance', 'reporting')

GROUP BY p.product_id -- EXPLAIN reports "Using temporary; Usingfilesort"

HAVING COUNT(*) = 3;

Common wisdom says that one join is always better than three or six,but this actually performs much *worse* (if you use MySQL), becausevirtually any query using GROUP BY uses a temporary table, which killsperformance because disk I/O is expensive, and temporary tablesusually write to disk.

This might not be such a bad solution if you use some other brand ofdatabase that optimizes GROUP BY better than MySQL. So you shouldanalyze queries with EXPLAIN (or equivalent) and profile carefullywith sample data of realistic size.


Regards,
Bill Karwin

Re: [fw-general] Question about SQL (not ZF related, sorry)

Reply via email to