Thinking out of the box... (And posting my reply at the 'wrong' end of the email.)...
Are there fewer than 64 genres? Use a SET or BIGINT UNSIGNED. > > > AND sg.`genre_id` IN (10,38) > > > AND sg.`genre_id` NOT IN (22,61) --> AND genre & ((1<<10) | (1<<38)) != 0 AND genre & ((1<<22) | (1<<61)) = 0 This would avoid having that extra table, and save a lot of space. If you have more than 64 genres, then Plan A: clump them into some kind of application grouping and use multiple INTs/SETs. Plan B: do mod & div arithmetic to compute which genre field to tackle. For B, something like: AND (genre1 & (1<<0)) + (genre3 & (1<<8)) != 0 AND (genre2 & (1<<2)) + (genre6 & (1<<1)) = 0 (That's assuming 10 bits per genre# field. I would use 32 and INT UNSIGNED.) > -----Original Message----- > From: Daevid Vincent [mailto:dae...@daevid.com] > Sent: Tuesday, June 11, 2013 4:17 PM > To: mysql@lists.mysql.com > Cc: 'shawn green' > Subject: RE: How do I select all rows of table that have some rows in > another table (AND, not OR) > > > > > -----Original Message----- > > From: shawn green [mailto:shawn.l.gr...@oracle.com] > > Sent: Tuesday, June 11, 2013 2:16 PM > > To: mysql@lists.mysql.com > > Subject: Re: How do I select all rows of table that have some rows in > > another table (AND, not OR) > > > > Hello Daevid, > > > > On 6/11/2013 3:59 PM, Daevid Vincent wrote: > > > I am trying to implement a filter so that a user could select > > > various > > genres > > > they want "in" or "out". Perhaps they like 'action' and 'car chases' > > > but don't like 'foreign' and 'drifting' (or whatever...) > > > > > > So I want something sort of like this, however IN() is using an "OR" > > > comparison when I need it to be an "AND" > > > > > > SELECT DISTINCT > > > s.`scene_id` AS `id`, > > > GROUP_CONCAT(sg.`genre_id`) FROM > > > `dvds` AS d > > > JOIN `scenes_list` AS s > > > ON s.`dvd_id` = d.`dvd_id` > > > JOIN `scenes_genres` AS sg > > > ON sg.`scene_id` = s.`scene_id` > > > AND sg.`genre_id` IN (10,38) > > > AND sg.`genre_id` NOT IN (22,61) GROUP BY > > > s.`scene_id`; > > > > > > This is giving me way way too many rows returned. > > > > > > For example, I would expect this scene_id to be in the result set: > > > > > > SELECT * FROM scenes_genres WHERE scene_id = 17; > > > > > > scene_id genre_id > > > -------- ---------- > > > 17 1 > > > 17 3 > > > 17 10 <-- > > > 17 19 > > > 17 38 <-- > > > 17 53 > > > 17 58 > > > 17 59 > > > > > > And this scene ID to NOT be in the result set: > > > > > > SELECT * FROM scenes_genres WHERE scene_id = 11; > > > > > > scene_id genre_id > > > -------- ---------- > > > 11 1 > > > 11 10 <-- > > > 11 19 > > > 11 31 > > > 11 32 > > > <-- but does not have 38 > > > 11 59 > > > > > > I've tried various subselect ideas, but of course this fails b/c > genre_id > > > can't be multiple things at one time (AND) > > > > > > JOIN `scenes_genres` AS sg > > > ON sg.`scene_id` = s.`scene_id` > > > AND sg.`genre_id` IN ( > > > SELECT `genre_id` FROM `scenes_genres` > > > WHERE `genre_id` = 10 > > > AND `genre_id` = 38 > > > AND `genre_id` <> 22 > > > AND `genre_id` <> 61 > > > ) > > > > > > And straight up like this failure too... > > > > > > JOIN `scenes_genres` AS sg > > > ON sg.`scene_id` = s.`scene_id` > > > AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38) > > > AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61) > > > > > > So I'm sort of out of ideas at this point and hoping someone has a > > > way > to > > do > > > this. > > > > > > Also, just for S&G this is how we are currently implementing it, but > > > we > > feel > > > the REGEXP is killing our queries and while "clever" is a bit hacky > > > and nullifies any indexes we have on the genres column as it > > > requires a file_sort table scan to compare substrings basically... > > > > > > SELECT * FROM scene_all_genres WHERE scene_id = 17; > > > > > > scene_id genres > > > -------- ------------------------ > > > 17 1|3|10|19|38|53|58|59| > > > > > > SELECT * FROM scene_all_genres WHERE scene_id = 11; > > > > > > scene_id genres > > > -------- ------------------- > > > 11 1|10|19|31|32|59| > > > > > > SELECT DISTINCT > > > s.`scene_id` AS `id`, > > > sg.`genres` > > > FROM > > > `scene_all_genres` AS sg, > > > `dvds` AS d, > > > `scenes_list` AS s > > > WHERE dvd_id` = d.`dvd_id` > > > AND sg.`scene_id` = s.`scene_id` > > > AND sg.`genres` REGEXP '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]' > > > AND sg.`genres` NOT REGEXP > > '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))' > > > > > > ; > > > > > > http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp > > > > > > [[:<:]], [[:>:]] > > > > > > These markers stand for word boundaries. They match the beginning > > > and > end > > of > > > words, respectively. A word is a sequence of word characters that is > > > not preceded by or followed by word characters. A word character is > > > an alphanumeric character in the alnum class or an underscore (_). > > > > > > > To me it looks like you want to know how to match N for N when looking > > for subset of properties. It's a multi-step process using plain SQL > > but it's fast: > > > > 1) Build a list of scenes containing the genres you want to see > > > > CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id)) SELECT > > scene_id, genre_id FROM scenes_genres WHERE genre_id` IN (10,38) <-- > > in this case you have 2 terms to mach > > > > 2) From that list, determine which of those scenes also contain > > unwanted genres and remove them. > > > > DELETE tmpList > > FROM tmpList t > > INNER JOIN scenes_genres sg > > on sg.scene_id = t.scene_id > > and sg.genre_id IN (22,61) > > > > # at this point, tmpList contains all scenes that have any of the > > desired genres but none of the unwanted ones. > > > > 3) Check to see if any scene has all N matches. > > > > SELECT scene_id, count(genre_id) as matches FROM tmpList GROUP BY > > scene_id HAVING matches = 2 > > /* ^--- this is the number of terms you are trying to match. */ > > > > > > Of course you can modify the last query to eliminate the HAVING clause > > and pick the top 5 matching scenes (even if they are partial matches) > > like this > > > > SELECT scene_id, count(genre_id) as matches > > FROM tmpList > > GROUP BY scene_id > > ORDER BY matches DESC > > LIMIT 5 > > > > Let us know if this is not what you wanted to do. > > > > > > > > Note to the rest of the list: > > > > Yes, we could have done most of this with self-joins or subqueries, an > > EXISTS, and a NOT EXISTS inside a single statement but I think the > > execution time would have been miserable. I also think that that > > approach also would not have allowed us to evaluate a partial match > > (like 5 out of 7 target genres), only complete matches would have been > > returned. > > > > I am still very interested in seeing alternative solutions :) > > > > -- > > Shawn Green > > MySQL Principal Technical Support Engineer > > Oracle USA, Inc. - Hardware and Software, Engineered to Work Together. > > Office: Blountville, TN > > Shawn, thank you for taking the time to reply. I wasn't expecting the > solution to be so much "work" with multiple statements like that. I was > thinking it could be done in one (or two, as in split out a portion of it > in > PHP and re-insert it to the original SQL to avoid a JOIN or something). > Part > of the issue is that we use PHP to generate the $sql string by appending > bits and pieces depending on the search criteria thereby keeping the > 'path' > through the SQL statement simple and relatively linear. To implement this > would require significant re-writing and/or special cases where we could > introduce errors or omissions in the future. The frustrating part is that > the REGEXP query we use now only takes about 2 seconds on my DEV VM (same > database as PROD), however when the RDBMS is loaded it then takes up to 30 > seconds so in theory it's not even that inefficient given the # rows. We > do > use memcached for the results, but since there are so many combinations a > user could choose, our hit ratio is not so great and therefore the cache > isn't doing us much good and this is why the RDBMS can get loaded up > easily. > > How can an "OR" be so simple using IN() but "AND" be so overly complex? > Seems that mysql should have another function for ALL() that works just > like > IN() to handle this kind of scenario. > > > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql