On Sat, 2014-11-08 at 09:46 -0700, Keith Medcalf wrote: > On Saturday, 8 November, 2014 06:56, Tristan Van Berkom > <tris...@upstairslabs.com> said: > > >On Sat, 2014-11-08 at 06:23 -0700, Keith Medcalf wrote: > >> How about the direct approach: > >> > >> SELECT uid > >> FROM resource > >> WHERE uid NOT IN (SELECT resource_uid > >> FROM event_participant, event > >> WHERE event_participant.event_uid = event.uid > >> AND event.shift_uid = :shift_uid > >> AND event.date = :event_date) > >> > >> or perhaps > >> > >> SELECT uid > >> FROM resource > >> WHERE NOT EXISTS (SELECT 1 > >> FROM event_participant, event > >> WHERE event_participant.event_uid = event.uid > >> AND resource_uid = resource.uid > >> AND event.shift_uid = :shift_uid > >> AND event.date = :event_date) > >> > >> Is not the "right way to do it" the one that obtains the result > >required, not the one that uses a "checkbox" implementation? > > > >I've presented 2 queries which both obtain the required result, and I > >would have to say both of them have varying levels of correctness. > > > >If obtaining the correct result was all that was required, then I could > >simply issue multiple queries and balloon my memory with result UIDs > >in between each of them, all for the sake of better readability (but > >then, I'm not sure why I would need a nice tool like SQLite to do it > >in the first place). > > > >So I would have to say, the "right way to do it" is the most efficient > >way, the one which provides SQLite with the best indications of how > >to plot an efficient query plan. > > > >The heuristics of the table is generally that resources are rather > >finite, while events grow over time (possibly 10 events each with > >10 participants for 3 shifts per day, meaning after a year that > >table will be large, and we'll still want to find the available > >resources hopefully in under 30ms, ideally 10ms). > > > >So yeah, speed at query time (not insert/update/delete time), is very > >important. > > While either of the above are direct translations, if you wish to minimize > materialization of intermediate data then the second would be preferable > since it uses only inner joins. However, the first may be far more efficient > depending on the shape of your data and the number of rows, how much memory > is available for SQLite to use (Bytes, KBytes, MBytes), and the speed of your > I/O devices. > > Given that selecting all the resources based on your stated heuristics of > growth, having proper indexes, and the constraints given in the query, I > would say that the first NOT IN will be more efficient since it is only > building a list of a (couple) hundred or so exclusions, and NOT IN is very > efficient. The second form will use a few thousand bytes less memory but > perform significantly more I/O. > > Theoretically, of course, the execution plan of a query which obtains a > specified result should not depend on the phrasing of the query. They should > all result in the same execution plan and the same result. However, this is > not true in practice because of limitations in optimization. Some optimizers > need a bit of help in figuring out how to generate an optimum plan, and some > others will generate overly complicated and inefficient plans simply because > they prefer to use newly added or nifty features only available in that > particular database engine rather than a simpler, faster, more direct > solution. > > I don't know why an outer join would be preferable in any case -- you are > going to use extra CPU and I/O to perform a join on which a significant > proportion of the results will be discarded -- or depend on the optimizer to > notice this and "optimize" away the outer join thus ending up with a "more > direct" phrasing which could have been written in the first place (and is far > more understandable for future maintainers). > > Many optimizers can only optimize within specific constraints and specifying > a cute complicated query will result in the most efficient cute and > complicated solution it can muster (ie, a local optimum, not a global > optimum). > > Your statement 1 implements NOT IN indirectly by doing a LEFT JOIN then > discarding the intersection set. This requires an extra join and discard > over a more direct NOT IN phrasing. It is also more difficult to understand. > It could be optimized into the NOT IN or NOT EXISTS form by the optimizer if > you are lucky (presently it will not). > > Your statement 2 implement the correlated subquery NOT EXISTS by an indirect > expression, again using a LEFT JOIN then discarding the intersection results. > Same comments regarding extra operations and optimizations that cannot > (presently) be done apply. > > If you prefer the ON syntax, then you may of course use it (though, except in > the case of a LEFT OUTER JOIN, it is merely syntactic sugar for a list of > tables to join and where conditions. >
Keith, This was really an enlightening read for me. Thanks for taking the time. Best, -Tristan _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users