On Sat, 2014-11-08 at 09:46 -0700, Keith Medcalf wrote:
> On Saturday, 8 November, 2014 06:56, Tristan Van Berkom 
> <tris...@upstairslabs.com> said:
> 
> >On Sat, 2014-11-08 at 06:23 -0700, Keith Medcalf wrote:
> >> How about the direct approach:
> >>
> >> SELECT uid
> >>   FROM resource
> >>  WHERE uid NOT IN (SELECT resource_uid
> >>                      FROM event_participant, event
> >>                     WHERE event_participant.event_uid = event.uid
> >>                       AND event.shift_uid = :shift_uid
> >>                       AND event.date = :event_date)
> >>
> >> or perhaps
> >>
> >> SELECT uid
> >>   FROM resource
> >>  WHERE NOT EXISTS (SELECT 1
> >>                      FROM event_participant, event
> >>                     WHERE event_participant.event_uid = event.uid
> >>                       AND resource_uid = resource.uid
> >>                       AND event.shift_uid = :shift_uid
> >>                       AND event.date = :event_date)
> >>
> >> Is not the "right way to do it" the one that obtains the result
> >required, not the one that uses a "checkbox" implementation?
> >
> >I've presented 2 queries which both obtain the required result, and I
> >would have to say both of them have varying levels of correctness.
> >
> >If obtaining the correct result was all that was required, then I could
> >simply issue multiple queries and balloon my memory with result UIDs
> >in between each of them, all for the sake of better readability (but
> >then, I'm not sure why I would need a nice tool like SQLite to do it
> >in the first place).
> >
> >So I would have to say, the "right way to do it" is the most efficient
> >way, the one which provides SQLite with the best indications of how
> >to plot an efficient query plan.
> >
> >The heuristics of the table is generally that resources are rather
> >finite, while events grow over time (possibly 10 events each with
> >10 participants for 3 shifts per day, meaning after a year that
> >table will be large, and we'll still want to find the available
> >resources hopefully in under 30ms, ideally 10ms).
> >
> >So yeah, speed at query time (not insert/update/delete time), is very
> >important.
> 
> While either of the above are direct translations, if you wish to minimize 
> materialization of intermediate data then the second would be preferable 
> since it uses only inner joins.  However, the first may be far more efficient 
> depending on the shape of your data and the number of rows, how much memory 
> is available for SQLite to use (Bytes, KBytes, MBytes), and the speed of your 
> I/O devices.
> 
> Given that selecting all the resources based on your stated heuristics of 
> growth, having proper indexes, and the constraints given in the query, I 
> would say that the first NOT IN will be more efficient since it is only 
> building a list of a (couple) hundred or so exclusions, and NOT IN is very 
> efficient.  The second form will use a few thousand bytes less memory but 
> perform significantly more I/O.
> 
> Theoretically, of course, the execution plan of a query which obtains a 
> specified result should not depend on the phrasing of the query.  They should 
> all result in the same execution plan and the same result.  However, this is 
> not true in practice because of limitations in optimization.  Some optimizers 
> need a bit of help in figuring out how to generate an optimum plan, and some 
> others will generate overly complicated and inefficient plans simply because 
> they prefer to use newly added or nifty features only available in that 
> particular database engine rather than a simpler, faster, more direct 
> solution.  
> 
> I don't know why an outer join would be preferable in any case -- you are 
> going to use extra CPU and I/O to perform a join on which a significant 
> proportion of the results will be discarded -- or depend on the optimizer to 
> notice this and "optimize" away the outer join thus ending up with a "more 
> direct" phrasing which could have been written in the first place (and is far 
> more understandable for future maintainers).
> 
> Many optimizers can only optimize within specific constraints and specifying 
> a cute complicated query will result in the most efficient cute and 
> complicated solution it can muster (ie, a local optimum, not a global 
> optimum).
> 
> Your statement 1 implements NOT IN indirectly by doing a LEFT JOIN then 
> discarding the intersection set.  This requires an extra join and discard 
> over a more direct NOT IN phrasing.  It is also more difficult to understand. 
>  It could be optimized into the NOT IN or NOT EXISTS form by the optimizer if 
> you are lucky (presently it will not).
> 
> Your statement 2 implement the correlated subquery NOT EXISTS by an indirect 
> expression, again using a LEFT JOIN then discarding the intersection results. 
>  Same comments regarding extra operations and optimizations that cannot 
> (presently) be done apply.
> 
> If you prefer the ON syntax, then you may of course use it (though, except in 
> the case of a LEFT OUTER JOIN, it is merely syntactic sugar for a list of 
> tables to join and where conditions.
> 


Keith,

This was really an enlightening read for me.

Thanks for taking the time.

Best,
    -Tristan


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to