Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
On 9 Nov 2014, at 1:49pm, Tristan Van Berkomwrote: > This year in particular I've been faced with my first queries of modest > complexity, see for example this (temporary) paste: > >http://www.fpaste.org/148918/41545194/ > > This is the beginnings of a query which will try to fetch an available > employee for a given 'task', ordered eventually by their relevance, > first set 'is it their job ?' second set 'do they have the licenses and > formations to actually perform the task, even if it's not their job' ? > and so on and so forth. > > I've found that when trying to bend my brain around these problems of > modest complexity, they are only understandable if I visualize them in > terms of 'sets of rows' and how they join to each other. I haven't looked into your specific example in depth, but you should also be aware of another solution to this problem: Make a good but simple SQL query that gets you some way to your data, then parse the resulting rows using your programming language. This can lead to a far simpler program which is faster to write, easier to debug, and more flexible than what can be done entirely inside SQL. Just because one can write complicated iterating queries with sub-SELECTs in SQL doesn't means one should. There comes a point where it's batter to do the work inside whatever programming language you're using. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
On Sun, 2014-11-09 at 15:04 +0200, RSmith wrote: > On 2014/11/09 14:11, Tristan Van Berkom wrote: > >> A good SQL rule of thumb: if you can think of a way, so can the DBMS. "... > >> no opportunity to make a good guess" is not true. In > >> some sense, SQLite has had 10 years to make a good guess, and often does. > >> A nested select need not be materialized as a "table", > >> opaque or otherwise. It can be converted, or "flattened" under some > >> circumstances. SQLite's query planner isn't the most > >> sophisticated; no one suggests otherwise. It does not follow that every > >> JOIN will outperform every EXISTS or vice versa. > > Indeed this is a large misconception on my part, on the queries which > > I *have* profiled, it did turn out that JOINs were more effective than > > solutions which involve nesting select statements. > > This seems like the appropriate place to interject with a Statistics / > Sample-size comment, but I think the point is obvious. > > > Anyway, I appreciate the input and will try to accept that I should not > > be in control of how the query is run - I was under the impression that > > SQL engines can perform better when given more context about how the > > query should run (i.e. being more explicit with JOINs), but I do agree > > that, at least ideally, the planner should be able to make a better > > guess as to how to plot a query with a more relaxed/vague statement, > > than with a more explicit one. > > Woah... this is not at all what James tried to say and it might just be that > your choice of words is unfortunate and you did mean it > correctly, but just to be sure, allow me to elaborate somewhat: > > The query should in no way be "Vague" or "Relaxed", it should be precise, > succint, exact and fully correct making the intended > result impossible to misinterpret. [...] No you're right, it's a poor choice of words on my part, I don't mean vague in terms of what exact set of rows I want to extract, but rather vague in terms of 'what will happen when I issue this query'. Which I understand, and do agree, is something that ideally the query author should not have to trouble himself with. This year in particular I've been faced with my first queries of modest complexity, see for example this (temporary) paste: http://www.fpaste.org/148918/41545194/ This is the beginnings of a query which will try to fetch an available employee for a given 'task', ordered eventually by their relevance, first set 'is it their job ?' second set 'do they have the licenses and formations to actually perform the task, even if it's not their job' ? and so on and so forth. I've found that when trying to bend my brain around these problems of modest complexity, they are only understandable if I visualize them in terms of 'sets of rows' and how they join to each other. You'll note that I have nested the UNIONs in a select specifically because there are common JOINs I want to perform on the collective result, thus reducing the redundancy of the query (here I am again, trying to out-optimize the query planner). Anyway, I think you've all collectively succeeded in pointing out the error of my ways ;-) I will try to formulate more straight forward queries which only ask for the succinct set of data that I want to pull, rather than the queries I've been writing which are probably more constrictive to the planner for possibly no justifiable reason. Cheers, -Tristan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
On 2014/11/09 14:11, Tristan Van Berkom wrote: A good SQL rule of thumb: if you can think of a way, so can the DBMS. "... no opportunity to make a good guess" is not true. In some sense, SQLite has had 10 years to make a good guess, and often does. A nested select need not be materialized as a "table", opaque or otherwise. It can be converted, or "flattened" under some circumstances. SQLite's query planner isn't the most sophisticated; no one suggests otherwise. It does not follow that every JOIN will outperform every EXISTS or vice versa. Indeed this is a large misconception on my part, on the queries which I *have* profiled, it did turn out that JOINs were more effective than solutions which involve nesting select statements. This seems like the appropriate place to interject with a Statistics / Sample-size comment, but I think the point is obvious. Anyway, I appreciate the input and will try to accept that I should not be in control of how the query is run - I was under the impression that SQL engines can perform better when given more context about how the query should run (i.e. being more explicit with JOINs), but I do agree that, at least ideally, the planner should be able to make a better guess as to how to plot a query with a more relaxed/vague statement, than with a more explicit one. Woah... this is not at all what James tried to say and it might just be that your choice of words is unfortunate and you did mean it correctly, but just to be sure, allow me to elaborate somewhat: The query should in no way be "Vague" or "Relaxed", it should be precise, succint, exact and fully correct making the intended result impossible to misinterpret. What it should not be is convoluted with additional hot air that attempts to better explain to the QP how to do the query... that is where we trust the QP to do it's job as long as the question is not Vague and indeed very precise and concise. Sometimes the Query planner is faced with situations where an algebraically correct result can be achieved via different ways (picking/using an Index (if any) is a good case-in-point) and sometimes the best one is not obvious from the query or table size. In these cases, providing hints to it - such as running Analyze (Stat3 or 4 tables) and providing direct hints (such as 'Likely' and 'Likelihood') - is a good way for a programmer (or "Query Author" I should say) to /help/ the QP decide on what is best - but these should be used sparingly and only where the improvement is substantial and the optimisation not premature. Another consideration is that the QP gets updated almost every other month... you have no way of knowing whether your convoluted query (which may shave off a few milliseconds now) will still be the fastest next month when the QP is a bit smarter. The long-term best is to formulate the most precise and concise query possible and let the Engine do the work. Cheers, Ryan. (PS: I don't mean to second-guess James, and the only reason I feel any comfort saying what he meant is that we just had an interesting debate about this elsewhere and I have come to appreciate the view above somewhat more). ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
On Sat, 2014-11-08 at 14:27 -0500, James K. Lowden wrote: > On Sun, 09 Nov 2014 00:45:16 +0900 > Tristan Van Berkomwrote: > > > While I do understand SQL as a functional language, most functional > > programming I've done still has rather explicit syntax/rules, so I get > > particularly uncomfortable with writing vague statements, such as > > JOIN tableA, tableB WHERE ... without being exactly specific on the > > heart/nature of the join which is happening. > > Oh, I think you'll agree SQL has explicit syntax. I think what you > mean by "vague" is "nondeterministic with respect to the physical data > structures and algorithms". About that you're right; it's considered a > feature. :-) > > The idea is that the engine interpreting your SQL might not be > absolutely as fast as the most optimal solution for your hardware and > data at any one point in time. But it will be nearly so, much more > adaptable as hardware and data change, and thereby not trouble your > application with issues outside its problem domain. And of course > there are other advantages besides, as you know. > > Partly it's a matter of trust. You trust the OS to schedule your job > fairly, to supply virtual memory, to deliver your TCP packet. You > trust SQLite to deliver on its ACID contract, and to produce logically > correct result from queries. Why not trust it to find the shortest > path to your data? Yeah, I'm only starting to catch on though :) > > > Also what I've found in my limited experience is that nesting SELECT > > statements, at least in SQLite, completely throws off the planner, > > as in it has no opportunity to make a good guess and has to deal > > with the result set of a nested SELECT as an opaque table, > > A good SQL rule of thumb: if you can think of a way, so can the > DBMS. "... no opportunity to make a good guess" is not true. In some > sense, SQLite has had 10 years to make a good guess, and often does. > > A nested select need not be materialized as a "table", opaque or > otherwise. It can be converted, or "flattened" under some > circumstances. SQLite's query planner isn't the most sophisticated; no > one suggests otherwise. It does not follow that every JOIN will > outperform every EXISTS or vice versa. Indeed this is a large misconception on my part, on the queries which I *have* profiled, it did turn out that JOINs were more effective than solutions which involve nesting select statements. > > is generally not an indexed table (or perhaps it is, but I wouldnt > > know because those indexes don't seem to be declarative in any way). > > I don't know what you're mean by indexes that "don't seem to be > declarative". This was in context of the paragraph, i.e. it's conceivable that it may make sense to build a temporary index on the result of a nested select statement while accumulating results, and indeed, depending on the outer query EXPLAIN tells me it builds a temporary B-TREE on the inner query results... however there is no way to declare (afaics) what kind of temporary index is to be built on the result set of a nested inner select statement. That's all that I meant by that. > > So indeed, I am not comfortable with 'leaving it up to chance', > > and if there is a way to get higher specificity, I try to achieve > > that. > > If I may be so bold, beware of your assumptions. The fastest plan with > 100 rows may perform very poorly with 100 million rows. Be careful > what you optimize for. :-) > > My fans on this list (both of them) will be surprised, though, that I > partly agree with you. In terms of technology, there's very little > middle ground between fopen(3) and SQL's "I give query you give data" > contract. SQL experts, no matter the platform, spend a fair amount of > time coercing the system into using an efficient query plan. They > cannot say "apply criteria X to index Y and join to table T"; they must > work by indirection, creating indexes and using "query hints" (or, > sometimes, other query formations) until the planner does the "right > thing", however defined. I sometimes wish for a system that let me > express the query algebraically and the order of operation explicitly, > but afaik no such system exists except partially and grudgingly within > SQL. Indeed, and here's an example of a similar struggle I had, where I discovered the secret '+' trick, on this very list: https://www.mail-archive.com/sqlite-users%40sqlite.org/msg80881.html Anyway, I appreciate the input and will try to accept that I should not be in control of how the query is run - I was under the impression that SQL engines can perform better when given more context about how the query should run (i.e. being more explicit with JOINs), but I do agree that, at least ideally, the planner should be able to make a better guess as to how to plot a query with a more relaxed/vague statement, than with a more explicit one. Thanks for the discussion :)
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
On Sun, 09 Nov 2014 00:45:16 +0900 Tristan Van Berkomwrote: > While I do understand SQL as a functional language, most functional > programming I've done still has rather explicit syntax/rules, so I get > particularly uncomfortable with writing vague statements, such as > JOIN tableA, tableB WHERE ... without being exactly specific on the > heart/nature of the join which is happening. Oh, I think you'll agree SQL has explicit syntax. I think what you mean by "vague" is "nondeterministic with respect to the physical data structures and algorithms". About that you're right; it's considered a feature. :-) The idea is that the engine interpreting your SQL might not be absolutely as fast as the most optimal solution for your hardware and data at any one point in time. But it will be nearly so, much more adaptable as hardware and data change, and thereby not trouble your application with issues outside its problem domain. And of course there are other advantages besides, as you know. Partly it's a matter of trust. You trust the OS to schedule your job fairly, to supply virtual memory, to deliver your TCP packet. You trust SQLite to deliver on its ACID contract, and to produce logically correct result from queries. Why not trust it to find the shortest path to your data? > Also what I've found in my limited experience is that nesting SELECT > statements, at least in SQLite, completely throws off the planner, > as in it has no opportunity to make a good guess and has to deal > with the result set of a nested SELECT as an opaque table, A good SQL rule of thumb: if you can think of a way, so can the DBMS. "... no opportunity to make a good guess" is not true. In some sense, SQLite has had 10 years to make a good guess, and often does. A nested select need not be materialized as a "table", opaque or otherwise. It can be converted, or "flattened" under some circumstances. SQLite's query planner isn't the most sophisticated; no one suggests otherwise. It does not follow that every JOIN will outperform every EXISTS or vice versa. > is generally not an indexed table (or perhaps it is, but I wouldnt > know because those indexes don't seem to be declarative in any way). I don't know what you're mean by indexes that "don't seem to be declarative". > So indeed, I am not comfortable with 'leaving it up to chance', > and if there is a way to get higher specificity, I try to achieve > that. If I may be so bold, beware of your assumptions. The fastest plan with 100 rows may perform very poorly with 100 million rows. Be careful what you optimize for. :-) My fans on this list (both of them) will be surprised, though, that I partly agree with you. In terms of technology, there's very little middle ground between fopen(3) and SQL's "I give query you give data" contract. SQL experts, no matter the platform, spend a fair amount of time coercing the system into using an efficient query plan. They cannot say "apply criteria X to index Y and join to table T"; they must work by indirection, creating indexes and using "query hints" (or, sometimes, other query formations) until the planner does the "right thing", however defined. I sometimes wish for a system that let me express the query algebraically and the order of operation explicitly, but afaik no such system exists except partially and grudgingly within SQL. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
On Sat, 2014-11-08 at 09:46 -0700, Keith Medcalf wrote: > On Saturday, 8 November, 2014 06:56, Tristan Van Berkom >said: > > >On Sat, 2014-11-08 at 06:23 -0700, Keith Medcalf wrote: > >> How about the direct approach: > >> > >> SELECT uid > >> FROM resource > >> WHERE uid NOT IN (SELECT resource_uid > >> FROM event_participant, event > >> WHERE event_participant.event_uid = event.uid > >> AND event.shift_uid = :shift_uid > >> AND event.date = :event_date) > >> > >> or perhaps > >> > >> SELECT uid > >> FROM resource > >> WHERE NOT EXISTS (SELECT 1 > >> FROM event_participant, event > >> WHERE event_participant.event_uid = event.uid > >> AND resource_uid = resource.uid > >> AND event.shift_uid = :shift_uid > >> AND event.date = :event_date) > >> > >> Is not the "right way to do it" the one that obtains the result > >required, not the one that uses a "checkbox" implementation? > > > >I've presented 2 queries which both obtain the required result, and I > >would have to say both of them have varying levels of correctness. > > > >If obtaining the correct result was all that was required, then I could > >simply issue multiple queries and balloon my memory with result UIDs > >in between each of them, all for the sake of better readability (but > >then, I'm not sure why I would need a nice tool like SQLite to do it > >in the first place). > > > >So I would have to say, the "right way to do it" is the most efficient > >way, the one which provides SQLite with the best indications of how > >to plot an efficient query plan. > > > >The heuristics of the table is generally that resources are rather > >finite, while events grow over time (possibly 10 events each with > >10 participants for 3 shifts per day, meaning after a year that > >table will be large, and we'll still want to find the available > >resources hopefully in under 30ms, ideally 10ms). > > > >So yeah, speed at query time (not insert/update/delete time), is very > >important. > > While either of the above are direct translations, if you wish to minimize > materialization of intermediate data then the second would be preferable > since it uses only inner joins. However, the first may be far more efficient > depending on the shape of your data and the number of rows, how much memory > is available for SQLite to use (Bytes, KBytes, MBytes), and the speed of your > I/O devices. > > Given that selecting all the resources based on your stated heuristics of > growth, having proper indexes, and the constraints given in the query, I > would say that the first NOT IN will be more efficient since it is only > building a list of a (couple) hundred or so exclusions, and NOT IN is very > efficient. The second form will use a few thousand bytes less memory but > perform significantly more I/O. > > Theoretically, of course, the execution plan of a query which obtains a > specified result should not depend on the phrasing of the query. They should > all result in the same execution plan and the same result. However, this is > not true in practice because of limitations in optimization. Some optimizers > need a bit of help in figuring out how to generate an optimum plan, and some > others will generate overly complicated and inefficient plans simply because > they prefer to use newly added or nifty features only available in that > particular database engine rather than a simpler, faster, more direct > solution. > > I don't know why an outer join would be preferable in any case -- you are > going to use extra CPU and I/O to perform a join on which a significant > proportion of the results will be discarded -- or depend on the optimizer to > notice this and "optimize" away the outer join thus ending up with a "more > direct" phrasing which could have been written in the first place (and is far > more understandable for future maintainers). > > Many optimizers can only optimize within specific constraints and specifying > a cute complicated query will result in the most efficient cute and > complicated solution it can muster (ie, a local optimum, not a global > optimum). > > Your statement 1 implements NOT IN indirectly by doing a LEFT JOIN then > discarding the intersection set. This requires an extra join and discard > over a more direct NOT IN phrasing. It is also more difficult to understand. > It could be optimized into the NOT IN or NOT EXISTS form by the optimizer if > you are lucky (presently it will not). > > Your statement 2 implement the correlated subquery NOT EXISTS by an indirect > expression, again using a LEFT JOIN then discarding the intersection results. > Same comments regarding extra operations and optimizations that cannot > (presently) be done apply. > > If you prefer the ON
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
There's nothing vague about select statements, they're logical formulas involving the data in your database and as exact as any other programming language, albeit in a very different domain. Relational databases are based on first order predicate logic and have operations that are are rigorously defined. You're obviously thinking about the limitations of the database's optimizer and the opaque way it does its work, but just as a real time or embedded programmer may have qualms about (the uncertainty of) the code produced by a compiler and prefer hand rolled assembler, you want more control and certainty over what the database does. The problem in both cases is that the programmer assumes they can do better than the machine, which is highly questionable given the current state of the art in both cases. You can give the optimiser tips about how to go about its work and that can help improve efficiency, but you are better off trusting the optimiser rather than trying to fight it. If there is an actual problem with the efficiency of a well written select, it's almost always rooted in the design of the database schema, especially to much or too little de-normalization. On Sat, Nov 8, 2014 at 7:45 AM, Tristan Van Berkomwrote: > On Sat, 2014-11-08 at 10:23 -0500, James K. Lowden wrote: > > On Sat, 08 Nov 2014 22:55:46 +0900 > > Tristan Van Berkom wrote: > > > > > So I would have to say, the "right way to do it" is the most efficient > > > way, the one which provides SQLite with the best indications of how > > > to plot an efficient query plan. > > > > Keith is suggesting that the right way to do it is neither "any way that > > works" nor necessarily "whatever is fastest" but "the clearest > > formulation of the query". Clarity has the salutary property of being > > most likely to be correct (because understood by the human) and stands > > a better than fair chance of being executed efficiently (because it > > translates easily to a good query plan). > > > > Most of time -- not every time, but most of the time -- indexes > > and table design matter much more to efficient execution than query > > syntax. When a clearly expressed query is not executed efficiently in > > the presence of useful indexes, and especially when a slightly different > > one does, that's usually considered a defect of the query planner. > > I see what you're saying, as I mentioned in the initial email I do > consider myself to be a relative newbie, and the majority of work > I've been doing with SQL (until this year) has been working on > optimizing existing schemas/queries for embedded use. > > While I do understand SQL as a functional language, most functional > programming I've done still has rather explicit syntax/rules, so I get > particularly uncomfortable with writing vague statements, such as > JOIN tableA, tableB WHERE ... without being exactly specific on the > heart/nature of the join which is happening. > > Also what I've found in my limited experience is that nesting SELECT > statements, at least in SQLite, completely throws off the planner, > as in it has no opportunity to make a good guess and has to deal > with the result set of a nested SELECT as an opaque table, which > is generally not an indexed table (or perhaps it is, but I wouldnt > know because those indexes don't seem to be declarative in any way). > > So indeed, I am not comfortable with 'leaving it up to chance', > and if there is a way to get higher specificity, I try to achieve that. > > In any case, I do have queries which work well at this point, but > posted this question to the list in the hope I could find the right > specificity for the given query - it's not a very big deal, I'll > probably just stick with what works and try to fix it in the places > where profiling reveals that I'm doing something wrong. > > Cheers, > -Tristan > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
On Saturday, 8 November, 2014 06:56, Tristan Van Berkomsaid: >On Sat, 2014-11-08 at 06:23 -0700, Keith Medcalf wrote: >> How about the direct approach: >> >> SELECT uid >> FROM resource >> WHERE uid NOT IN (SELECT resource_uid >> FROM event_participant, event >> WHERE event_participant.event_uid = event.uid >> AND event.shift_uid = :shift_uid >> AND event.date = :event_date) >> >> or perhaps >> >> SELECT uid >> FROM resource >> WHERE NOT EXISTS (SELECT 1 >> FROM event_participant, event >> WHERE event_participant.event_uid = event.uid >> AND resource_uid = resource.uid >> AND event.shift_uid = :shift_uid >> AND event.date = :event_date) >> >> Is not the "right way to do it" the one that obtains the result >required, not the one that uses a "checkbox" implementation? > >I've presented 2 queries which both obtain the required result, and I >would have to say both of them have varying levels of correctness. > >If obtaining the correct result was all that was required, then I could >simply issue multiple queries and balloon my memory with result UIDs >in between each of them, all for the sake of better readability (but >then, I'm not sure why I would need a nice tool like SQLite to do it >in the first place). > >So I would have to say, the "right way to do it" is the most efficient >way, the one which provides SQLite with the best indications of how >to plot an efficient query plan. > >The heuristics of the table is generally that resources are rather >finite, while events grow over time (possibly 10 events each with >10 participants for 3 shifts per day, meaning after a year that >table will be large, and we'll still want to find the available >resources hopefully in under 30ms, ideally 10ms). > >So yeah, speed at query time (not insert/update/delete time), is very >important. While either of the above are direct translations, if you wish to minimize materialization of intermediate data then the second would be preferable since it uses only inner joins. However, the first may be far more efficient depending on the shape of your data and the number of rows, how much memory is available for SQLite to use (Bytes, KBytes, MBytes), and the speed of your I/O devices. Given that selecting all the resources based on your stated heuristics of growth, having proper indexes, and the constraints given in the query, I would say that the first NOT IN will be more efficient since it is only building a list of a (couple) hundred or so exclusions, and NOT IN is very efficient. The second form will use a few thousand bytes less memory but perform significantly more I/O. Theoretically, of course, the execution plan of a query which obtains a specified result should not depend on the phrasing of the query. They should all result in the same execution plan and the same result. However, this is not true in practice because of limitations in optimization. Some optimizers need a bit of help in figuring out how to generate an optimum plan, and some others will generate overly complicated and inefficient plans simply because they prefer to use newly added or nifty features only available in that particular database engine rather than a simpler, faster, more direct solution. I don't know why an outer join would be preferable in any case -- you are going to use extra CPU and I/O to perform a join on which a significant proportion of the results will be discarded -- or depend on the optimizer to notice this and "optimize" away the outer join thus ending up with a "more direct" phrasing which could have been written in the first place (and is far more understandable for future maintainers). Many optimizers can only optimize within specific constraints and specifying a cute complicated query will result in the most efficient cute and complicated solution it can muster (ie, a local optimum, not a global optimum). Your statement 1 implements NOT IN indirectly by doing a LEFT JOIN then discarding the intersection set. This requires an extra join and discard over a more direct NOT IN phrasing. It is also more difficult to understand. It could be optimized into the NOT IN or NOT EXISTS form by the optimizer if you are lucky (presently it will not). Your statement 2 implement the correlated subquery NOT EXISTS by an indirect expression, again using a LEFT JOIN then discarding the intersection results. Same comments regarding extra operations and optimizations that cannot (presently) be done apply. If you prefer the ON syntax, then you may of course use it (though, except in the case of a LEFT OUTER JOIN, it is merely syntactic sugar for a list of tables to join and where conditions. SELECT uid FROM resource WHERE uid NOT IN (SELECT resource_uid
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
On Sat, 2014-11-08 at 10:23 -0500, James K. Lowden wrote: > On Sat, 08 Nov 2014 22:55:46 +0900 > Tristan Van Berkomwrote: > > > So I would have to say, the "right way to do it" is the most efficient > > way, the one which provides SQLite with the best indications of how > > to plot an efficient query plan. > > Keith is suggesting that the right way to do it is neither "any way that > works" nor necessarily "whatever is fastest" but "the clearest > formulation of the query". Clarity has the salutary property of being > most likely to be correct (because understood by the human) and stands > a better than fair chance of being executed efficiently (because it > translates easily to a good query plan). > > Most of time -- not every time, but most of the time -- indexes > and table design matter much more to efficient execution than query > syntax. When a clearly expressed query is not executed efficiently in > the presence of useful indexes, and especially when a slightly different > one does, that's usually considered a defect of the query planner. I see what you're saying, as I mentioned in the initial email I do consider myself to be a relative newbie, and the majority of work I've been doing with SQL (until this year) has been working on optimizing existing schemas/queries for embedded use. While I do understand SQL as a functional language, most functional programming I've done still has rather explicit syntax/rules, so I get particularly uncomfortable with writing vague statements, such as JOIN tableA, tableB WHERE ... without being exactly specific on the heart/nature of the join which is happening. Also what I've found in my limited experience is that nesting SELECT statements, at least in SQLite, completely throws off the planner, as in it has no opportunity to make a good guess and has to deal with the result set of a nested SELECT as an opaque table, which is generally not an indexed table (or perhaps it is, but I wouldnt know because those indexes don't seem to be declarative in any way). So indeed, I am not comfortable with 'leaving it up to chance', and if there is a way to get higher specificity, I try to achieve that. In any case, I do have queries which work well at this point, but posted this question to the list in the hope I could find the right specificity for the given query - it's not a very big deal, I'll probably just stick with what works and try to fix it in the places where profiling reveals that I'm doing something wrong. Cheers, -Tristan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
On Sat, 08 Nov 2014 22:55:46 +0900 Tristan Van Berkomwrote: > So I would have to say, the "right way to do it" is the most efficient > way, the one which provides SQLite with the best indications of how > to plot an efficient query plan. Keith is suggesting that the right way to do it is neither "any way that works" nor necessarily "whatever is fastest" but "the clearest formulation of the query". Clarity has the salutary property of being most likely to be correct (because understood by the human) and stands a better than fair chance of being executed efficiently (because it translates easily to a good query plan). Most of time -- not every time, but most of the time -- indexes and table design matter much more to efficient execution than query syntax. When a clearly expressed query is not executed efficiently in the presence of useful indexes, and especially when a slightly different one does, that's usually considered a defect of the query planner. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
On Sat, 2014-11-08 at 06:23 -0700, Keith Medcalf wrote: > How about the direct approach: > > SELECT uid > FROM resource > WHERE uid NOT IN (SELECT resource_uid > FROM event_participant, event > WHERE event_participant.event_uid = event.uid > AND event.shift_uid = :shift_uid > AND event.date = :event_date) > > or perhaps > > SELECT uid > FROM resource > WHERE NOT EXISTS (SELECT 1 > FROM event_participant, event > WHERE event_participant.event_uid = event.uid > AND resource_uid = resource.uid > AND event.shift_uid = :shift_uid > AND event.date = :event_date) > > Is not the "right way to do it" the one that obtains the result required, not > the one that uses a "checkbox" implementation? I've presented 2 queries which both obtain the required result, and I would have to say both of them have varying levels of correctness. If obtaining the correct result was all that was required, then I could simply issue multiple queries and balloon my memory with result UIDs in between each of them, all for the sake of better readability (but then, I'm not sure why I would need a nice tool like SQLite to do it in the first place). So I would have to say, the "right way to do it" is the most efficient way, the one which provides SQLite with the best indications of how to plot an efficient query plan. The heuristics of the table is generally that resources are rather finite, while events grow over time (possibly 10 events each with 10 participants for 3 shifts per day, meaning after a year that table will be large, and we'll still want to find the available resources hopefully in under 30ms, ideally 10ms). So yeah, speed at query time (not insert/update/delete time), is very important. Best Regards, -Tristan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
How about the direct approach: SELECT uid FROM resource WHERE uid NOT IN (SELECT resource_uid FROM event_participant, event WHERE event_participant.event_uid = event.uid AND event.shift_uid = :shift_uid AND event.date = :event_date) or perhaps SELECT uid FROM resource WHERE NOT EXISTS (SELECT 1 FROM event_participant, event WHERE event_participant.event_uid = event.uid AND resource_uid = resource.uid AND event.shift_uid = :shift_uid AND event.date = :event_date) Is not the "right way to do it" the one that obtains the result required, not the one that uses a "checkbox" implementation? Using the "checkbox" approach, someone might be of the opinion that the "right way to do it" is entirely within a hand-coded StartIO channel program implemented for a 3390 disk controller, and that is unlikely to be implementable on a bitty-box. --- Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. Sometimes theory and practice are combined: nothing works and no one knows why. >-Original Message- >From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- >boun...@sqlite.org] On Behalf Of Tristan Van Berkom >Sent: Saturday, 8 November, 2014 03:17 >To: sqlite-users@sqlite.org >Subject: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ? > >Hi all, > >Today I've stumbled on a situation where I think I really need to use a >RIGHT OUTER JOIN, and looking at all the examples on the internet I >could find so far, I'm not finding a way to simulate it properly using >LEFT OUTER JOINs. > >So I thought, before I commit to an inefficient alternative I would >check to see if someone on this list has an idea of how I can form an >efficient query for this. > >I'll start by highlighting some of the schema which is relevant to the >query: > >CREATE TABLE shift ( > uid INTEGER PRIMARY KEY AUTOINCREMENT, > > ... name and start time/duration of a shift ... >); > >CREATE TABLE resource ( > uid INTEGER PRIMARY KEY AUTOINCREMENT, > > ... details about the resource ... >); > >CREATE TABLE event ( > uidINTEGER PRIMARY KEY AUTOINCREMENT, > date INTEGER NOT NULL, /* unix timestamp */ > shift_uid INTEGER NOT NULL REFERENCES shift (uid), > > ... other details about the event ... >); > >CREATE TABLE IF NOT EXISTS event_participant ( > id INTEGER CHECK (id > 0), > event_uid INTEGER REFERENCES event (uid) ON DELETE CASCADE, > resource_uidINTEGER REFERENCES resource (uid), > > ... other details about this participant ... > > PRIMARY KEY (event_uid, id) >); > > >Now my goal here, is to pull out every resource UID which is _not_ >assigned to any event on a given date and shift UID. > >My first attempt that gets the job done looks like this: > >~ Statement 1 ~ > >SELECT resource.uid > >/* All resources... */ >FROM resource > >/* Subtract from this set all the booked resources */ >LEFT JOIN ( > >/* All resources which are booked on the given date / shift_uid */ >SELECT participant.resource_uid AS uid >FROM event >JOIN event_participant > AS participant > ON (participant.event_uid = event.uid) >WHERE (event.date = 1411185600 AND > event.shift_uid = 1 AND > participant.resource_uid IS NOT NULL) > >) AS booked_resource ON (booked_resource.uid = resource.uid) > >WHERE booked_resource.uid IS NULL; > >~ EXPLAIN SAYS ~ >1|0|0|SEARCH TABLE event USING INDEX event_shift_uid_idx (shift_uid=?) >(~2 rows) >1|1|1|SEARCH TABLE event_participant AS participant USING INDEX >event_participant_pk_idx (event_uid=?) (~5 rows) >0|0|0|SCAN TABLE resource (~100 rows) >0|1|1|SEARCH SUBQUERY 1 AS booked_resource USING AUTOMATIC COVERING >INDEX (uid=?) (~2 rows) > > >Now, this is not all that bad, except that I try to avoid nesting select >statements in cases like this as it prevents the query planner from >doing anything intelligent. > > >So I made another attempt which discards the nested statement: > >~ Statement 2 ~ > >SELECT resource.uid > >/* All Resources */ >FROM resource > >/* Get the cartesian product of