On Sat, 2014-11-08 at 14:27 -0500, James K. Lowden wrote: > On Sun, 09 Nov 2014 00:45:16 +0900 > Tristan Van Berkom <[email protected]> wrote: > > > While I do understand SQL as a functional language, most functional > > programming I've done still has rather explicit syntax/rules, so I get > > particularly uncomfortable with writing vague statements, such as > > JOIN tableA, tableB WHERE ... without being exactly specific on the > > heart/nature of the join which is happening. > > Oh, I think you'll agree SQL has explicit syntax. I think what you > mean by "vague" is "nondeterministic with respect to the physical data > structures and algorithms". About that you're right; it's considered a > feature. :-) > > The idea is that the engine interpreting your SQL might not be > absolutely as fast as the most optimal solution for your hardware and > data at any one point in time. But it will be nearly so, much more > adaptable as hardware and data change, and thereby not trouble your > application with issues outside its problem domain. And of course > there are other advantages besides, as you know. > > Partly it's a matter of trust. You trust the OS to schedule your job > fairly, to supply virtual memory, to deliver your TCP packet. You > trust SQLite to deliver on its ACID contract, and to produce logically > correct result from queries. Why not trust it to find the shortest > path to your data?
Yeah, I'm only starting to catch on though :) > > > Also what I've found in my limited experience is that nesting SELECT > > statements, at least in SQLite, completely throws off the planner, > > as in it has no opportunity to make a good guess and has to deal > > with the result set of a nested SELECT as an opaque table, > > A good SQL rule of thumb: if you can think of a way, so can the > DBMS. "... no opportunity to make a good guess" is not true. In some > sense, SQLite has had 10 years to make a good guess, and often does. > > A nested select need not be materialized as a "table", opaque or > otherwise. It can be converted, or "flattened" under some > circumstances. SQLite's query planner isn't the most sophisticated; no > one suggests otherwise. It does not follow that every JOIN will > outperform every EXISTS or vice versa. Indeed this is a large misconception on my part, on the queries which I *have* profiled, it did turn out that JOINs were more effective than solutions which involve nesting select statements. > > is generally not an indexed table (or perhaps it is, but I wouldnt > > know because those indexes don't seem to be declarative in any way). > > I don't know what you're mean by indexes that "don't seem to be > declarative". This was in context of the paragraph, i.e. it's conceivable that it may make sense to build a temporary index on the result of a nested select statement while accumulating results, and indeed, depending on the outer query EXPLAIN tells me it builds a temporary B-TREE on the inner query results... however there is no way to declare (afaics) what kind of temporary index is to be built on the result set of a nested inner select statement. That's all that I meant by that. > > So indeed, I am not comfortable with 'leaving it up to chance', > > and if there is a way to get higher specificity, I try to achieve > > that. > > If I may be so bold, beware of your assumptions. The fastest plan with > 100 rows may perform very poorly with 100 million rows. Be careful > what you optimize for. :-) > > My fans on this list (both of them) will be surprised, though, that I > partly agree with you. In terms of technology, there's very little > middle ground between fopen(3) and SQL's "I give query you give data" > contract. SQL experts, no matter the platform, spend a fair amount of > time coercing the system into using an efficient query plan. They > cannot say "apply criteria X to index Y and join to table T"; they must > work by indirection, creating indexes and using "query hints" (or, > sometimes, other query formations) until the planner does the "right > thing", however defined. I sometimes wish for a system that let me > express the query algebraically and the order of operation explicitly, > but afaik no such system exists except partially and grudgingly within > SQL. Indeed, and here's an example of a similar struggle I had, where I discovered the secret '+' trick, on this very list: https://www.mail-archive.com/sqlite-users%40sqlite.org/msg80881.html Anyway, I appreciate the input and will try to accept that I should not be in control of how the query is run - I was under the impression that SQL engines can perform better when given more context about how the query should run (i.e. being more explicit with JOINs), but I do agree that, at least ideally, the planner should be able to make a better guess as to how to plot a query with a more relaxed/vague statement, than with a more explicit one. Thanks for the discussion :) Cheers, -Tristan _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

