Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?

Tristan Van Berkom Sun, 09 Nov 2014 04:12:56 -0800

On Sat, 2014-11-08 at 14:27 -0500, James K. Lowden wrote:
> On Sun, 09 Nov 2014 00:45:16 +0900
> Tristan Van Berkom <[email protected]> wrote:
> 
> > While I do understand SQL as a functional language, most functional
> > programming I've done still has rather explicit syntax/rules, so I get
> > particularly uncomfortable with writing vague statements, such as
> > JOIN tableA, tableB WHERE ... without being exactly specific on the
> > heart/nature of the join which is happening.
> 
> Oh, I think you'll agree SQL has explicit syntax.  I think what you
> mean by "vague" is "nondeterministic with respect to the physical data
> structures and algorithms".  About that you're right; it's considered a
> feature.  :-)  
> 
> The idea is that the engine interpreting your SQL might not be
> absolutely as fast as the most optimal solution for your hardware and
> data at any one point in time.  But it will be nearly so, much more
> adaptable as hardware and data change, and thereby not trouble your
> application with issues outside its problem domain.  And of course
> there are other advantages besides, as you know.  
> 
> Partly it's a matter of trust.  You trust the OS to schedule your job
> fairly, to supply virtual memory, to deliver your TCP packet.  You
> trust SQLite to deliver on its ACID contract, and to produce logically
> correct result from queries.  Why not trust it to find the shortest
> path to your data?


Yeah, I'm only starting to catch on though :)

> 
> > Also what I've found in my limited experience is that nesting SELECT
> > statements, at least in SQLite, completely throws off the planner,
> > as in it has no opportunity to make a good guess and has to deal
> > with the result set of a nested SELECT as an opaque table, 
> 
> A good SQL rule of thumb: if you can think of a way, so can the
> DBMS.  "... no opportunity to make a good guess" is not true.  In some
> sense, SQLite has had 10 years to make a good guess, and often does.  
> 
> A nested select need not be materialized as a "table", opaque or
> otherwise.  It can be converted, or "flattened" under some
> circumstances.  SQLite's query planner isn't the most sophisticated; no
> one suggests otherwise.  It does not follow that every JOIN will
> outperform every EXISTS or vice versa.  

Indeed this is a large misconception on my part, on the queries which
I *have* profiled, it did turn out that JOINs were more effective than
solutions which involve nesting select statements.

> > is generally not an indexed table (or perhaps it is, but I wouldnt
> > know because those indexes don't seem to be declarative in any way).
> 
> I don't know what you're mean by indexes that "don't seem to be
> declarative".  

This was in context of the paragraph, i.e. it's conceivable that it
may make sense to build a temporary index on the result of a nested
select statement while accumulating results, and indeed, depending
on the outer query EXPLAIN tells me it builds a temporary B-TREE on
the inner query results... however there is no way to declare (afaics)
what kind of temporary index is to be built on the result set of
a nested inner select statement.

That's all that I meant by that.

> > So indeed, I am not comfortable with 'leaving it up to chance',
> > and if there is a way to get higher specificity, I try to achieve
> > that.
> 
> If I may be so bold, beware of your assumptions.  The fastest plan with
> 100 rows may perform very poorly with 100 million rows.  Be careful
> what you optimize for.  :-)  
> 
> My fans on this list (both of them) will be surprised, though, that I
> partly agree with you.  In terms of technology, there's very little
> middle ground between fopen(3) and SQL's "I give query you give data"
> contract.  SQL experts, no matter the platform, spend a fair amount of
> time coercing the system into using an efficient query plan.  They
> cannot say "apply criteria X to index Y and join to table T"; they must
> work by indirection, creating indexes and using "query hints" (or,
> sometimes, other query formations) until the planner does the "right
> thing", however defined.  I sometimes wish for a system that let me
> express the query algebraically and the order of operation explicitly,
> but afaik no such system exists except partially and grudgingly within
> SQL.  

Indeed, and here's an example of a similar struggle I had, where I 
discovered the secret '+' trick, on this very list:

  https://www.mail-archive.com/sqlite-users%40sqlite.org/msg80881.html

Anyway, I appreciate the input and will try to accept that I should not
be in control of how the query is run - I was under the impression that
SQL engines can perform better when given more context about how the
query should run (i.e. being more explicit with JOINs), but I do agree
that, at least ideally, the planner should be able to make a better
guess as to how to plot a query with a more relaxed/vague statement,
than with a more explicit one.

Thanks for the discussion :)

Cheers,
    -Tristan


_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?

Reply via email to